google-cloud-python: Language: 500 InternalServerError: Exception deserializing message!

2. Win10
3. Python 2.7.13
4. Python 2.7.13
5. n/a
6. I am trying to run a Python script from the Google Cloud Natural Language API Python Samples:
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/language/cloud-client/v1/snippets.py
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/language/cloud-client/v1beta2/snippets.py

I have not made any modifications.
Specifically, I want to run entity analysis on a text file/document. The relevant part of the code is below.

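# Imports used by this function (from the top of the v1beta2 snippets.py).
from google.cloud import language_v1beta2
from google.cloud.language_v1beta2 import enums
from google.cloud.language_v1beta2 import types


def entities_file(gcs_uri):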
    """Detects entities in the file located in Google Cloud Storage."""
    client = language_v1beta2.LanguageServiceClient()

    # Instantiates a plain text document.
    document = types.Document(
        gcs_content_uri=gcs_uri,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. You can also analyze HTML with:
    #   document.type == enums.Document.Type.HTML
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        print('=' * 20)
        print(u'{:<16}: {}'.format('name', entity.name))
        print(u'{:<16}: {}'.format('type', entity_type[entity.type]))
        print(u'{:<16}: {}'.format('metadata', entity.metadata))
        print(u'{:<16}: {}'.format('salience', entity.salience))
        print(u'{:<16}: {}'.format('wikipedia_url',
              entity.metadata.get('wikipedia_url', '-')))

I have put my text file (UTF-8 encoded) on Cloud Storage at gs://neotokyo-cloud-bucket/TXT/TTS-01.txt

I am running the script in Google Cloud Shell with: python snippets.py entities-file gs://neotokyo-cloud-bucket/TXT/TTS-01.txt

I get the following error, which appears to be protobuf related.

[libprotobuf ERROR google/protobuf/wire_format_lite.cc:629]. 
String field 'google.cloud.language.v1beta2.TextSpan.content' 
contains invalid UTF-8 data when parsing a protocol buffer. 
Use the 'bytes' type if you intend to send raw bytes.

ERROR:root:Exception deserializing message!
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/grpc/_common.py", line 87, in _transform
    return transformer(message)
DecodeError: Error parsing message
Traceback (most recent call last):
  File "snippets.py", line 336, in <module>
    entities_file(args.gcs_uri)
  File "snippets.py", line 114, in entities_file
    entities = client.analyze_entities(document).entities
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/language_v1beta2/gapic/language_service_client.py", line 226, in analyze_entities
    return self._analyze_entities(request, retry=retry, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
    on_error=on_error,
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/retry.py", line 177, in retry_target
    return target()
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/timeout.py", line 206, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "/usr/local/lib/python2.7/dist-packages/six.py", line 737, in raise_from
    raise value
google.api_core.exceptions.InternalServerError: 500 Exception deserializing response!
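
Since the protobuf error complains about invalid UTF-8 data, one thing that can be ruled out locally is whether the uploaded file really is valid UTF-8. The following is only a minimal sketch, assuming a local copy of the file is available; the helper names first_invalid_utf8_offset and check_file are hypothetical and not part of the samples, and a clean result does not by itself explain the server-side error.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


def check_file(path):
    """Read a local file as raw bytes and check it (path is hypothetical)."""
    with open(path, 'rb') as handle:
        return first_invalid_utf8_offset(handle.read())


# Valid UTF-8 bytes: no offending offset is reported.
print(first_invalid_utf8_offset(u'cerr\u00f3'.encode('utf-8')))
# The same word encoded as Latin-1 is not valid UTF-8.
print(first_invalid_utf8_offset(b'cerr\xf3'))
```

If check_file returns an offset for the local copy of the document, re-saving the file as UTF-8 before uploading it again would be a reasonable first thing to try.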

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

In case someone else gets this same error: I was getting it with HTML text and the client.analyze_entities API.

text = '\r\n\t\r\n\t<p>La presidenta Cristina Kirchner cerró es .... <br><br></p>\r\n\r\n'

Calling strip() on the text fixed it, and it now works.

text_stripped = '<p>La presidenta Cristina Kirchner cerró es .... <br><br></p>'

Hope it helps
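
A minimal sketch of that workaround, assuming the text is passed inline rather than through a Cloud Storage URI. The helper name clean_text and the shortened sample string are made up for illustration, and the API call is shown only as a comment because it needs credentials and the client library.

```python
def clean_text(text):
    """Strip leading and trailing whitespace (spaces, tabs, CR, LF)."""
    return text.strip()


# Shortened, illustrative version of the HTML text from the comment above.
raw = u'\r\n\t\r\n\t<p>La presidenta Cristina Kirchner cerr\u00f3 es .... <br><br></p>\r\n\r\n'
cleaned = clean_text(raw)
print(repr(cleaned))

# The cleaned text would then be sent in place of the raw one, e.g.:
#   document = types.Document(
#       content=cleaned,
#       type=enums.Document.Type.HTML)
#   entities = client.analyze_entities(document).entities
```

Note that strip() only removes whitespace at the two ends of the string; whitespace inside the markup is left untouched.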