Google Cloud Datalab error writing to Cloud Storage

Posted 2019-06-03 20:39

Question:

I am using Google Cloud Datalab for the first time to build a classifier for a Kaggle competition. I am stuck trying to write a CSV file containing the pre-processed training data to Cloud Storage using the google.datalab.storage API.

The file contains strings with Unicode characters, which cause write_stream on a Storage object to fail with the error: Failed to process HTTP response.

Here is the simplified code only trying to write a single string:

from google.datalab import Context
import google.datalab.storage as storage

project = Context.default().project_id
bucket_name = project
bucket_object = storage.Bucket(bucket_name)
file_object = bucket_object.object('x.txt')

test_string = 'Congratulations from me as well, use the tools well. \xc2\xa0\xc2\xb7 talk'
#test_string = 'Congratulations from me as well, use the tools well.  talk'

# Round-trip the byte string through Unicode to check the encoding (Python 2).
print type(test_string)   # <type 'str'> -- raw UTF-8 bytes
print len(test_string)    # 62 bytes
test_string = test_string.decode('utf-8')
print type(test_string)   # <type 'unicode'>
print len(test_string)    # 60 code points
test_string = test_string.encode('utf-8')
print type(test_string)   # back to <type 'str'>
print len(test_string)    # 62 bytes

try:
  file_object.write_stream(test_string, 'text/plain')
except Exception as e:
  print e

Output:

<type 'str'>
62
<type 'unicode'>
60
<type 'str'>
62
Failed to process HTTP response.

If I use the string without the Unicode characters (the commented-out line above), the Storage object is created and the string is written to the file. It makes no difference whether I write the Unicode (decoded) version or the UTF-8 (encoded) one, and the content type ('text/plain' or 'application/octet-stream') makes no difference either.
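To show what the print statements above are reporting (this standalone sketch uses Python 3 bytes/str, whereas Datalab's kernel is Python 2, but the byte counts are the same): the four bytes \xc2\xa0\xc2\xb7 are the UTF-8 encoding of just two code points, U+00A0 NO-BREAK SPACE and U+00B7 MIDDLE DOT, which is why the length drops from 62 to 60 after decoding and the round trip reproduces the original bytes exactly.

```python
# -*- coding: utf-8 -*-
# The raw bytes, as Datalab's Python 2 str holds them.
raw = b'Congratulations from me as well, use the tools well. \xc2\xa0\xc2\xb7 talk'

# Decode UTF-8 bytes to text: 4 bytes collapse into 2 code points.
text = raw.decode('utf-8')
print(len(raw))               # 62 bytes
print(len(text))              # 60 code points
print(text[53], text[54])     # U+00A0, U+00B7

# Re-encoding restores the identical byte string, so the bytes sent
# to write_stream are well-formed UTF-8 either way.
print(text.encode('utf-8') == raw)
```

So the data itself round-trips cleanly; the failure appears to be in how the library transmits non-ASCII payloads, not in the encoding step.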

I would appreciate any help or ideas on how to solve this, especially since the google.datalab.storage API is barely documented (like most things GCP).

Thanks.