Is text the only content type for the %%storage magic?

Posted 2019-09-09 21:25

I am working with the new Cloud Datalab and trying to save binary files to a GCS bucket using the %%storage magic function. For instance, I would like to save pandas DataFrames (which I used to save as pickle files) and, above all, scikit-learn model objects (after training).

I have tried a few things with %%storage with no luck. It seems to me that %%storage is intended to work with text data only. Is this correct? I have no issues with CSV files, for instance. The only parameters I can provide to %%storage write are the bucket object and the variable to be saved.

I know that the notebooks included in Datalab are intended to serve as the Datalab documentation. But in all fairness, that documentation is extremely poor, to put it in polite terms. The embedded documentation (shown by pressing Shift+Tab with the cursor on code within a cell) is also very poor.

If any of you know of another source of documentation for this, please let me know. I tried perusing the code on GitHub but could not find what I needed.

In my view, magic functions are supposed to make ad-hoc data analysis in notebooks easier. With such a poor implementation and poor documentation, though, they defeat the purpose and make things more cumbersome. In fact, if you are new to this, my advice would be to go straight to learning the gcloud API for Python and not bother with the Datalab magic functions and the Datalab API, given their current level of maturity.

1 answer
ら.Afraid
#2 · 2019-09-09 22:06

This worked for me to save a scikit-learn model to GCS. I am not sure it is the most efficient way, but I have tested it and it works. The model object is pickled using the call from s

  1. Set up your bucket. classifier is the scikit-learn object.

    from io import BytesIO
    import pickle as pkl
    from gcloud import storage as gcs

    bucket = gcs.Client().get_bucket('name')
    file_msg = 'path_to_file/filename.pkl'
    
  2. Save to bucket

    # serialize contents into an in-memory buffer
    contents = BytesIO()
    pkl.dump(classifier, contents)
    
    # upload the pickled bytes
    file_blob = bucket.blob(file_msg)
    file_blob.upload_from_string(contents.getvalue(),
                                 content_type='sklearn/pickle')
    
    # Tested: after downloading the file to a local machine, `pickle.load(open(filename, 'rb'))` loads it fine.
    
  3. Load as an object

    # set up bucket (already done)
    
    # set up blob for file
    file_blob = bucket.blob(file_msg)
    
    # get serialized contents
    contents = BytesIO(file_blob.download_as_string())
    
    # unpickle the contents back into the model object
    model = pkl.load(contents)
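
The three steps above can be wrapped into two small helpers. This is only a sketch: `save_pickle`, `load_pickle`, `InMemoryBucket` and `InMemoryBlob` are hypothetical names that are not part of gcloud or Datalab. The in-memory classes stand in for a real bucket so the round trip can be tried without credentials, and they mimic only the three calls used in the answer (`blob`, `upload_from_string`, `download_as_string`). The generic `application/octet-stream` content type is also my assumption, not something the answer uses.

```python
import pickle
from io import BytesIO


def save_pickle(bucket, path, obj):
    """Pickle obj in memory and upload the resulting bytes to bucket at path."""
    buffer = BytesIO()
    pickle.dump(obj, buffer)
    blob = bucket.blob(path)
    # Assumption: a generic binary content type; the answer uses 'sklearn/pickle'.
    blob.upload_from_string(buffer.getvalue(),
                            content_type='application/octet-stream')


def load_pickle(bucket, path):
    """Download the bytes stored at path and unpickle them."""
    blob = bucket.blob(path)
    return pickle.load(BytesIO(blob.download_as_string()))


class InMemoryBlob:
    """Hypothetical stand-in for a storage blob, backed by a shared dict."""

    def __init__(self, objects, path):
        self._objects = objects
        self._path = path

    def upload_from_string(self, data, content_type=None):
        self._objects[self._path] = bytes(data)

    def download_as_string(self):
        return self._objects[self._path]


class InMemoryBucket:
    """Hypothetical stand-in for a bucket; hands out InMemoryBlob objects."""

    def __init__(self):
        self._objects = {}

    def blob(self, path):
        return InMemoryBlob(self._objects, path)


# Round trip with the stand-in bucket; the path is an example only.
bucket = InMemoryBucket()
save_pickle(bucket, 'models/example.pkl', {'weights': [1, 2, 3]})
restored = load_pickle(bucket, 'models/example.pkl')
```

With the real library, the bucket from step 1 (`gcs.Client().get_bucket('name')`) would be passed in place of `InMemoryBucket()`. The same helpers should work for any picklable object, including a pandas DataFrame. As with any pickle, only load files that you trust.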
    