How do you upload data in bulk to Google App Engine

Posted 2020-02-26 01:56

I have about 4000 records that I need to upload to Datastore.

They are currently in CSV format. I'd appreciate it if someone could point me to, or explain, how to upload data in bulk to GAE.

4 answers
看我几分像从前
Reply #2 · 2020-02-26 02:07

You can do this with the remote API and batch operations on multiple entities. Here is an example using NDB in Python, where our Test.csv contains the following values separated by semicolons:

1;2;3;4
5;6;7;8

First we need to import modules:

import csv
from TestData import TestData
from google.appengine.ext import ndb
from google.appengine.ext.remote_api import remote_api_stub

Then we need to create remote api stub:

remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func, 'your-app-id.appspot.com')

For more information on using remote api have a look at this answer.
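The call above passes an auth_func that is not defined anywhere in this answer. As a minimal sketch, assuming the stub expects a callable that returns a (username, password) tuple, it could look like the following. The prompt and secret_prompt parameters and the read_line name are hypothetical additions so the function can be tried without typing at a terminal:

```python
import getpass

try:
    read_line = raw_input  # Python 2, which the legacy App Engine SDK uses
except NameError:
    read_line = input      # Python 3, for trying the function out locally


def auth_func(prompt=read_line, secret_prompt=getpass.getpass):
    """Return a (username, password) tuple for the remote API stub."""
    username = prompt('Username: ')
    password = secret_prompt('Password: ')
    return (username, password)
```

Called with no arguments, it asks for the credentials interactively and hides the password while it is typed.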

Then comes the main code, which basically does the following things:

  1. Opens the Test.csv file.
  2. Sets the delimiter. We are using semicolon.
  3. Then you have two different options to create a list of entities:
    1. Using map reduce functions.
    2. Using list comprehension.
  4. In the end you batch put the whole list of entities.

Main code:

# Open the CSV file for reading (binary mode, as the Python 2 csv module expects).
with open('Test.csv', 'rb') as csv_file:
    # Set the delimiter.
    reader = csv.reader(csv_file, delimiter=';')

    # Option 1: reduce the 2D list into a 1D list, then map every element to an entity.
    # test_data_list = map(lambda number: TestData(number=int(number)),
    #                      reduce(lambda values, row: values + row, reader))

    # Option 2: use a list comprehension. Pick only one option, because the
    # reader can be iterated only once; running both leaves the list empty.
    test_data_list = [TestData(number=int(number)) for row in reader for number in row]

    # Batch put the whole list into HRD.
    ndb.put_multi(test_data_list)

The put_multi operation also takes care of batching an appropriate number of entities into each HTTP POST request.
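To check the parsing step on its own, without the App Engine SDK, here is a small sketch of the same flattening logic. It is written for Python 3 and reads from a string instead of Test.csv; parse_numbers is a hypothetical helper name, not part of any App Engine API:

```python
import csv
import io


def parse_numbers(csv_text, delimiter=';'):
    """Flatten every cell of delimiter-separated text into one list of ints."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return [int(number) for row in reader for number in row]


numbers = parse_numbers('1;2;3;4\n5;6;7;8\n')
print(numbers)  # prints [1, 2, 3, 4, 5, 6, 7, 8]
```

Each integer in the resulting list corresponds to one TestData entity in the main code.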

Have a look at this documentation for more information:

迷人小祖宗
Reply #3 · 2020-02-26 02:11

You can use the bulkloader.py tool:

The bulkloader.py tool included with the Python SDK can upload data to your application's datastore. With just a little bit of set-up, you can create new datastore entities from CSV files.

兄弟一词,经得起流年.
Reply #4 · 2020-02-26 02:17

I don't have the perfect solution, but I suggest you have a go with the App Engine Console. App Engine Console is a free plugin that lets you run an interactive Python interpreter in your production environment. It's helpful for one-off data manipulation (such as initial data imports) for several reasons:

  1. It's the good old read-eval-print interpreter. You can do things one at a time instead of having to write the perfect import code all at once and running it in batch.
  2. You have interactive access to your own data model, so you can read/update/delete objects from the data store.
  3. You have interactive access to the URL Fetch API, so you can pull data down piece by piece.

I suggest something like the following:

  1. Get your data model working in your development environment
  2. Split your CSV records into chunks of under 1,000. Publish them somewhere like Amazon S3 or any other URL.
  3. Install App Engine Console in your project and push it up to production
  4. Log in to the console. (Only admins can use the console so you should be safe. You can even configure it to return HTTP 404 to "cloak" from unauthorized users.)
  5. For each chunk of your CSV:
    1. Use URLFetch to pull down a chunk of data
    2. Use the built-in csv module to chop up your data until you have a list of useful data structures (most likely a list of lists or something like that)
    3. Write a for loop, iterating through each data structure in the list:
      1. Create a data object with all correct properties
      2. put() it into the data store

After one pass through step 5, you should find that you can either copy and paste, or write simple functions, to speed up your import task. Also, because steps 5.1 and 5.2 only fetch and process the data, you can take your time until you are sure that you have it perfect.
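Step 2 above (splitting the CSV into chunks of under 1,000 records) can be sketched as follows. This is only an illustration, written for Python 3 so it can be run locally; chunk_rows and split_csv are hypothetical helper names, and the default of 999 rows per chunk is an assumption that keeps each chunk under 1,000:

```python
import csv
import io


def chunk_rows(rows, max_size=999):
    """Yield consecutive lists of at most max_size rows."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, partially filled chunk.
        yield chunk


def split_csv(csv_text, max_size=999):
    """Parse CSV text and return its rows grouped into chunks."""
    reader = csv.reader(io.StringIO(csv_text))
    return list(chunk_rows(reader, max_size))
```

With about 4000 records this gives five chunks, each of which can be published at its own URL and imported in a separate pass through step 5.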

(Note, App Engine Console currently works best with Firefox.)

祖国的老花朵
Reply #5 · 2020-02-26 02:31

With later versions of the App Engine SDK, you can upload data using appcfg.py.

See appcfg.py.
