TL;DR: I need a working way to download my data from the production Datastore and load it into the local development environment.
The detailed problem:
I need to test my app on the local development server with real data (not real-time data) from the production server's Datastore. The documentation and other resources offer three options:
- Using appcfg.py to download data from the production server and then load it into the local development environment. When I use this method I get a 'bad request' error due to an OAuth problem. Besides, this method is being deprecated; the official documentation advises using the second method instead:
- Using gcloud with managed export and import. The documentation for this method explains how to back up all data through the console (at https://console.cloud.google.com/). I have tried it: the backup is generated in Cloud Storage and I downloaded it, but it is in LevelDB format, and I need to load it into the local development server. There is no official explanation of how to do that, and the loading step from the first method is not compatible with the LevelDB format. I couldn't find an official way around this. There is a StackOverflow entry, but it did not work for me because it only reads all entities back as dicts, and converting those dict objects into ndb entities becomes the tricky part.
- Having lost hope with the first two methods, I decided to use the Cloud Datastore Emulator (beta), which can emulate the real data in the local development environment. It is still in beta and has several problems; when I run the command, I run into problems with DATASTORE_EMULATOR_HOST anyway.
It sounds like you should be using a remote sandbox
Even if you get this to work, the localhost datastore still behaves differently than the actual datastore.
If you want to truly simulate your production environment, then I would recommend setting up a clone of your App Engine project as a remote sandbox. You could deploy your app to a new GAE project id (appcfg.py update . -A sandbox-id), then use Datastore Admin to create a backup of production in Google Cloud Storage, and use Datastore Admin in your sandbox to restore that backup there.
Cloning production data into localhost
I do prime my localhost datastore with some production data, but this is not a complete clone. Just the core required objects and a few test users.
To do this I wrote a Google Dataflow job that exports selected models and saves them in Google Cloud Storage in JSONL format. Then on my localhost I have an endpoint called /init/ which launches a taskqueue job to download these exports and import them. For that I reuse my JSON REST handler code, which is able to convert any model to JSON and vice versa.
In theory you could do this for your entire datastore.
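To make that concrete, here is a minimal sketch of what such an /init/ import could look like, assuming the Dataflow job writes one <kind>.jsonl file per model into a known Cloud Storage bucket. The bucket name, MODEL_REGISTRY, and import_kind below are hypothetical, and set_from_dto is the generic conversion method shown further down:

    import json

    import cloudstorage as gcs
    import webapp2
    from google.appengine.ext import deferred

    from models import Product, User  # your own ndb model classes (hypothetical module)

    # Hypothetical names: the bucket the Dataflow job writes to, and a map of
    # datastore kind -> model class for the kinds worth importing locally.
    IMPORT_BUCKET = '/my-export-bucket'
    MODEL_REGISTRY = {'User': User, 'Product': Product}


    def import_kind(kind):
        """Read <bucket>/<kind>.jsonl from GCS and recreate those entities locally."""
        model_class = MODEL_REGISTRY[kind]
        fh = gcs.open('%s/%s.jsonl' % (IMPORT_BUCKET, kind))
        lines = fh.read().splitlines()
        fh.close()
        for line in lines:
            if not line:
                continue
            dto = json.loads(line)
            entity_id = dto.pop('id', None)
            entity = model_class(id=entity_id) if entity_id else model_class()
            entity.set_from_dto(dto)  # generic dict -> entity conversion (see below)
            entity.put()


    class InitHandler(webapp2.RequestHandler):
        """Only routed on the local dev server / sandbox, never in production."""

        def get(self):
            for kind in MODEL_REGISTRY:
                deferred.defer(import_kind, kind)
            self.response.write('import queued for: %s' % ', '.join(MODEL_REGISTRY))


    app = webapp2.WSGIApplication([('/init/', InitHandler)])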
EDIT - This is what my to-json/from-json code looks like:
All of my ndb.Models subclass myBaseModel, which has generic conversion code:
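A minimal sketch of such generic conversion code, assuming simple property types; datetimes and keys are special-cased, and repeated or structured properties are left out for brevity:

    import datetime

    from google.appengine.ext import ndb


    class myBaseModel(ndb.Model):
        """Shared dict <-> entity conversion for all models (sketch)."""

        def to_dto(self):
            dto = {'id': self.key.id() if self.key else None}
            # _properties maps property names to their ndb Property instances
            for name, prop in self._properties.items():
                value = getattr(self, name)
                if isinstance(value, (datetime.datetime, datetime.date)):
                    value = value.isoformat()
                elif isinstance(value, ndb.Key):
                    value = value.urlsafe()
                dto[name] = value
            return dto

        def set_from_dto(self, dto):
            for name, prop in self._properties.items():
                if name not in dto:
                    continue
                value = dto[name]
                if value is not None and isinstance(prop, ndb.DateTimeProperty):
                    # simplified; real code must match the serialization format used above
                    value = datetime.datetime.strptime(value, '%Y-%m-%dT%H:%M:%S.%f')
                elif value is not None and isinstance(prop, ndb.KeyProperty):
                    value = ndb.Key(urlsafe=value)
                setattr(self, name, value)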
My request handlers then use set_from_dto and to_dto like this (BaseHandler also provides some convenience methods for converting JSON payloads to Python dicts and whatnot):
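A minimal sketch of such a handler, assuming a hypothetical User model and BaseHandler helpers named json_body and write_json:

    import json

    import webapp2

    from models import User  # a myBaseModel subclass (hypothetical)


    class BaseHandler(webapp2.RequestHandler):
        """Convenience helpers for JSON payloads (names are illustrative)."""

        def json_body(self):
            # request payload -> python dict
            return json.loads(self.request.body)

        def write_json(self, data):
            # python dict -> JSON response
            self.response.headers['Content-Type'] = 'application/json'
            self.response.write(json.dumps(data))


    class UserHandler(BaseHandler):

        def get(self, user_id):
            user = User.get_by_id(int(user_id))
            self.write_json(user.to_dto())

        def post(self, user_id):
            user = User.get_by_id(int(user_id)) or User(id=int(user_id))
            user.set_from_dto(self.json_body())
            user.put()
            self.write_json(user.to_dto())


    app = webapp2.WSGIApplication([(r'/users/(\d+)', UserHandler)])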