Transferring data from production datastore to local

Posted 2019-05-21 21:40

TL;DR: I need a workable way to download my data from the production Datastore and load it into my local development environment.

The detailed problem:

I need to test my app on the local development server with real data (not real-time data) from the production server's Datastore. The documentation and other resources offer three options:

  1. Using appcfg.py to download data from the production server and then load it into the local development environment. When I use this method I get a 'bad request' error due to an OAuth problem. Besides, this method will be deprecated; the official documentation advises using the second method:
  2. Using gcloud with managed export and import. The documentation for this method explains how to back up all data through the console (at https://console.cloud.google.com/). I have tried this method: the backup is generated in Cloud Storage, and I downloaded it. It is in LevelDB format, and I need to load it into the local development server, but there is no official explanation of how to do that. The loading step of the first method is not compatible with the LevelDB format, and I couldn't find an official way around the problem. There is a StackOverflow entry, but it did not work for me because it only reads all entities back as dicts; converting those dict objects into ndb entities becomes the tricky problem.
  3. Having lost hope in the first two methods, I decided to use the Cloud Datastore Emulator (beta), which can emulate the real data in the local development environment. It is still beta and has several problems: when I ran the command I still hit issues around the DATASTORE_EMULATOR_HOST environment variable anyway. (A sketch of the export/import route is shown after this list.)
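For what it's worth, a sketch of the managed-export route from options 2 and 3: recent versions of the Datastore emulator expose an HTTP import endpoint that can load a managed export directly, which may sidestep the LevelDB problem. The bucket, project id, local paths, and emulator port below are placeholders for your own setup:

# 1. Run a managed export of production into Cloud Storage
gcloud datastore export gs://my-bucket/prod-export

# 2. Copy the export to the local machine
gsutil -m cp -r gs://my-bucket/prod-export /tmp/prod-export

# 3. With the emulator running, ask it to import the export; the metadata
#    file name follows the <export-name>.overall_export_metadata pattern
curl -X POST localhost:8081/v1/projects/my-project-id:import \
    -H 'Content-Type: application/json' \
    -d '{"input_url": "/tmp/prod-export/prod-export.overall_export_metadata"}'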

1 Answer
小情绪 Triste *
2019-05-21 22:15

It sounds like you should be using a remote sandbox

Even if you get this to work, the localhost datastore still behaves differently from the actual Datastore.

If you want to truly simulate your production environment, I would recommend setting up a clone of your App Engine project as a remote sandbox. You could deploy your app to a new GAE project id with appcfg.py update . -A sandbox-id, use Datastore Admin to create a backup of production in Google Cloud Storage, and then use Datastore Admin in your sandbox to restore that backup there (the deploy step is sketched below).
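Concretely, the deploy step might look like this; sandbox-id is a placeholder project id, and the backup/restore itself happens through the Datastore Admin console UI rather than the CLI:

# Deploy the current app code to a separate sandbox project
appcfg.py update . -A sandbox-id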

Cloning production data into localhost

I do prime my localhost datastore with some production data, but this is not a complete clone: just the core required objects and a few test users.

To do this I wrote a Google Dataflow job that exports selected models and saves them in Google Cloud Storage in JSONL format. Then on my localhost I have an endpoint called /init/ which launches a task queue job to download these exports and import them.

To do this I reuse my JSON REST handler code, which is able to convert any model to JSON and vice versa.

In theory you could do this for your entire datastore.
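For illustration, a minimal sketch of what the import side of such an /init/ job could look like; this is not the author's actual code, and it assumes the GoogleAppEngineCloudStorageClient library, a placeholder bucket path, and the set_from_dto helper shown further down:

import json

import cloudstorage as gcs
import webapp2

IMPORTABLE_MODELS = {'User': User}  # hypothetical kind -> model registry

class InitImportHandler(webapp2.RequestHandler):
    def post(self):
        kind = self.request.get('kind')
        model_cls = IMPORTABLE_MODELS[kind]
        # one JSON object per line (jsonl), as written by the Dataflow export
        with gcs.open('/my-bucket/exports/%s.jsonl' % kind) as fh:
            for line in fh.read().splitlines():
                if not line:
                    continue
                entity = model_cls()
                entity.set_from_dto(json.loads(line))
                entity.put()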

EDIT - This is what my to-json/from-json code looks like:

All of my ndb.Models subclass my BaseModel, which has generic conversion code:

from google.appengine.ext import ndb

# property class -> converter applied when serializing to a dict
get_dto_typemap = {
    ndb.DateTimeProperty: dt_to_timestamp,
    ndb.KeyProperty: key_to_dto,
    ndb.StringProperty: str_to_dto,
    ndb.EnumProperty: str,  # presumably msgprop.EnumProperty or a custom property
}
# property class -> converter applied when deserializing from a dict
set_from_dto_typemap = {
    ndb.DateTimeProperty: timestamp_to_dt,
    ndb.KeyProperty: dto_to_key,
    ndb.FloatProperty: float_from_dto,
    ndb.StringProperty: strip,
    ndb.BlobProperty: str,
    ndb.IntegerProperty: int,
}
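Those typemaps refer to small converter helpers that are not included in the answer; plausible stand-ins might look like this (assumptions, not the author's actual code):

import calendar
import datetime

from google.appengine.ext import ndb

def dt_to_timestamp(dt):
    # datetime -> integer seconds since the epoch (UTC)
    return calendar.timegm(dt.utctimetuple())

def timestamp_to_dt(ts):
    # seconds since the epoch -> naive UTC datetime
    return datetime.datetime.utcfromtimestamp(float(ts))

def key_to_dto(key):
    # ndb.Key -> URL-safe string
    return key.urlsafe()

def dto_to_key(value):
    # URL-safe string -> ndb.Key
    return ndb.Key(urlsafe=value)

def str_to_dto(value):
    return unicode(value) if value is not None else None

def float_from_dto(value):
    return float(value)

def strip(value):
    # trim surrounding whitespace on incoming strings
    return value.strip() if value else value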

class BaseModel(ndb.Model):

    def to_dto(self):
        # serialize this entity to a plain dict, running each property
        # value through get_dto_typemap when a converter is registered
        dto = {'key': key_to_dto(self.key)}
        for name, obj in self._properties.iteritems():
            key = obj._name
            value = getattr(self, obj._name)
            if obj.__class__ in get_dto_typemap:
                if obj._repeated:
                    value = [get_dto_typemap[obj.__class__](v) for v in value]
                else:
                    value = get_dto_typemap[obj.__class__](value)
            dto[key] = value
        return dto

    def set_from_dto(self, dto):
        # populate this entity from a plain dict, running each value through
        # set_from_dto_typemap; computed properties are skipped because they
        # cannot be assigned
        for name, obj in self._properties.iteritems():
            if isinstance(obj, ndb.ComputedProperty):
                continue
            key = obj._name
            if key in dto:
                value = dto[key]
                if not obj._repeated and obj.__class__ in set_from_dto_typemap:
                    try:
                        value = set_from_dto_typemap[obj.__class__](value)
                    except Exception as e:
                        raise Exception("Error setting " + self.__class__.__name__ + "." + str(key) + " to '" + str(value) + "': " + e.message)
                try:
                    setattr(self, obj._name, value)
                except Exception as e:
                    print dir(obj)  # dump the property's attributes for debugging
                    raise Exception("Error setting " + self.__class__.__name__ + "." + str(key) + " to '" + str(value) + "': " + e.message)

class User(BaseModel):
    pass  # user fields, etc.

My request handlers then use set_from_dto and to_dto like this (BaseHandler also provides some convenience methods for converting JSON payloads to Python dicts and what not; a sketch of that follows the handler code):

from google.appengine.datastore.datastore_query import Cursor

class RestHandler(BaseHandler):
    MODEL = None

    def put(self, resource_id=None):
        if resource_id:
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                obj.set_from_dto(self.json_body)
                obj.put()
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            self.abort(405)

    def post(self, resource_id=None):
        if resource_id:
            self.abort(405)
        else:
            obj = self.MODEL()
            obj.set_from_dto(self.json_body)
            obj.put()
            return obj.to_dto()

    def get(self, resource_id=None):
        if resource_id:
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            cursor_key = self.request.GET.pop('$cursor', None)
            # rebuild a query Cursor from the urlsafe token, if one was passed
            cursor = Cursor(urlsafe=cursor_key) if cursor_key else None
            # query params arrive as strings; clamp the page size to 10..200
            limit = max(min(200, int(self.request.GET.pop('$limit', 200))), 10)
            qs = self.MODEL.query()
            # ... other code that handles query params
            results, next_cursor, more = qs.fetch_page(limit, start_cursor=cursor)
            return {
                '$cursor': next_cursor.urlsafe() if more else None,
                'results': [result.to_dto() for result in results],
            }

class UserHandler(RestHandler):
    MODEL = User
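BaseHandler and its json_body helper aren't shown in the answer; a minimal sketch of what they might look like, assuming webapp2, together with hypothetical route wiring for the User endpoints:

import json

import webapp2

class BaseHandler(webapp2.RequestHandler):
    @property
    def json_body(self):
        # parse the JSON request payload into a Python dict
        return json.loads(self.request.body)

# hypothetical route wiring; the paths and WSGI app name are assumptions
app = webapp2.WSGIApplication([
    webapp2.Route(r'/api/users', UserHandler),
    webapp2.Route(r'/api/users/<resource_id>', UserHandler),
])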