I need to download all instances of a fairly large (multi-GB) entity in my app's datastore. I have enough disk space to store the entity's data, but not enough to store both the original data that the bulk downloader retrieves as an SQLite database and the processed version of the data that the downloader writes after applying the transforms specified in my bulkloader.yaml file. Given this, I'm fairly certain that the bulk download operation would successfully retrieve the SQLite database, and then fail when trying to apply the transforms.
This might be okay since there's another system available to which I could move the SQLite database and where I could unpack it. (The other system that's available to me has Python installed but not a version that supports the AppEngine tools -- and I don't have permission to upgrade Python on that machine -- so I cannot do the bulk download directly there.) I could retrieve the data I need if I could write some Python code to load the SQLite database and read its result table, but I cannot figure out what to make of the SQLite data -- when I use the SQLite module to connect to the database and unpack rows of the table, they appear to contain metadata in addition to the data that I'm interested in (the data that my AppEngine app actually placed in the datastore).
I know that the appcfg.py bulk download process can read this data, since it can transform the data in the ways I specify in bulkloader.yaml, but I haven't located the AppEngine toolkit code that does this unpacking. Any help or pointers would be appreciated.
Entities are stored in the downloaded SQLite database as encoded Protocol Buffers (the same as they're stored in the production environment, and everywhere else; in short, an entity is an encoded PB). You can read them out yourself by using the SDK code for decoding entities (db.model_from_protobuf() etc.), but it'll be a bit of work to set everything up. The relevant code is the ResultDatabase class in bulkloader.py, which you can probably reuse, along with other parts of the bulkloader, to make your job easier.
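As a rough illustration of the first half of that job, here is a minimal sketch that only pulls the encoded entity blobs out of the SQLite file using the standard library. It assumes the downloaded database has a table named result with id and value columns, where value holds the encoded entity; check that against ResultDatabase in your copy of bulkloader.py. The function name iter_encoded_entities and its decode parameter are hypothetical names chosen for this example, not part of the SDK.

```python
import sqlite3


def iter_encoded_entities(db_path, decode=None):
    """Yield one item per row of the bulkloader's result table.

    Assumption: the table is named `result` and has `id` and `value`
    columns, with `value` holding the encoded entity Protocol Buffer.
    `decode` is an optional callable applied to each raw blob; without
    it, the raw bytes are yielded unchanged.
    """
    conn = sqlite3.connect(db_path)
    try:
        for _entity_id, value in conn.execute("SELECT id, value FROM result"):
            raw = bytes(value)  # normalise buffer/memoryview objects to bytes
            yield decode(raw) if decode is not None else raw
    finally:
        conn.close()
```

On a machine where the SDK can be imported, the decode callable would be the place to plug in the SDK's entity decoding; without it, the function just hands back the raw bytes of each row.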
Here's the code that worked for me: