In short: how can I configure bulkloader to insert data into 2 models with references?
I have a person and fruit class, with person linking to fruit:
class Fruit(db.Model):
    name = db.StringProperty()

class Person(db.Model):
    name = db.StringProperty()
    customer = db.ReferenceProperty(Fruit)
And I want to upload this CSV data:
Name,Fruit
Bob,Banana
Joe,Apple
Tim,Banana
I tried using create_foreign_key as in the docs:
transformers:
- kind: fruit
  connector: csv
  property_map:
    - property: fruit
      external_name: Fruit
- kind: person
  connector: csv
  connector_options:
    encoding: utf-8
    columns: from_header
  property_map:
    - property: title
      external_name: Name
    - property: fruit
      external_name: Fruit
      import_transform: transform.create_foreign_key('fruit')
When I run the command:
appcfg.py upload_data --config_file=bulkloader.yaml --filename=food.csv --kind=person .
The persons are uploaded and they have foreign keys for the fruit, but the fruit entities they point to do not exist.
When I try --kind=fruit, the fruit are uploaded, but there are many duplicates.
I am trying to link the person to fruit, with no duplicate fruit - is this possible through bulkloader?
Sure.
The basic problem is that there's a step missing. You have a fruit name, but what you need to store is a reference to a fruit key. You can accomplish this in a few ways.
If Banana or Apple is a permanent, unique identifier for a fruit, you can use transform.create_foreign_key('Fruit'). This will give you a fruit key where the fruit name is the key name. Persons will be uploaded pointing to Fruit entities which don't exist yet, which is fine. Just upload fruit using the same import transform on the __key__ property to create the corresponding entities. Because the key is derived from the name, the repeated Banana rows all write to the same entity, so no duplicates are created.
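To see why key names remove the duplicates, here is a minimal in-memory sketch. It does not use the App Engine API: the `fruits` dict and the `load` function are hypothetical stand-ins, with the dict keyed by (kind, key name) to mimic a key whose name is the fruit name.

```python
import csv

CSV_DATA = """Name,Fruit
Bob,Banana
Joe,Apple
Tim,Banana
"""

def load(csv_text):
    """Simulate both upload passes against a dict standing in for the datastore."""
    fruits = {}   # (kind, key_name) -> properties; writing the same key overwrites
    persons = []  # persons get no key name here, so every row is a new entity
    for row in csv.DictReader(csv_text.splitlines()):
        # The fruit name doubles as the key name, so "Banana" maps to one key.
        fruit_key = ('Fruit', row['Fruit'])
        fruits[fruit_key] = {'name': row['Fruit']}
        # The person stores the same computed key as its reference.
        persons.append({'name': row['Name'], 'fruit': fruit_key})
    return fruits, persons

fruits, persons = load(CSV_DATA)
```

Three CSV rows produce three persons but only two fruit entries, and Bob and Tim end up referencing the same Banana key.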
If you don't want to use fruit name as the fruit key name, you'd need to do some more complex post-import processing. You can write a post_import_function that queries for fruit by name to see if a matching entity already exists, creates one if not, and then sets a reference to it on the newly created person entity.
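The lookup-or-create flow described above can be sketched without the datastore. Everything here is an assumption for illustration: `FruitIndex` is a hypothetical in-memory replacement for the query by name, and `link_person` only mirrors the shape of a post-import hook rather than the real bulkloader signature.

```python
import itertools

class FruitIndex(object):
    """Hypothetical stand-in for querying Fruit entities by name."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.by_name = {}  # fruit name -> generated ID

    def get_or_create(self, name):
        # Reuse the existing fruit if there is one, otherwise create it.
        if name not in self.by_name:
            self.by_name[name] = next(self._next_id)
        return self.by_name[name]

def link_person(input_dict, person, index):
    """Set the fruit reference on a freshly created person record."""
    person['fruit'] = index.get_or_create(input_dict['Fruit'])
    return person

index = FruitIndex()
rows = [('Bob', 'Banana'), ('Joe', 'Apple'), ('Tim', 'Banana')]
people = [link_person({'Name': n, 'Fruit': f}, {'name': n}, index)
          for n, f in rows]
```

In a real upload the lookup would be a datastore query per row, which is slower than the key-name approach but lets fruit keep arbitrary keys.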
It is possible with a post_import_function.
In your model, do not import a foreign key. Instead, add a post_import_function that looks like:
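def fkeyLocation(input_dict, entity_instance, bulkload_state):
    # Look up the referenced Location by the name given in the input row.
    # Raises AttributeError if no Location with that name exists.
    location = Location.all().filter('name =', input_dict['availableAt']).get()
    entity_instance.availableAt = location.key()
    return entity_instance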
The trick is to do the lookup with the input_dict. If you're using polymodels, you can't use the auto-generated "kind" from the wizard; you have to use the model.modelName from the example code here.
I didn't figure out how to do this cleanly, so I ended up just splitting my data into multiple files and pregenerating the IDs.
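For anyone taking the same route, a rough sketch of that preprocessing step using only the standard csv module. The column name `FruitId`, the helper names, and the choice of sequential integer IDs are all assumptions, not anything the bulkloader requires.

```python
import csv
import io

def split_rows(csv_text):
    """Return (fruit_rows, person_rows) with pregenerated integer fruit IDs."""
    fruit_ids = {}
    person_rows = []
    for row in csv.DictReader(csv_text.splitlines()):
        # setdefault only stores the new ID the first time a fruit is seen.
        fruit_id = fruit_ids.setdefault(row['Fruit'], len(fruit_ids) + 1)
        person_rows.append({'Name': row['Name'], 'FruitId': fruit_id})
    fruit_rows = [{'FruitId': i, 'Name': n}
                  for n, i in sorted(fruit_ids.items(), key=lambda kv: kv[1])]
    return fruit_rows, person_rows

def to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

fruit_rows, person_rows = split_rows(
    "Name,Fruit\nBob,Banana\nJoe,Apple\nTim,Banana\n")
fruit_csv = to_csv(fruit_rows, ['FruitId', 'Name'])
person_csv = to_csv(person_rows, ['Name', 'FruitId'])
```

Each output can then be uploaded with its own --kind, with the fruit file mapping FruitId to the key and the person file turning FruitId into a foreign key.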