Uploading data with bulkloader

Posted 2019-07-07 11:20

Question:

In short: how can I configure bulkloader to insert data into 2 models with references?

I have a Person class and a Fruit class, with Person referencing Fruit:

from google.appengine.ext import db

class Fruit(db.Model):
    name = db.StringProperty()

class Person(db.Model):
    name = db.StringProperty()
    fruit = db.ReferenceProperty(Fruit)  # each person references one fruit

And I want to upload this CSV data:

Name,Fruit
Bob,Banana
Joe,Apple
Tim,Banana

I tried using create_foreign_key as in the docs:

transformers:

- kind: fruit
  connector: csv
  property_map:
    - property: fruit
      external_name: Fruit

- kind: person
  connector: csv
  connector_options:
    encoding: utf-8
    columns: from_header
  property_map:
    - property: title
      external_name: Name
    - property: fruit
      external_name: Fruit
      import_transform: transform.create_foreign_key('fruit')

When I run the command:

appcfg.py upload_data --config_file=bulkloader.yaml --filename=food.csv --kind=person .

The Person entities are uploaded and they have foreign keys for the fruit, but the Fruit entities they point to do not exist.

When I try --kind=fruit the fruit are uploaded, but there are many duplicates.

I am trying to link the person to fruit, with no duplicate fruit - is this possible through bulkloader?

Answer 1:

Sure.

The basic problem is that there's a step missing. You have a fruit name, but what you want to store a reference to is a fruit key. You can accomplish this in a few ways.

If Banana or Apple is a permanent, unique identifier for a fruit, you can use transform.create_foreign_key('Fruit'). This will give you a fruit key where the fruit name is the key name. Persons will be uploaded pointing to Fruit entities which don't exist yet, which is fine. Just upload fruit using the same import transform on the __key__ property to create the corresponding entities. Because every row with the same fruit name produces the same key, repeated names overwrite one entity rather than creating duplicates.
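
The effect of using the fruit name as the key name can be illustrated without the SDK. The sketch below is not App Engine code: `FakeDatastore` and its `put` method are hypothetical in-memory stand-ins, used only to show that writes addressed by (kind, key name) collapse repeated names into one entity.

```python
import csv
import io

CSV_DATA = """Name,Fruit
Bob,Banana
Joe,Apple
Tim,Banana
"""


class FakeDatastore(object):
    """Hypothetical stand-in for the datastore: entities keyed by (kind, key name)."""

    def __init__(self):
        self.entities = {}

    def put(self, kind, key_name, properties):
        # Writing to an existing key overwrites it instead of adding an entity.
        self.entities[(kind, key_name)] = properties


def upload_fruit(store, csv_text):
    """Write one Fruit per CSV row, using the fruit name as the key name."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        store.put('Fruit', row['Fruit'], {'name': row['Fruit']})


store = FakeDatastore()
upload_fruit(store, CSV_DATA)
print(sorted(store.entities))  # three rows, two distinct Fruit keys
```

With the sample data, the two Banana rows land on the same key, so only one Banana entity remains.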

If you don't want to use fruit name as the fruit key name, you'd need to do some more complex post-import processing. You can write a post_import_function that queries for fruit by name to see if a matching entity already exists, creates one if not, and then sets a reference to it on the newly created person entity.
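
The get-or-create step described above might look roughly like the following sketch. To keep it readable and runnable without the SDK, the lookup and creation are passed in as callables; `make_link_fruit`, `find_fruit`, `create_fruit`, and `FakePerson` are hypothetical names, and the CSV column `Fruit` and reference property `fruit` are assumptions. In a real app, `find_fruit` would wrap a query such as `Fruit.all().filter('name =', name).get()` and `create_fruit` would wrap something like `Fruit(name=name).put()`. It also assumes the imported entity accepts attribute assignment.

```python
def make_link_fruit(find_fruit, create_fruit):
    """Build a post-import function that links a person to a fruit by name.

    find_fruit(name) returns the key of an existing fruit, or None.
    create_fruit(name) stores a new fruit and returns its key.
    """
    def link_fruit(input_dict, entity_instance, bulkload_state):
        name = input_dict['Fruit']          # assumed CSV column name
        key = find_fruit(name)
        if key is None:
            key = create_fruit(name)        # first time this fruit is seen
        entity_instance.fruit = key         # assumed reference property name
        return entity_instance
    return link_fruit


# In-memory stand-ins so the sketch runs on its own (all hypothetical).
class FakePerson(object):
    fruit = None


fruit_keys = {}


def find_fruit(name):
    return fruit_keys.get(name)


def create_fruit(name):
    fruit_keys[name] = 'Fruit:%d' % (len(fruit_keys) + 1)
    return fruit_keys[name]


link_fruit = make_link_fruit(find_fruit, create_fruit)
rows = [('Bob', 'Banana'), ('Joe', 'Apple'), ('Tim', 'Banana')]
people = [link_fruit({'Name': n, 'Fruit': f}, FakePerson(), None)
          for n, f in rows]
```

Here Bob and Tim end up sharing one fruit key. Against the real datastore, a query may not immediately see an entity written moments earlier, so this approach may still produce an occasional duplicate.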



Answer 2:

It is possible with a post_import_function.

In the transformer for your kind, do not import a foreign key. Instead, add a post_import_function that looks like:

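def fkeyLocation(input_dict, entity_instance, bulkload_state):
    # Look up the referenced Location by the name given in the input row.
    location = Location.all().filter('name =', input_dict['availableAt']).get()
    if location is not None:  # leave the reference unset if nothing matches
        entity_instance.availableAt = location.key()
    return entity_instance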

The trick is to do the lookup using the values in input_dict. If you're using polymodels, you can't use the auto-generated "kind" from the wizard; you have to use the model.modelName from the example code here.



Answer 3:

I didn't figure out how to do this cleanly, so I ended up just splitting my data into multiple files and pregenerating the IDs.
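
One way to do that split, as a minimal sketch: read the combined CSV once, assign a numeric ID to each distinct fruit, and write a fruit file and a person file that refers to fruit by ID. The function names and the `FruitID` column are hypothetical, and how the IDs are then mapped to keys in bulkloader.yaml is left out.

```python
import csv
import io

CSV_DATA = """Name,Fruit
Bob,Banana
Joe,Apple
Tim,Banana
"""


def split_rows(csv_text):
    """Return (fruit_rows, person_rows), one numeric ID per distinct fruit."""
    fruit_ids = {}
    person_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        fruit = row['Fruit']
        if fruit not in fruit_ids:
            fruit_ids[fruit] = len(fruit_ids) + 1   # IDs in first-seen order
        person_rows.append({'Name': row['Name'], 'FruitID': fruit_ids[fruit]})
    fruit_rows = [{'FruitID': fruit_id, 'Name': name}
                  for name, fruit_id in sorted(fruit_ids.items(),
                                               key=lambda item: item[1])]
    return fruit_rows, person_rows


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


fruit_rows, person_rows = split_rows(CSV_DATA)
fruit_csv = to_csv(fruit_rows, ['FruitID', 'Name'])
person_csv = to_csv(person_rows, ['Name', 'FruitID'])
print(fruit_csv)
print(person_csv)
```

Each file can then be uploaded with its own --kind, fruit first, so the IDs the person rows refer to already exist.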