Merge multiple columns in bulkloader

Posted 2019-08-26 22:41

Question:

I'm using App Engine's bulkloader to import a CSV file into my datastore. I have several columns that I want to merge into one. For example, they are all URLs, but not every row supplies all of them, and there is an order of precedence between them:

url_main
url_temp
url_test

I want to say: "OK, if url_main exists, use that; otherwise use url_test, and then url_temp."

Is it, therefore, possible to create a custom import transform that references columns and merges them into one based on conditions?

Answer 1:

OK, so after reading https://developers.google.com/appengine/docs/python/tools/uploadingdata#Configuring_the_Bulk_Loader I learnt about import_transform and that it can call custom functions.

With that in mind, this passage from the docs pointed me the right way:

... a two-argument function with the keyword argument bulkload_state, which on return contains useful information about the entity: bulkload_state.current_entity, which is the current entity being processed; bulkload_state.current_dictionary, the current export dictionary ...

So, I created a function that takes two arguments: the first is the value of the column named by external_name, and the second is the bulkload_state, which lets me fetch the current row, like so:

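def check_url(value, bulkload_state):
    # The whole CSV row being imported, keyed by column name.
    row = bulkload_state.current_dictionary
    # Candidate columns, highest priority first.
    fields = ['Final URL', 'URL', 'Temporary URL']

    for field in fields:
        # Skip columns that are missing or empty for this row.
        if row.get(field):
            return row[field]

    # No URL column had a value.
    return None

This grabs the current row (bulkload_state.current_dictionary) and returns the first URL field, in priority order, that has a value; if none of them does, it returns None. The value argument itself is ignored.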
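To try the function without running an upload, something like the following sketch may help. It assumes a hypothetical stand-in class, FakeBulkloadState, that exposes only a current_dictionary attribute (it is not the real bulkloader state object), and the sample rows are made up. check_url is repeated here, in a variant that treats empty cells as missing, so that the block runs on its own:

```python
class FakeBulkloadState(object):
    """Hypothetical stand-in for the bulkloader's state object.

    Only the current_dictionary attribute used by check_url is provided.
    """

    def __init__(self, row):
        self.current_dictionary = row


def check_url(value, bulkload_state):
    # Variant of the transform that skips missing or empty columns.
    row = bulkload_state.current_dictionary
    for field in ('Final URL', 'URL', 'Temporary URL'):
        candidate = row.get(field)
        if candidate:
            return candidate
    return None


# Made-up rows: every column is present, but some cells are empty.
rows = [
    {'Final URL': 'http://example.com/final',
     'URL': 'http://example.com/main',
     'Temporary URL': ''},
    {'Final URL': '', 'URL': '', 'Temporary URL': 'http://example.com/tmp'},
    {'Final URL': '', 'URL': '', 'Temporary URL': ''},
]

# The first argument mimics the value of the external_name column.
results = [check_url(row['URL'], FakeBulkloadState(row)) for row in rows]
print(results)
```

The first row resolves to its 'Final URL', the second falls through to 'Temporary URL', and the third yields None because every cell is empty.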

In my bulkloader.yaml I call this function simply by setting:

- property: business_url
  external_name: URL
  import_transform: bulkloader_helper.check_url

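Note: the external_name value doesn't matter, as long as it names a column that exists in the CSV. The transform ignores the value passed in and reads several columns from the row instead. The bulkloader_helper module also has to be imported in the python_preamble section of bulkloader.yaml.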
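For the column names in the question (url_main, then url_test, then url_temp), the same idea could be generalised along these lines. This is only a sketch: first_non_empty, pick_url and FakeBulkloadState are hypothetical names, not part of the bulkloader, and whether the bulkloader passes bulkload_state to a function built by a factory like this is an assumption that should be checked against a real upload:

```python
def first_non_empty(*field_names):
    """Build a transform that returns the first non-empty column.

    Columns are checked in the order given, so the first name has the
    highest priority.
    """
    def transform(value, bulkload_state):
        row = bulkload_state.current_dictionary
        for name in field_names:
            candidate = row.get(name)
            if candidate:
                return candidate
        return None
    return transform


# Priority order taken from the question.
pick_url = first_non_empty('url_main', 'url_test', 'url_temp')


class FakeBulkloadState(object):
    """Hypothetical stand-in exposing only current_dictionary."""

    def __init__(self, row):
        self.current_dictionary = row


# Made-up row: url_main is empty, so url_test should win over url_temp.
row = {
    'url_main': '',
    'url_test': 'http://example.com/test',
    'url_temp': 'http://example.com/temp',
}
picked = pick_url(row['url_main'], FakeBulkloadState(row))
print(picked)
```

Keeping the column list as an argument means the priority order lives in one place and the same helper can serve other merged properties.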

Simples!