I'm using App Engine's bulkloader to import a CSV file into my datastore. I've got a number of columns that I want to merge into one. For example, they're all URLs, but not all of them are supplied, and there is an order of precedence, e.g.:
url_main
url_temp
url_test
I want to say: "OK, if url_main exists, use that; otherwise use url_test, and then url_temp."
Is it, therefore, possible to create a custom import transform that references columns and merges them into one based on conditions?
OK, so after reading https://developers.google.com/appengine/docs/python/tools/uploadingdata#Configuring_the_Bulk_Loader I learnt about import_transform and that it can use custom functions.
With that in mind, this pointed me the right way:
... a two-argument function with the keyword argument bulkload_state,
which on return contains useful information about the entity:
bulkload_state.current_entity, which is the current entity being
processed; bulkload_state.current_dictionary, the current export
dictionary ...
So I created a function that takes two arguments: the first is the value from the column named by external_name, and the second is the bulkload_state, which lets me fetch the current row, like so:
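def check_url(value, bulkload_state):
    # The whole row currently being imported, keyed by column name.
    row = bulkload_state.current_dictionary
    # Candidate columns, in order of precedence.
    fields = ['Final URL', 'URL', 'Temporary URL']
    for field in fields:
        # Return the first candidate column found in the row.
        if field in row:
            return row[field]
    return None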
All this does is grab the current row (bulkload_state.current_dictionary) and return the first URL field, in order of precedence, that exists in it; if none of them exists, it returns None.
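One thing to watch, assuming the CSV has a header for every URL column: a cell that is present but blank would still pass the "field in row" check, so the first field would win with an empty string. Below is a minimal sketch of a variant that skips blank values. The names first_non_empty_url and FakeState are hypothetical and not part of the bulkloader; FakeState only imitates the one attribute used here so the logic can be tried locally without running an import.

```python
# Column names in order of precedence (same as in check_url above).
URL_FIELDS = ['Final URL', 'URL', 'Temporary URL']


def first_non_empty_url(value, bulkload_state):
    """Return the first URL column holding a non-blank value, else None."""
    row = bulkload_state.current_dictionary
    for field in URL_FIELDS:
        # Treat a missing column, None, and whitespace-only text as "not supplied".
        candidate = (row.get(field) or '').strip()
        if candidate:
            return candidate
    return None


class FakeState(object):
    """Hypothetical stand-in for the bulkloader state, for local testing only."""

    def __init__(self, row):
        self.current_dictionary = row


full_row = {
    'Final URL': '',
    'URL': 'http://example.com',
    'Temporary URL': 'http://tmp.example.com',
}
empty_row = {'Final URL': '', 'URL': '', 'Temporary URL': ''}

result_full = first_non_empty_url('', FakeState(full_row))
result_empty = first_non_empty_url('', FakeState(empty_row))
```

Here result_full falls through the blank 'Final URL' to the 'URL' column, and result_empty is None because every candidate is blank. Whether this stricter check is needed depends on how the CSV is laid out.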
In my bulkloader.yaml I call this function simply by setting:
- property: business_url
  external_name: URL
  import_transform: bulkloader_helper.check_url
Note: which column external_name points to doesn't matter, as long as that column exists. The function ignores the value passed in and reads the columns it needs straight from the row.
Simples!