[EDIT 20160426: This bug appears to have been solved now!]
[EDIT 20160219: Updated this question again, to reflect different error messages. See also the bug report I filed.]
We have a datastore table that contains a field category of type Category, which is a custom class. The problem arises when we try to load this table into BigQuery (from a datastore backup). The resulting table should contain (simplified):
category.subfield1
,category.subfield2
,category.subfield3.subsubfield1
,category.subfield4
,category.subfield5
Instead, BigQuery wreaks havoc on the category field:
category_1.record.subfield1
,category_1.record.subfield2
,category_1.record.subfield3.subsubfield1
,category_1.entity.subfield1
,category_1.entity.subfield2
,category_1.entity.subfield3.subsubfield1
,category_1.entity.subfield4
,category_1.entity.subfield5
,category_1.provided
(Omitting a dozen __key__ subfields for brevity.)
Before 20160219, the garbled output for the category field was even worse, but there was a workaround: explicitly listing all the fields, including category, through the option projection_fields. That is no longer possible, since it now results in a different error message: Field:category [...] Entity was of unexpected kind "__record__"
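For reference, here is a minimal sketch of what such a load request looks like when projection_fields is set. It only builds the job configuration body as used by the BigQuery jobs REST API (sourceFormat, sourceUris, destinationTable, projectionFields) and does not submit anything. The bucket path, project, dataset, table, and the second field name are hypothetical placeholders, not the values from our actual jobs.

```python
import json


def build_load_config(source_uri, project_id, dataset_id, table_id,
                      projection_fields=None):
    """Build the request body for a BigQuery load job from a datastore backup.

    Nothing is sent to BigQuery here; the function only assembles the dict.
    """
    load = {
        "sourceFormat": "DATASTORE_BACKUP",
        "sourceUris": [source_uri],
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
    }
    if projection_fields:
        # Only the listed top-level properties are loaded.
        load["projectionFields"] = list(projection_fields)
    return {"configuration": {"load": load}}


if __name__ == "__main__":
    # All names below are placeholders (assumptions for illustration).
    body = build_load_config(
        source_uri="gs://example-bucket/backup/Example.backup_info",
        project_id="example-project",
        dataset_id="example_dataset",
        table_id="example_table",
        projection_fields=["category", "other_field"],
    )
    print(json.dumps(body, indent=2))
```

Leaving projection_fields out of the call gives the request variant without the option, which corresponds to the first of the two jobs listed below.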
Original job-ids:
project id: 711939958575
without projection_fields: job_Qw6-ygtZNFJ-Y7W0uLEqdvOrO_8
with projection_fields: job_lzzXo92lud9r5kvW7Z1kuzFLxS4
We came across the same problem when loading backups from datastore into BigQuery. We had an 'Order' entity containing a nested entity 'Customer'. Ever since we added an index on one of the fields in the nested 'Customer' entity, we had been getting the "Non-repeated field already set" error from BigQuery.
The reason was that setting an index on a field in the nested entity (e.g. Index on field email in Customer) created an index on the Order entity called customer.email. When loading data into BigQuery this results in two fields called customer.email, one from the nested Entity and one from the index.
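The name clash can be illustrated with a small sketch. It is only a model of the effect, not what BigQuery does internally: it assumes an entity is represented as a plain dict, flattens nested entities into dotted paths, and reports any path that appears more than once. The helper names and sample values are made up for illustration.

```python
from collections import Counter


def flatten(entity, prefix=""):
    """Return the dotted field paths of a (possibly nested) entity dict."""
    paths = []
    for name, value in entity.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            # Nested entity: descend and prefix its fields with "<name>."
            paths.extend(flatten(value, prefix=path + "."))
        else:
            paths.append(path)
    return paths


def duplicate_paths(entity):
    """Return the sorted field paths that occur more than once."""
    counts = Counter(flatten(entity))
    return sorted(path for path, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical 'Order' entity: a nested 'customer' entity plus the
    # top-level property "customer.email" created by the index.
    order = {
        "id": 1,
        "customer": {"email": "a@example.com", "name": "A"},
        "customer.email": "a@example.com",
    }
    print(duplicate_paths(order))
```

Running a check like this over a sample of exported entities is one way to spot which indexed nested fields would collide before starting a load job.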
The solution for us was to remove the indices on nested entities, which avoids these conflicts when loading datastore backups into BigQuery. Unfortunately, we also had to delete all existing records in the database. That wasn't a big problem for us; alternatively, you would have to make sure the index is properly removed from the existing records.