I am trying to import a file (json.txt) from Google Cloud Storage into BigQuery via the API, and the load job throws errors. When I do the same import via the web UI, it works with no errors (I even set maxBadRecords=0). Could someone please tell me what I'm doing incorrectly here? Is the code wrong, or do I need to change some setting in BigQuery somewhere?
I've followed the BigQuery docs on JSON imports. The file is a plain-text UTF-8 file with the following contents:
{"person_id":225,"person_name":"John","object_id":1}
{"person_id":226,"person_name":"John","object_id":1}
{"person_id":227,"person_name":"John","object_id":null}
{"person_id":229,"person_name":"John","object_id":1}
On import, the job throws the following error for every single line: "Value cannot be converted to expected type."
      {
        "reason": "invalid",
        "location": "Line:15 / Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "location": "Line:16 / Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "location": "Line:17 / Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "location": "Line:18 / Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "message": "Too many errors encountered. Limit is: 10."
      }
    ]
  },
  "statistics": {
    "creationTime": "1384484132723",
    "startTime": "1384484142972",
    "endTime": "1384484182520",
    "load": {
      "inputFiles": "1",
      "inputFileBytes": "960",
      "outputRows": "0",
      "outputBytes": "0"
    }
  }
}
The file can be accessed here: http://www.sendspace.com/file/7q0o37
and my code and schema are as follows:
def insert_and_import_table_in_dataset(tar_file, table, dataset=DATASET)
  config = {
    'configuration' => {
      'load' => {
        'sourceUris' => ["gs://test-bucket/#{tar_file}"],
        'schema' => {
          'fields' => [
            { 'name' => 'person_id',   'type' => 'INTEGER', 'mode' => 'nullable' },
            { 'name' => 'person_name', 'type' => 'STRING',  'mode' => 'nullable' },
            { 'name' => 'object_id',   'type' => 'INTEGER', 'mode' => 'nullable' }
          ]
        },
        'destinationTable' => {
          'projectId' => @project_id.to_s,
          'datasetId' => dataset,
          'tableId'   => table
        },
        'sourceFormat'      => 'NEWLINE_DELIMITED_JSON',
        'createDisposition' => 'CREATE_IF_NEEDED',
        'maxBadRecords'     => 10,
      }
    },
  }

  result = @client.execute(
    :api_method  => @bigquery.jobs.insert,
    :parameters  => {
      # 'uploadType' => 'resumable',
      :projectId => @project_id.to_s,
      :datasetId => dataset
    },
    :body_object => config
  )
  # upload = result.resumable_upload
  # @client.execute(upload) if upload.resumable?

  puts result.response.body
  json = JSON.parse(result.response.body)

  while true
    job_status = get_job_status(json['jobReference']['jobId'])
    if job_status['status']['state'] == 'DONE'
      puts "DONE"
      return true
    else
      puts job_status['status']['state']
      puts job_status
      sleep 5
    end
  end
end
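For reference, get_job_status is just a thin wrapper around jobs.get; a simplified sketch (same @client, @bigquery, and @project_id as above):

```ruby
# Simplified sketch of the get_job_status helper used in the polling loop
# above: it fetches the job resource via bigquery.jobs.get and returns it
# as a parsed hash.
def get_job_status(job_id)
  result = @client.execute(
    :api_method => @bigquery.jobs.get,
    :parameters => {
      :projectId => @project_id.to_s,
      :jobId     => job_id
    }
  )
  JSON.parse(result.response.body)
end
```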
Could someone please tell me what I am doing wrong? What do I fix and where?
Also, at some point in the future I expect to be loading compressed files. Is a ".tar.gz" OK for that, or do I need to make it a plain ".gz" (a gzip of the JSON file only)?
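By a plain ".gz" I mean a gzip of the newline-delimited JSON file itself, produced roughly like this (file names are just for illustration), as opposed to a tar archive that happens to be gzipped:

```ruby
require 'zlib'

# Plain gzip of the newline-delimited JSON file itself (json.txt.gz),
# as opposed to a .tar.gz archive that contains json.txt as an entry.
Zlib::GzipWriter.open('json.txt.gz') do |gz|
  File.foreach('json.txt') { |line| gz.write(line) }
end
```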
Thank you in advance for all help. Appreciate it.