Error Loading Large CSV into Google BigQuery

Posted 2019-05-25 05:57

Question:

I am getting an error when loading a large CSV into BigQuery. Everywhere I read online I see that there is a 5 GB size limit on zipped files, but no limit on uncompressed CSV.

BigQuery error in load operation: Error processing job 'bqjob_r3016bbfad3037f_0000015cea1a1eff_1': Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 24686861596. Max allowed size is: 4294967296.

Answer 1:

The BigQuery documentation lists the various limits for load jobs here: https://cloud.google.com/bigquery/quota-policy#import. In particular, it notes that the limit for a compressed CSV file is 4 GB.

The error message about a "not splittable" CSV file can come up in two cases:

  1. The CSV file is compressed.
  2. There is a quote-character mismatch in one of the fields, which makes everything after it look like one very long string in that field and also makes the file not splittable (this is likely what happened in your case). A quick heuristic for spotting such lines is shown below.
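
As a rough heuristic you can scan for lines with an odd number of quote characters, which usually points at an unterminated quoted field. This is only a sketch: it assumes your records do not contain embedded newlines inside quoted fields, and big_file.csv is a placeholder for your file name.

# Split each line on the quote character. A balanced line produces an odd
# number of awk fields (quotes come in pairs), so an even field count means
# the line has an odd number of quotes, i.e. a likely unterminated quote.
awk -F'"' 'NF % 2 == 0 {print "line " NR " has an odd number of quote characters"}' big_file.csv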


Answer 2:

Try this:

  • Turn off quoting.
  • Set the field delimiter to a character that does not occur in the data.

bq help load:

--quote: Quote character to use to enclose records. Default is ". To indicate no quote character at all, use an empty string.
-F,--field_delimiter: The character that indicates the boundary between columns in the input file. "\t" and "tab" are accepted names for tab.
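
For example, a minimal sketch of such a load command; the dataset, table, and Cloud Storage path below are placeholders, and using tab as the delimiter assumes a tab character never occurs in your data:

# Load the file with quoting disabled and tab as the delimiter, so each
# CSV line lands in a single STRING column named "line".
bq load --source_format=CSV --quote="" --field_delimiter="\t" \
    mydataset.raw_lines gs://mybucket/large_file.csv line:STRING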

This imports each CSV line into a one-column table, which you can then split inside BigQuery (with REGEXP_EXTRACT(), SPLIT(), or a JavaScript UDF), for example as sketched below.
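
A minimal follow-up query along those lines, assuming standard SQL, a comma-separated file, and the placeholder names from the load command above:

# Split the raw line on commas and pull out the first three columns.
bq query --use_legacy_sql=false '
  SELECT
    SPLIT(line, ",")[SAFE_OFFSET(0)] AS col1,
    SPLIT(line, ",")[SAFE_OFFSET(1)] AS col2,
    SPLIT(line, ",")[SAFE_OFFSET(2)] AS col3
  FROM mydataset.raw_lines'

Note that this naive split only works if the original fields never contain embedded commas; otherwise you would need REGEXP_EXTRACT() or a JavaScript UDF with more careful parsing.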