Google sheets upload to bigquery: Bad character

2019-03-06 06:56发布

My apps script is modified from https://developers.google.com/apps-script/advanced/bigquery.

I keep getting errors like so:

File: 0 / Line:6 / Field:1, Bad character (ASCII 0) encountered: field starts with: < c7d��{>
File: 0 / Line:7 / Field:1, Bad character (ASCII 0) encountered: field starts with: < #7��v#>
File: 0 / Line:13 / Field:1, Bad character (ASCII 0) encountered: field starts with: < c7d�1>
File: 0 / Line:17 / Field:1, Bad character (ASCII 0) encountered: field starts with: <�+�d��>
File: 0 / Line:20 / Field:1, Bad character (ASCII 0) encountered: field starts with: <7�J�F�)�>

Here is my code:

function loadCsv() {
  // Replace this value with the project ID listed in the Google
  // Developers Console project.
  var projectId = 'summer-rope-861';
  // Create a dataset in the BigQuery UI (https://bigquery.cloud.google.com)
  // and enter its ID below.
  var datasetId = 'expedia_sem';
  // Sample CSV file of Google Trends data conforming to the schema below.
  // https://docs.google.com/file/d/0BwzA1Orbvy5WMXFLaTR1Z1p2UDg/edit
  var csvFileId = '1JjShLEoLmPME6vdYbIAQy7iF5EgpHLArcJ9jxwkH-sY';

  // Delete and reCreate the table.
  var tableId = 'aud_info';
  BigQuery.Tables.remove(projectId, datasetId, tableId);
  var table = {
    tableReference: {
      projectId: projectId,
      datasetId: datasetId,
      tableId: tableId
    },
    schema: {
      fields: [
        {name: 'ListID', type: 'STRING'},
        {name: 'ListName', type: 'STRING'},
        {name: 'ListDescription', type: 'STRING'},
        {name: 'AudienceCategory', type: 'STRING'}
      ]
    }
  };
  table = BigQuery.Tables.insert(table, projectId, datasetId);
  Logger.log('Table created: %s', table.id);

  // Load CSV data from Drive and convert to the correct format for upload.
  var file = DriveApp.getFileById(csvFileId);
  var data = file.getBlob().setContentType('application/octet-stream');

  // Create the data upload job.
  var job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId
        },
        skipLeadingRows: 1,
        allowJaggedRows: true
      }
    }
  };
  job = BigQuery.Jobs.insert(job, projectId, data);
  Logger.log('Load job started. Check on the status of it here: ' +
      'https://bigquery.cloud.google.com/jobs/%s', projectId);
}

What am I doing wrong? It looks like an encoding issue, but isn't Sheets already in a UTF-8 format?

My data looks like this:

enter image description here

When I run data.getDataAsString() to Logger, I get

%PDF-1.4 %���� 2 0 obj <>stream x���MO�@�缿bF$�a�?�FiC8X(b"����w$�PZ�if6��Ogf���0��Xo���I��?ע�P��aTYCѠs$-���O���$�O�d�&��S����B��-��Q2�?vZR�`nD��+�$�u��Y��(.7����6=��Ā�w8�ށ���K�dk-��JFJ��x���J���-)N%�L��-����[L� ��Nȣ��w0�9���a4���d��]��4Eu=G��@���V�w��FPr��K��aUݴ��GR�R�� �A��*F�p��9���ͦů�/���� endstream endobj 4 0 obj <>>>/Parent 3 0 R/MediaBox[0 0 612 792]>> endobj 5 0 obj <>stream x�� xE�?�VwO�=�3�̑L2�L.2�J��d�$ �}�`"A@9D�TT��� ���������1���U����]]V�^�u�U1���Vw�v�����{~O>��U]�U��[o�oUG��Ae����o}�9���/�{ɜW��]��q�e�...

2条回答
【Aperson】
2楼-- · 2019-03-06 07:27

Edit: Apparently, .getAs() won't do conversion between content types for Sheets. Here is a more involved way to do it, adapted from the tutorial here:

https://developers.google.com/apps-script/articles/docslist_tutorial#section-3-saving-a-selected-range-to-a-csv-file

I suspect the example you started with assumes that a file was uploaded as a CSV and retained its type (instead of being converted into Sheet format). You can work around this by walking the sheet via the API and building up a Blob containing CSV-formatted data.

If you started with a CSV file, there may be a more efficient approach (if you can avoid the conversion entirely).

For a test file, I imported the infectious disease rates CSV from here: https://cdph.data.ca.gov/api/views/yijp-bauh/rows.csv?accessType=DOWNLOAD

function myFunction() {
  var projectId = 'XXXXXXXXXXXXXXXXXXX';
  var fileId = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';

  // Define our load job.
  var jobSpec = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: 'Experiment',
          tableId: 'disease_rates'
        },
        allowJaggedRows: true,
        writeDisposition: 'WRITE_TRUNCATE',
        schema: {
          fields: [
            { name: 'Disease', type: 'STRING' },
            { name: 'County', type: 'STRING' },
            { name: 'Year', type: 'INTEGER' },
            { name: 'Sex', type: 'STRING' },
            { name: 'Count', type: 'INTEGER' },
            { name: 'Population', type: 'INTEGER' },
            { name: 'Rate', type: 'FLOAT' },
            { name: 'CI_lower', type: 'FLOAT' },
            { name: 'CI_upper', type: 'FLOAT' },
            { name: 'Unstable', type: 'STRING' }
          ]
        }
      }
    }
  };

  // Partially borrowed from: https://developers.google.com/apps-script/articles/docslist_tutorial#section-3-saving-a-selected-range-to-a-csv-file  
  var spreadsheet = SpreadsheetApp.openById(fileId);
  var filename = spreadsheet.getName();

  // TODO: only handles the first sheet and limited to MAX_ROWS.
  // TODO: does not escape embedded quotes.
  var MAX_ROWS = 5000;
  var sheet = spreadsheet.getSheets()[0]; 
  var data = sheet.getDataRange().getValues();
  var csvdata = "";
  for (var row = 1; row < data.length && row < MAX_ROWS + 1; row++) {
    for (var col = 0; col < data[row].length; col++) {
      var cell = data[row][col].toString();
      if (cell.indexOf(",") != -1) {
        csvdata += "\"" + cell + "\"";
      } else {
        csvdata += cell;
      }

      if (col < data[row].length - 1) {
        csvdata += ",";
      }
    }
    csvdata += "\r\n";
  }
  var data = Utilities.newBlob(csvdata, "application/octet-stream");

  // Execute the job.
  BigQuery.Jobs.insert(jobSpec, projectId, data);
}
查看更多
别忘想泡老子
3楼-- · 2019-03-06 07:30

It looks like the data you are getting back is a PDF rather than a Google Sheets spreadsheet. I don't have access to the document at https://docs.google.com/spreadsheets/d/1JjShLEoLmPME6vdYbIAQy7iF5EgpHLArcJ9jxwkH-sY/. I assume you do. Can you check that link and make sure it is the spreadsheet you intend to be reading from? If not, you need to correct the csvFileId variable in your example above.

查看更多
登录 后发表回答