DocsList File getContentAsString() missing unicode

Posted 2020-03-29 07:54

Question:

I am trying to import a CSV file containing French accented characters using Google Apps Script, reading the file with getContentAsString() and then processing it into a Google Spreadsheet. It seems the Unicode characters are returned as garbage.

After investigation, it seems getContentAsString() decodes files as UTF-8. This causes problems when the file was created using Western (Mac OS Roman) or Western (Windows Latin 1), the default encodings in older versions of Excel when exporting CSV.

Any suggestions on how to work around this problem?

Example: �quipement should be Équipement

function Test() {
  var filename = 'BV_period_2.csv';
  var files = DocsList.getFiles();
  var csvFile = "";

  // Find the CSV by name and read its contents as a string.
  for (var i = 0; i < files.length; i++) {
    if (files[i].getName() === filename) {
      csvFile = files[i].getContentAsString(); // csvFile will contain �
      break;
    }
  }

  // CSVToArray is a helper that is not shown in the question.
  var csvData = CSVToArray(csvFile, ",");
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName('TestBV');
  ...

Answer 1:

getDataAsString() on the file's blob takes an optional charset argument. Here's a UTF-16 example; substitute the charset the file was actually saved in.

DocsList.getFileById(<some id>).getBlob().getDataAsString("UTF-16")
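For the file in the question, which was probably exported as Latin 1 rather than UTF-16, the same call might look like the sketch below. It is a minimal sketch under stated assumptions, not a confirmed fix: it assumes the CSV is ISO-8859-1 and uses DriveApp, since Google has retired DocsList. The helper names findFileByName and readFileAs are hypothetical. A Mac OS Roman file would need a different charset name, and whether the runtime accepts that name is not verified here.

```javascript
// Hypothetical helper: return the first Drive file with the given name, or null.
function findFileByName(filename) {
  var matches = DriveApp.getFilesByName(filename); // returns a FileIterator
  return matches.hasNext() ? matches.next() : null;
}

// Hypothetical helper: decode the file's raw bytes with an explicit charset
// instead of relying on the UTF-8 default.
function readFileAs(file, charset) {
  return file.getBlob().getDataAsString(charset);
}

function testImport() {
  var file = findFileByName('BV_period_2.csv');
  if (file === null) {
    throw new Error('File not found');
  }
  // Assumption: the CSV was exported as Latin 1 (ISO-8859-1).
  var csvFile = readFileAs(file, 'ISO-8859-1');
  Logger.log(csvFile);
}
```

If the accents still come out wrong with ISO-8859-1, the file was likely saved in another encoding, and trying a different charset string in readFileAs is the next step.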