Using the google drive API to download a spreadshe

2019-01-07 06:08发布

问题:

I'm sorry if this is an obvious question, I'm still pretty new to the API. I'm using the python drive api library, and trying to download a google spreadsheet as a csv.
When I used files.get, it spat out a file with no downloadUrl, and with no 'text/csv' key in the export links field.
If it's not possible, I can find a workaround, but I'm hoping it is, since it is possible to do manually (file->download_as->csv)

Do I need to use the google document list api?

thanks, Matt

回答1:

Update: I have posted another answer that works with the Spreadsheets v4 API.

Old Answer:

The answer from Alain is correct, but you also need to set the gid=parameter to specify which worksheet to export.

For example, if your 'application/pdf' export link is like this:

docs.google.com/feeds/download/spreadsheets/Export?key=<FILE_ID>&exportFormat=pdf

You can just change it to this to download the first worksheet:

docs.google.com/feeds/download/spreadsheets/Export?key<FILE_ID>&exportFormat=csv&gid=0

There is a bit of a problem, though as there is no reliable way to get the gid for a given worksheet through the API and they are not zero based indexes. If you delete a worksheet, that gid does not get reused. You can see the gid in the URL in your browser though, so if your worksheet information is constant you can just get that from there. See http://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=1813 and http://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=3240 for more info on that problem.



回答2:

The exportLinks collection doesn't expose the CSV format as this will only export the first worksheet of a spreadsheet. If retrieving the first worksheet as a CSV is the behavior you are looking for, you can build the link manually and set the ?exportFormat= query parameter to ?exportFormat=csv.



回答3:

Here's an implementation of Alain's suggestion that works for me:

downloadUrl = entry.get('exportLinks')['application/pdf']
# Strip "=pdf" and replace with "=csv"
downloadUrl = downloadUrl[:-4] + "=csv"

resp, content = drive_service._http.request(downloadUrl)


回答4:

Not sure if it's what the OP needed, but in the new Google Sheets version it seems that it became a little hard to hot link a csv version of your spreadsheet.

In case you are interested in a Google apps script that will export all sheets in a spreadsheet to individual csv files (instead of downloading each one individually), Here you go:

function onOpen() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var csvMenuEntries = [{name: "export as csv files", functionName: "saveAsCSV"}];
  ss.addMenu("csv", csvMenuEntries);
};

function saveAsCSV() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheets = ss.getSheets();
  // create a folder from the name of the spreadsheet
  var folder = DocsList.createFolder(ss.getName().toLowerCase().replace(/ /g,'_') + '_csv_' + new Date().getTime());
  for (var i = 0 ; i < sheets.length ; i++) {
    var sheet = sheets[i];
    // append ".csv" extension to the sheet name
    fileName = sheet.getName() + ".csv";
    // convert all available sheet data to csv format
    var csvFile = convertRangeToCsvFile_(fileName, sheet);
    // create a file in the Docs List with the given name and the csv data
    folder.createFile(fileName, csvFile);
  }
  Browser.msgBox('Files are waiting in a folder named ' + folder.getName());
}

function convertRangeToCsvFile_(csvFileName, sheet) {
  // get available data range in the spreadsheet
  var activeRange = sheet.getDataRange();
  try {
    var data = activeRange.getValues();
    var csvFile = undefined;

    // loop through the data in the range and build a string with the csv data
    if (data.length > 1) {
      var csv = "";
      for (var row = 0; row < data.length; row++) {
        for (var col = 0; col < data[row].length; col++) {
          if (data[row][col].toString().indexOf(",") != -1) {
            data[row][col] = "\"" + data[row][col] + "\"";
          }
        }

        // join each row's columns
        // add a carriage return to end of each row, except for the last one
        if (row < data.length-1) {
          csv += data[row].join(",") + "\r\n";
        }
        else {
          csv += data[row];
        }
      }
      csvFile = csv;
    }
    return csvFile;
  }
  catch(err) {
    Logger.log(err);
    Browser.msgBox(err);
  }
}

Note: This script uses the DocsList.createFile() method, which is only available for Google Apps accounts.

If you need further explanation, go here: http://drzon.net/export-all-google-sheets-to-csv/



回答5:

Note, as of April 2015 DocsList was depreciated, and has been replaced by DriveApp. Many of the DriveApp methods are identical to DocsList. So, in many cases, you can simply replace DocsList with DriveApp. So replace DocsList.createFile() with DriveApp.createFile()

How to update DocsList to DriveApp in my code



回答6:

(Jul 2016) This question is phrased correctly, but in essence is a duplicate of another thread (Download a spreadsheet from Google Docs using Python). While some of the previous answers to this question below may still work (although answers are in JS/Apps Script not Python), a new Drive API version (v3) and new Sheets API version (v4) make them slightly outdated although the previous versions of both have not been deprecated (yet). Modern Google API access occurs using API keys or OAuth2 authorization, primarily with the Google APIs Client Libraries, including the one for Python.

To perform the task requested in/by the OP, you would perhaps query for specific Sheets to download, then perform the actual export(s) with the Drive API. Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue exporting further, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.

Note that there is also a Google Sheets API, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based requests like exporting where the Drive API is the correct one to use.



回答7:

As a lot of other people have pointed out, my original answer is somewhat outdated. So here is my answer updated for v4 of the Google Spreadsheets API. Now there's a way to get the gids, but we can't use the the drive files.export API because it only exports first worksheet in the spreadsheet (even if you specify the gid).

To export all of the worksheets as CSV files, you need to get the gids for the worksheets you want to export using the spreadsheets.get API. That API call returns a bunch of information about the spreadsheet including each of the worksheets. You can get the gid from the properties.sheetId property for each worksheet.

Once you have that, you can just build the same URL that the Sheets uses when you select File->Download As->CSV. You can take the data.spreadsheetUrl value from spreadsheets.get and replace /edit with /export and then add the gid as the parameter. You will also need to include Authorization Bearer <auth token> in the HTTP header in the request.

Here's a python script based on their quickstart example that downloads all of the sheets for the spreadsheet with a specified ID. You need to replace <spreadsheet id> with the ID for a spreadsheet you have access to:

import apiclient.discovery
import httplib2
import oauth2client
import re
import requests
import shutil
import urllib.parse

SCOPES = 'https://www.googleapis.com/auth/drive.readonly'
SPREADSHEET_ID = '<spreadsheet id>'

store = oauth2client.file.Storage('credentials.json')
creds = store.get()
if not creds or creds.invalid:
  flow = oauth2client.client.flow_from_clientsecrets('client_secret.json', SCOPES)
  creds = oauth2client.tools.run_flow(flow, store)

service = apiclient.discovery.build('sheets', 'v4', http=creds.authorize(httplib2.Http()))

result = service.spreadsheets().get(spreadsheetId = SPREADSHEET_ID).execute()
spreadsheetUrl = result['spreadsheetUrl']
exportUrl = re.sub("\/edit$", '/export', spreadsheetUrl)
headers = {
  'Authorization': 'Bearer ' + creds.access_token,
}
for sheet in result['sheets']:
  params = {
    'format': 'csv',
    'gid': sheet['properties']['sheetId'],
  } 
  queryParams = urllib.parse.urlencode(params)
  url = exportUrl + '?' + queryParams
  response = requests.get(url, headers = headers)
  filePath = '/tmp/foo-%s.csv' % (+ params['gid'])
  with open(filePath, 'wb') as csvFile:
    csvFile.write(response.content)