This problem has been successfully resolved. I am editing my post to document my experience for posterity and future reference.
The Task
I have 117 PDF files (average size ~238 KB) uploaded to Google Drive. I want to convert them all to Google Docs and keep them in a different Drive folder.
The Problem
I attempted to convert the files using Drive.Files.insert. However, in most cases only 5 files could be converted this way before the function terminated prematurely with this error:
Limit Exceeded: DriveApp. (line #, file "Code")
where the line referenced above is the one on which the insert function is called. After calling this function for the first time, subsequent calls typically failed immediately, with no additional Google Doc created.
Approach
I used 3 main ways to achieve my goal. One was using Drive.Files.insert, as mentioned above. The other two involved using Drive.Files.copy and sending a batch of HTTP requests. These last two methods were suggested by Tanaike, and I recommend reading his answer below for more information. The insert and copy functions are from the Google Drive REST v2 API, while batching multiple HTTP requests is from Drive REST v3.
With Drive.Files.insert, I experienced issues dealing with execution limitations (explained in the Problem section above). One solution was to run the functions multiple times. And for that, I needed a way to keep track of which files were converted. I had two options for this: using a spreadsheet and a continuation token. Therefore, I had 4 different methods to test: the two mentioned in this paragraph, batching HTTP requests, and calling Drive.Files.copy.
Because Team Drives behave differently from regular drives, I felt it necessary to try each of those methods twice: once with the PDFs in a regular (non-Team Drive) folder and once with them in a folder under a Team Drive. In total, this means I had 8 different methods to test.
These are the exact functions I used. Each of these was used twice, with the only variations being the ID of the source and destination folders (for reasons stated above):
Method A: Using Drive.Files.insert and a spreadsheet
function toDocs() {
  var sheet = SpreadsheetApp.openById(/* spreadsheet id*/).getSheets()[0];
  var range = sheet.getRange("A2:E118");
  var table = range.getValues();
  var len = table.length;
  var resources = {
    title: null,
    mimeType: MimeType.GOOGLE_DOCS,
    parents: [{id: /* destination folder id */}]
  };
  var count = 0;
  var files = DriveApp.getFolderById(/* source folder id */).getFiles();
  while (files.hasNext()) {
    var blob = files.next().getBlob();
    var blobName = blob.getName();
    for (var i=0; i<len; i++) {
      if (table[i][0] === blobName.slice(5, 18)) {
        if (table[i][4])
          break;
        resources.title = blobName;
        Drive.Files.insert(resources, blob); // Limit Exceeded: DriveApp. (line 51, file "Code")
        table[i][4] = "yes";
      }
    }
    if (++count === 10) {
      range.setValues(table);
      Logger.log("time's up");
    }
  }
}
Method B: Using Drive.Files.insert and a continuation token
function toDocs() {
  var folder = DriveApp.getFolderById(/* source folder id */);
  var sprop = PropertiesService.getScriptProperties();
  var contToken = sprop.getProperty("contToken");
  var files = contToken ? DriveApp.continueFileIterator(contToken) : folder.getFiles();
  var options = {
    ocr: true
  };
  var resource = {
    title: null,
    mimeType: null,
    parents: [{id: /* destination folder id */}]
  };
  while (files.hasNext()) {
    var blob = files.next().getBlob();
    resource.title = blob.getName();
    resource.mimeType = blob.getContentType();
    Drive.Files.insert(resource, blob, options); // Limit Exceeded: DriveApp. (line 113, file "Code")
    sprop.setProperty("contToken", files.getContinuationToken());
  }
}
Method C: Using Drive.Files.copy
Credit for this function goes to Tanaike -- see his answer below for more details.
function toDocs() {
  var sourceFolderId = /* source folder id */;
  var destinationFolderId = /* destination folder id */;
  var files = DriveApp.getFolderById(sourceFolderId).getFiles();
  while (files.hasNext()) {
    var res = Drive.Files.copy({parents: [{id: destinationFolderId}]}, files.next().getId(), {convert: true, ocr: true});
    Logger.log(res)
  }
}
Method D: Sending batches of HTTP requests
Credit for this function goes to Tanaike -- see his answer below for more details.
function toDocs() {
  var sourceFolderId = /* source folder id */;
  var destinationFolderId = /* destination folder id */;
  var files = DriveApp.getFolderById(sourceFolderId).getFiles();
  var rBody = [];
  while (files.hasNext()) {
    rBody.push({
      method: "POST",
      endpoint: "https://www.googleapis.com/drive/v3/files/" + files.next().getId() + "/copy",
      requestBody: {
        mimeType: "application/vnd.google-apps.document",
        parents: [destinationFolderId]
      }
    });
  }
  var cycle = 20; // Number of API calls at 1 batch request.
  for (var i = 0; i < Math.ceil(rBody.length / cycle); i++) {
    var offset = i * cycle;
    var body = rBody.slice(offset, offset + cycle);
    var boundary = "xxxxxxxxxx";
    var contentId = 0;
    var data = "--" + boundary + "\r\n";
    body.forEach(function(e){
      data += "Content-Type: application/http\r\n";
      data += "Content-ID: " + ++contentId + "\r\n\r\n";
      data += e.method + " " + e.endpoint + "\r\n";
      data += e.requestBody ? "Content-Type: application/json; charset=utf-8\r\n\r\n" : "\r\n";
      data += e.requestBody ? JSON.stringify(e.requestBody) + "\r\n" : "";
      data += "--" + boundary + "\r\n";
    });
    var options = {
      method: "post",
      contentType: "multipart/mixed; boundary=" + boundary,
      payload: Utilities.newBlob(data).getBytes(),
      headers: {'Authorization': 'Bearer ' + ScriptApp.getOAuthToken()},
      muteHttpExceptions: true,
    };
    var res = UrlFetchApp.fetch("https://www.googleapis.com/batch", options).getContentText();
    // Logger.log(res); // If you use this, please remove the comment.
  }
}
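The multipart body that Method D assembles inline could also be built by a small pure function, which makes it easier to inspect without calling the API. The following is only a sketch: buildBatchPayload is a hypothetical helper name (not part of Apps Script or the Drive API), and the request objects are assumed to have the same method / endpoint / requestBody shape as the rBody entries above. Unlike the function above, this sketch ends the body with the closing delimiter "--boundary--", which is what the multipart format specifies; the original ends with a plain boundary line and evidently still worked.

```javascript
// Sketch only: buildBatchPayload is a hypothetical helper, not a Google API.
// requests: array of {method, endpoint, requestBody?} as in Method D's rBody.
// boundary: the multipart boundary string (any token not occurring in the body).
function buildBatchPayload(requests, boundary) {
  var data = "";
  requests.forEach(function (req, index) {
    // Each part starts with a boundary line and its own HTTP-in-HTTP headers.
    data += "--" + boundary + "\r\n";
    data += "Content-Type: application/http\r\n";
    data += "Content-ID: " + (index + 1) + "\r\n\r\n";
    data += req.method + " " + req.endpoint + "\r\n";
    if (req.requestBody) {
      data += "Content-Type: application/json; charset=utf-8\r\n\r\n";
      data += JSON.stringify(req.requestBody) + "\r\n";
    } else {
      data += "\r\n";
    }
  });
  // Closing delimiter: boundary followed by two extra hyphens.
  data += "--" + boundary + "--\r\n";
  return data;
}
```

In Apps Script the returned string would be passed as the payload of the UrlFetchApp.fetch call, exactly where data is used in Method D.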
What Worked and What Didn't
None of the functions using Drive.Files.insert worked. Every function using insert for conversion failed with this error:
Limit Exceeded: DriveApp. (line #, file "Code")
(line number replaced with a generic symbol). No further details or description of the error could be found. A notable variation was the one in which I used a spreadsheet and the PDFs were in a Team Drive folder; while all other methods failed instantly without converting a single file, this one converted 5 before failing. However, I think this variation did better than the others by fluke rather than for any reason related to the particular resources used (spreadsheet, Team Drive, etc.).
Using Drive.Files.copy and batch HTTP requests worked only when the source folder was a personal (non-Team Drive) folder.
Attempting to use the copy function while reading from a Team Drive folder fails with this error:
File not found: 1RAGxe9a_-euRpWm3ePrbaGaX5brpmGXu (line #, file "Code")
(line number replaced with a generic symbol). The line being referenced is
var res = Drive.Files.copy({parents: [{id: destinationFolderId}]}, files.next().getId(), {convert: true, ocr: true});
Using batch HTTP requests while reading from a Team Drive folder does nothing: no Doc files are created and no errors are thrown. The function silently terminates without having accomplished anything.
Conclusion
If you wish to convert a large number of PDFs to Google Docs or text files, use Drive.Files.copy or send batches of HTTP requests, and make sure that the PDFs are stored in a personal drive rather than a Team Drive.
Special thanks to @tehhowch for taking such an avid interest in my question and for repeatedly coming back to provide feedback, and to @Tanaike for providing code along with explanations that successfully solved my problem (with a caveat, read above for details).
First, fetch the IDs of all the files and store them in a Google Sheet. Then process each file normally using its ID, and mark it as processed once it is done. Before processing a file, check whether it has already been processed.
If there are many files, you can also store the row number up to which you have processed, and continue from there next time.
Finally, create a trigger to execute your function every 10 minutes or so.
This way you can overcome the execution time limit of a single execution. The API request quota and other limits will not be bypassed by this method.
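The steps above can be sketched as a loop that skips rows already marked as processed and stops once a time budget is used up, leaving the rest for the next triggered run. This is a minimal sketch under assumptions: processPending, the row shape {id, processed}, and the injected convert and now callbacks are all hypothetical names chosen for illustration, so that the logic can run outside Apps Script.

```javascript
// Sketch only: processPending and its parameters are hypothetical names.
// rows:     array of {id: string, processed: boolean}, e.g. read from a sheet
// convert:  callback that converts one file by ID (assumed to throw on failure)
// budgetMs: how long this run may keep starting new conversions
// now:      function returning the current time in ms (injectable for testing)
function processPending(rows, convert, budgetMs, now) {
  now = now || Date.now;
  var start = now();
  var done = 0;
  for (var i = 0; i < rows.length; i++) {
    if (rows[i].processed) continue;       // handled in an earlier run
    if (now() - start >= budgetMs) break;  // leave the rest for the next run
    convert(rows[i].id);
    rows[i].processed = true;              // mark only after convert returned
    done++;
  }
  return done; // number of files converted in this run
}
```

In Apps Script, convert might wrap a call such as Drive.Files.copy, the rows would be written back to the sheet after the loop, and a time-driven trigger (for example one created with ScriptApp.newTrigger(...).timeBased().everyMinutes(10)) would rerun the function until no unprocessed rows remain.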
You want to convert the PDF files in a folder to Google Documents. The PDF files are in a Team Drive folder, and you want to put the converted files in a folder of your own Google Drive. If my understanding is correct, how about this method?
To convert PDFs to Google Documents, you can use not only Drive.Files.insert() but also Drive.Files.copy(). The advantage of Drive.Files.copy() is that, while Drive.Files.insert() has a size limit of 5 MB, Drive.Files.copy() can handle files larger than 5 MB.
For this method, I would like to propose the following 2 patterns.
Pattern 1 : Using Drive API v2
In this case, Drive API v2 of Advanced Google Services is used for converting files.
Pattern 2 : Using Drive API v3
In this case, Drive API v3 is used for converting files, and I used batch requests for this situation, because a batch request can carry up to 100 API calls in a single HTTP call. This can ease the issue of the API quota.
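As a rough sketch of how a list of API calls could be divided so that no batch exceeds that size, the helper below slices an array into consecutive chunks. chunkRequests is a hypothetical name, and the size argument plays the role of the cycle variable in the batch function shown in the question.

```javascript
// Sketch only: chunkRequests is a hypothetical helper name.
// Splits `requests` into consecutive arrays of at most `size` items each.
function chunkRequests(requests, size) {
  if (!(size >= 1)) throw new Error("size must be at least 1");
  var chunks = [];
  for (var offset = 0; offset < requests.length; offset += size) {
    chunks.push(requests.slice(offset, offset + size));
  }
  return chunks;
}
```

With the 117 files from the question and a size of 100, this would give one batch of 100 calls and one of 17.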
Note : The number of API calls sent in one batch request is set by var cycle = 100.
If these are not useful for you, I'm sorry.