This problem has been successfully resolved. I am editing my post to document my experience for posterity and future reference.
The Task
I have 117 PDF files (average size ~238 KB) uploaded to Google Drive. I want to convert them all to Google Docs and keep them in a different Drive folder.
The Problem
I attempted to convert the files using Drive.Files.insert. However, under most circumstances, only 5 files could be converted this way before the function expires prematurely with this error
Limit Exceeded: DriveApp. (line #, file "Code")
where the line referenced above is when the insert
function is called. After calling this function for the first time, subsequent calls typically failed immediately with no additional google doc created.
Approach
I used 3 main ways to achieve my goal. One was using the Drive.Files.insert, as mentioned above. The other two involved using Drive.Files.copy and sending a batch of HTTP requests. These last two methods were suggested by Tanaike, and I recommend reading his answer below for more information. The insert
and copy
functions are from Google Drive REST v2 API, while batching multiple HTTP requests is from Drive REST v3.
With Drive.Files.insert, I experienced issues dealing with execution limitations (explained in the Problem section above). One solution was to run the functions multiple times. And for that, I needed a way to keep track of which files were converted. I had two options for this: using a spreadsheet and a continuation token. Therefore, I had 4 different methods to test: the two mentioned in this paragraph, batching HTTP requests, and calling Drive.Files.copy.
Because team drives behave differently from regular drives, I felt it necessary to try each of those methods twice, one in which the folder containing the PDFs is a regular non-Team Drive folder and one in which that folder is under a Team Drive. In total, this means I had 8 different methods to test.
These are the exact functions I used. Each of these was used twice, with the only variations being the ID of the source and destination folders (for reasons stated above):
Method A: Using Drive.Files.insert and a spreadsheet
function toDocs() {
var sheet = SpreadsheetApp.openById(/* spreadsheet id*/).getSheets()[0];
var range = sheet.getRange("A2:E118");
var table = range.getValues();
var len = table.length;
var resources = {
title: null,
mimeType: MimeType.GOOGLE_DOCS,
parents: [{id: /* destination folder id */}]
};
var count = 0;
var files = DriveApp.getFolderById(/* source folder id */).getFiles();
while (files.hasNext()) {
var blob = files.next().getBlob();
var blobName = blob.getName();
for (var i=0; i<len; i++) {
if (table[i][0] === blobName.slice(5, 18)) {
if (table[i][4])
break;
resources.title = blobName;
Drive.Files.insert(resources, blob); // Limit Exceeded: DriveApp. (line 51, file "Code")
table[i][4] = "yes";
}
}
if (++count === 10) {
range.setValues(table);
Logger.log("time's up");
}
}
}
Method B: Using Drive.Files.insert and a continuation token
function toDocs() {
var folder = DriveApp.getFolderById(/* source folder id */);
var sprop = PropertiesService.getScriptProperties();
var contToken = sprop.getProperty("contToken");
var files = contToken ? DriveApp.continueFileIterator(contToken) : folder.getFiles();
var options = {
ocr: true
};
var resource = {
title: null,
mimeType: null,
parents: [{id: /* destination folder id */}]
};
while (files.hasNext()) {
var blob = files.next().getBlob();
resource.title = blob.getName();
resource.mimeType = blob.getContentType();
Drive.Files.insert(resource, blob, options); // Limit Exceeded: DriveApp. (line 113, file "Code")
sprop.setProperty("contToken", files.getContinuationToken());
}
}
Method C: Using Drive.Files.copy
Credit for this function goes to Tanaike -- see his answer below for more details.
function toDocs() {
var sourceFolderId = /* source folder id */;
var destinationFolderId = /* destination folder id */;
var files = DriveApp.getFolderById(sourceFolderId).getFiles();
while (files.hasNext()) {
var res = Drive.Files.copy({parents: [{id: destinationFolderId}]}, files.next().getId(), {convert: true, ocr: true});
Logger.log(res)
}
}
Method D: Sending batches of HTTP requests
Credit for this function goes to Tanaike -- see his answer below for more details.
function toDocs() {
var sourceFolderId = /* source folder id */;
var destinationFolderId = /* destination folder id */;
var files = DriveApp.getFolderById(sourceFolderId).getFiles();
var rBody = [];
while (files.hasNext()) {
rBody.push({
method: "POST",
endpoint: "https://www.googleapis.com/drive/v3/files/" + files.next().getId() + "/copy",
requestBody: {
mimeType: "application/vnd.google-apps.document",
parents: [destinationFolderId]
}
});
}
var cycle = 20; // Number of API calls at 1 batch request.
for (var i = 0; i < Math.ceil(rBody.length / cycle); i++) {
var offset = i * cycle;
var body = rBody.slice(offset, offset + cycle);
var boundary = "xxxxxxxxxx";
var contentId = 0;
var data = "--" + boundary + "\r\n";
body.forEach(function(e){
data += "Content-Type: application/http\r\n";
data += "Content-ID: " + ++contentId + "\r\n\r\n";
data += e.method + " " + e.endpoint + "\r\n";
data += e.requestBody ? "Content-Type: application/json; charset=utf-8\r\n\r\n" : "\r\n";
data += e.requestBody ? JSON.stringify(e.requestBody) + "\r\n" : "";
data += "--" + boundary + "\r\n";
});
var options = {
method: "post",
contentType: "multipart/mixed; boundary=" + boundary,
payload: Utilities.newBlob(data).getBytes(),
headers: {'Authorization': 'Bearer ' + ScriptApp.getOAuthToken()},
muteHttpExceptions: true,
};
var res = UrlFetchApp.fetch("https://www.googleapis.com/batch", options).getContentText();
// Logger.log(res); // If you use this, please remove the comment.
}
}
What Worked and What Didn't
None of the functions using Drive.Files.insert worked. Every function using
insert
for conversion failed with this errorLimit Exceeded: DriveApp. (line #, file "Code")
(line number replaced with generic symbol). No further details or description of the error could be found. A notable variation was one in which I used a spreadsheet and the PDFs were in a team drive folder; while all other methods failed instantly without converting a single file, this one converted 5 before failing. However, when considering why this variation did better than the others, I think it was more of a fluke than any reason related to the use of particular resources (spreadsheet, team drive, etc.)
Using Drive.Files.copy and batch HTTP requests worked only when the source folder was a personal (non-Team Drive) folder.
Attempting to use the
copy
function while reading from a Team Drive folder fails with this error:File not found: 1RAGxe9a_-euRpWm3ePrbaGaX5brpmGXu (line #, file "Code")
(line number replaced with generic symbol). The line being referenced is
var res = Drive.Files.copy({parents: [{id: destinationFolderId}]}, files.next().getId(), {convert: true, ocr: true});
Using batch HTTP requests while reading from a Team Drive folder does nothing -- no doc files are created and no errors are thrown. Function silently terminates without having accomplished anything.
Conclusion
If you wish to convert a large number of PDFs to google docs or text files, then use Drive.Files.copy or send batches of HTTP requests and make sure that the PDFs are stored in a personal drive rather than a Team Drive.
Special thanks to @tehhowch for taking such an avid interest in my question and for repeatedly coming back to provide feedback, and to @Tanaike for providing code along with explanations that successfully solved my problem (with a caveat, read above for details).