Given a "normal document" in Google Docs/Drive (e.g. paragraphs, lists, tables) which contains external links scattered throughout the content, how do you compile a list of links present using Google Apps Script?
Specifically, I want to update all broken links in the document by searching for oldText in each URL and replacing it with newText, changing only the URL and not the displayed text.
I don't think the replacing text section of the Dev Documentation is what I need -- do I need to scan every element of the doc? Can I just editAsText and use an HTML regex? Examples would be appreciated.
I offer another, shorter answer for your first question, concerning iterating through all links in a document's body. This instructive code returns a flat array of links in the current document's body, where each link is represented by an object with entries pointing to the text element (text), the paragraph element or list item element in which it's contained (paragraph), the offset index in the text where the link appears (startOffset) and the URL itself (url). Hopefully, you'll find it easy to suit it for your own needs.
It uses the getTextAttributeIndices() method rather than iterating over every character of the text, and is thus expected to perform much more quickly than previously written answers.
EDIT: Since originally posting this answer, I modified the function a couple of times. It now also (1) includes the endOffsetInclusive property for each link (note that it can be null for links that extend to the end of the text element; in this case one can use link.text.getText().length - 1 instead); (2) finds links in all sections of the document, not only the body; (3) includes the section and isFirstPageSection properties to indicate where the link is located; and (4) accepts the argument mergeAdjacent, which, when set to true, will return only a single link entry for a continuous stretch of text linked to the same URL (which would otherwise be considered separate if, for instance, part of the text is styled differently than another part).
For the purpose of including links under all sections, a new utility function, iterateSections(), was introduced.
This is only mostly painful! Code is available as part of a gist.
Yeah, I can't spell.
getAllLinks
Here's a utility function that scans the document for all LinkUrls, returning them in an array.
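A scan of this kind might be sketched as follows. This is not the gist's code: the names `getLinksInText` and `getAllLinksSketch` are made up for illustration, and the sketch borrows the `getTextAttributeIndices()` idea mentioned earlier in the thread instead of testing every character. It assumes each formatting run is either fully linked or not linked at all, so a link whose text is styled in two different ways comes back as two entries.

```javascript
/**
 * Sketch: list the links inside a single Text element.
 * Each entry records the element, the inclusive offsets and the URL.
 */
function getLinksInText(text) {
  const links = [];
  const length = text.getText().length;
  if (length === 0) return links; // nothing to inspect in an empty Text

  // Offsets at which a new formatting run (bold, link, colour, ...) starts.
  const starts = text.getTextAttributeIndices();
  for (let i = 0; i < starts.length; i++) {
    const start = starts[i];
    const url = text.getLinkUrl(start);
    if (url === null) continue; // this run is not a link

    // A run ends just before the next run starts, or at the last character.
    const end = i + 1 < starts.length ? starts[i + 1] - 1 : length - 1;
    links.push({ text: text, startOffset: start, endOffsetInclusive: end, url: url });
  }
  return links;
}

/**
 * Sketch: walk an element tree (for example the document body) and
 * collect the links of every Text element found in it.
 */
function getAllLinksSketch(element, links) {
  links = links || [];
  if (element.getType() === DocumentApp.ElementType.TEXT) {
    getLinksInText(element.asText()).forEach(function (link) {
      links.push(link);
    });
  } else if (typeof element.getNumChildren === 'function') {
    // Containers (paragraphs, list items, tables, ...) expose their children.
    for (let i = 0; i < element.getNumChildren(); i++) {
      getAllLinksSketch(element.getChild(i), links);
    }
  }
  return links;
}
```

Inside a script bound to a document, it could be called as `getAllLinksSketch(DocumentApp.getActiveDocument().getBody())`. Headers, footers and footnotes would need to be walked separately.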
findAndReplaceLinks
This utility builds on getAllLinks to do a find & replace function.
Demo UI
To demonstrate the use of these utilities, here are a couple of UI extensions:
I was playing around and incorporated @Mogsdad's answer -- here's the really complicated version:
And I'm including the "extra" utility classes for creating menus, sidebars, etc below for completeness:
I had some trouble getting Mogsdad's solution to work. Specifically, it misses links that run to the end of their parent element, where there is no trailing non-link character to terminate the link. I've implemented something which addresses this and returns a standard range element. Sharing it here in case someone finds it useful.
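The end-of-element problem can be illustrated with a character-by-character scan that takes one extra step past the last character, so that a link still open at the end of the text is closed as well. This is a minimal sketch and not the code from this answer: `addLinkRanges` is a hypothetical name, and the function expects a `RangeBuilder` to be passed in by the caller.

```javascript
/**
 * Sketch: add one range element per linked stretch of a Text element.
 * A stretch ends when the URL changes or when the text itself ends.
 */
function addLinkRanges(text, rangeBuilder) {
  const length = text.getText().length;
  let runStart = -1;  // offset where the current stretch began
  let runUrl = null;  // URL of the current stretch, null when not in a link

  // The loop goes one step past the last character. That extra step acts
  // as a sentinel "no link" position, which closes a trailing link.
  for (let i = 0; i <= length; i++) {
    const url = i < length ? text.getLinkUrl(i) : null;
    if (url === runUrl) continue; // still inside the same stretch

    if (runUrl !== null) {
      // The previous stretch was a link: it covers runStart .. i - 1.
      rangeBuilder.addElement(text, runStart, i - 1);
    }
    runStart = i;
    runUrl = url;
  }
  return rangeBuilder;
}
```

A builder can be obtained with `DocumentApp.getActiveDocument().newRange()`. After calling `addLinkRanges` for each Text element, `builder.build().getRangeElements()` yields the range elements, each carrying its start and end offsets.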
You are right: search and replace is not applicable here. Use setLinkUrl() https://developers.google.com/apps-script/reference/document/container-element#setLinkUrl(String)
Basically, you have to iterate through the elements recursively (elements can contain elements). For each one, call getLinkUrl(); if the result is not null, it is the old URL, so call setLinkUrl() with the new URL. This leaves the displayed text unchanged.
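The recursion described above could look roughly like this. It is a sketch under some assumptions rather than a definitive implementation: `replaceInLinkUrls` is a made-up name, and the function works on offsets inside each Text element (using `getTextAttributeIndices()`, `getLinkUrl(offset)` and `setLinkUrl(start, end, url)`) because a single Text element can contain several different links. It also assumes each formatting run is either entirely linked or not linked.

```javascript
/**
 * Sketch: replace oldText with newText inside every link URL found
 * under the given element. The displayed text is never modified.
 * Returns the number of link runs that were updated.
 */
function replaceInLinkUrls(element, oldText, newText) {
  if (oldText === '') return 0; // an empty search string would match everywhere
  let changed = 0;

  if (element.getType() === DocumentApp.ElementType.TEXT) {
    const text = element.asText();
    const length = text.getText().length;
    if (length === 0) return 0;

    // Start offsets of the formatting runs in this Text element.
    const starts = text.getTextAttributeIndices();
    for (let i = 0; i < starts.length; i++) {
      const start = starts[i];
      const end = i + 1 < starts.length ? starts[i + 1] - 1 : length - 1;
      const url = text.getLinkUrl(start);
      if (url !== null && url.indexOf(oldText) !== -1) {
        // split/join replaces every occurrence without regex escaping.
        text.setLinkUrl(start, end, url.split(oldText).join(newText));
        changed++;
      }
    }
  } else if (typeof element.getNumChildren === 'function') {
    // Containers: recurse into each child element.
    for (let i = 0; i < element.getNumChildren(); i++) {
      changed += replaceInLinkUrls(element.getChild(i), oldText, newText);
    }
  }
  return changed;
}
```

For example, `replaceInLinkUrls(DocumentApp.getActiveDocument().getBody(), 'old.example.com', 'new.example.com')` would rewrite the host in every matching link of the body (both host names are placeholders). Trying it on a copy of the document first is a sensible precaution.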