Get all nested Text Elements in a Google Doc using

Posted 2019-07-29 13:09

Question:

In a document similar to the one above, I can get all the paragraphs with the following code:

var body = DocumentApp.getActiveDocument().getBody();
var paras = body.getParagraphs();

Notice that the code above returns not only the top-level paragraphs but also all the nested paragraphs inside ListItems, Tables, etc.

How can I do the same thing within a selected range? The following code only returns top-level elements:

const selection = DocumentApp.getActiveDocument().getSelection();
var rangeElements = selection.getRangeElements();

For example, the table above contains 9 non-empty paragraphs, and I'd like to process them one by one if they are in the selection.

What I'm trying to achieve is similar to translating the text in a selection while preserving the formatting, tables, list items, etc. as much as possible.

Answer 1:

.getRangeElements() returns an array of RangeElements. A RangeElement is a wrapper object that helps us deal with partial selections. We can call .getElement() on each item in this array to get the underlying Element, a very generic object that can represent almost any piece of a Google Doc. Elements have a .getType() method that returns an ElementType enum, and there are a lot of them!
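The unwrapping step described above can be sketched as follows. This is a minimal sketch, not part of the original answer: getSelectedElements and the mockDoc/mockElement helpers are hypothetical names, and the mocks only imitate the few Apps Script methods used so the logic can run outside Google Docs. It also guards against getSelection() returning null, which happens when nothing is selected.

```javascript
// Hypothetical helper: unwrap every RangeElement in the selection into its Element.
// In Apps Script you would pass DocumentApp.getActiveDocument() as `doc`.
function getSelectedElements(doc) {
  const selection = doc.getSelection();
  if (!selection) {
    return []; // getSelection() returns null when the user has nothing selected.
  }
  return selection.getRangeElements().map(function (rangeElement) {
    return rangeElement.getElement(); // The RangeElement wrapper -> the Element itself.
  });
}

// Hypothetical mocks imitating only getType(), getSelection(),
// getRangeElements() and getElement(); they are NOT the real Apps Script objects.
function mockElement(type) {
  return { getType: function () { return type; } };
}

function mockDoc(types) {
  return {
    getSelection: function () {
      if (types === null) return null; // Simulate "nothing selected".
      return {
        getRangeElements: function () {
          return types.map(function (t) {
            const el = mockElement(t);
            return { getElement: function () { return el; } };
          });
        }
      };
    }
  };
}

const selected = getSelectedElements(mockDoc(["PARAGRAPH", "LIST_ITEM", "TABLE"]));
console.log(selected.map(function (e) { return String(e.getType()); }).join(", "));
// PARAGRAPH, LIST_ITEM, TABLE
```

Returning an empty array for a missing selection lets the calling code loop over the result without a separate null check.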


Let's use what we know so far to see which types actually appear in a Google Doc (I've created one similar to yours as an example):

function selectionHasWhichTypes() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements();

  rangeElems.forEach(function(rangeElem){
    var elem = rangeElem.getElement(); //Unwrap the RangeElement to get the Element itself.

    Logger.log(elem.getType());
  });
}

//Logger OUTPUT:
PARAGRAPH
PARAGRAPH
PARAGRAPH
PARAGRAPH
PARAGRAPH
LIST_ITEM
LIST_ITEM
LIST_ITEM
PARAGRAPH
PARAGRAPH
PARAGRAPH
TABLE
PARAGRAPH

Aha! It looks like we only have to deal with the PARAGRAPH, LIST_ITEM, and TABLE ElementTypes for now, but let's keep their children in mind too (we will find out that these are 3 of the 5 ElementTypes in this document that can have children). This sounds like a job for a recursive function that keeps digging down into child elements until we've found and dealt with them all.


So let's try that. This next part may look confusing, but essentially it takes an element, checks whether it has children, then checks each of those children for children of their own, and so on. We also want to check whether we run into any new ElementTypes that we need to deal with...

function selectionHasWhichTypes() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements();

  rangeElems.forEach(function(rangeElem){
    var elem = rangeElem.getElement();

    elemsHaveWhatChildElems(elem, elem.getType());

  });
}

function elemsHaveWhatChildElems(elem, typeChain){
  var elemType = elem.getType();
  if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH"){ //Let's see if the element is one of our basic 3. If so, it could have children.
    var numChildren = elem.getNumChildren(); //How many children are there?
    if(numChildren > 0){
      for(var i = 0; i < numChildren; i++){ //Let's go through them.
        var child = elem.getChild(i);
        elemsHaveWhatChildElems(child, typeChain + "." + child.getType()); //Recursion step to look for more children.
      }
    }else{
      Logger.log(typeChain); //Let's log the chain of Parent to Child elements.
    }
  }else{
    Logger.log("*" + typeChain); //Let's mark the new elemTypeChains we have not seen.
  }
}

//Logger OUTPUT:
*PARAGRAPH.TEXT
PARAGRAPH
*PARAGRAPH.HORIZONTAL_RULE
PARAGRAPH
*PARAGRAPH.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
PARAGRAPH
*PARAGRAPH.TEXT
PARAGRAPH
*TABLE.TABLE_ROW
*TABLE.TABLE_ROW
PARAGRAPH

Alright, so each line of the log is a chain of Elements and their children. We have some new ElementTypes (HORIZONTAL_RULE, TABLE_ROW, and TEXT). If a chain is just PARAGRAPH with nothing after it, that paragraph has no children, which means it is a blank line and we can ignore it. We can also ignore HORIZONTAL_RULE, since it won't contain any text.
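The filtering rule just described can be sketched by collecting the chains into an array instead of logging them. This is a hypothetical variant of elemsHaveWhatChildElems, not the answer's code: collectTypeChains, isWorthKeeping and mock() are illustrative names, and mock() only imitates getType(), getNumChildren() and getChild() so the sketch runs outside Google Docs.

```javascript
// Same three "basic" container types as the first recursive version.
const BASIC_CONTAINERS = ["TABLE", "LIST_ITEM", "PARAGRAPH"];

// Hypothetical variant: push each type chain into `out` instead of logging it.
function collectTypeChains(elem, typeChain, out) {
  const type = String(elem.getType());
  const numChildren = BASIC_CONTAINERS.indexOf(type) !== -1 ? elem.getNumChildren() : 0;
  if (numChildren > 0) {
    for (let i = 0; i < numChildren; i++) {
      const child = elem.getChild(i);
      collectTypeChains(child, typeChain + "." + String(child.getType()), out);
    }
  } else {
    out.push(typeChain); // Childless container, or a type we don't descend into yet.
  }
  return out;
}

// A chain ending in PARAGRAPH is a childless (blank) paragraph;
// a chain ending in HORIZONTAL_RULE holds no text. Drop both.
function isWorthKeeping(chain) {
  const last = chain.split(".").pop();
  return last !== "PARAGRAPH" && last !== "HORIZONTAL_RULE";
}

// Hypothetical stand-in for a Docs element; NOT the real Apps Script API object.
function mock(type, children) {
  const kids = children || [];
  return {
    getType: function () { return type; },
    getNumChildren: function () { return kids.length; },
    getChild: function (i) { return kids[i]; }
  };
}

const topLevel = [
  mock("PARAGRAPH", [mock("TEXT")]),
  mock("PARAGRAPH"),
  mock("PARAGRAPH", [mock("HORIZONTAL_RULE")]),
  mock("LIST_ITEM", [mock("TEXT")]),
  mock("TABLE", [mock("TABLE_ROW")])
];

const chains = [];
topLevel.forEach(function (elem) {
  collectTypeChains(elem, String(elem.getType()), chains);
});
console.log(chains.filter(isWorthKeeping).join(", "));
// PARAGRAPH.TEXT, LIST_ITEM.TEXT, TABLE.TABLE_ROW
```

Returning an array makes the chains easy to inspect or test; the TABLE.TABLE_ROW chain that survives the filter is exactly the case the next step still has to handle.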

Once we reach a TEXT Element, we can perform our operation on it (i.e. for the OP, a translation); that is where every PARAGRAPH and LIST_ITEM chain above ends. However, we still have to deal with TableRow objects (logged as TABLE.TABLE_ROW). A TableRow can have children just like our main 3 elements, so we add it to our condition, changing if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH") to if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW").

This gives us another new Element in our chain, TableCell (logged as TABLE.TABLE_ROW.TABLE_CELL), which we can again add to our if statement, making it: if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW" || elemType == "TABLE_CELL").
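As that condition keeps growing, one option is to keep the container types in a Set. This is a small sketch with hypothetical names (CONTAINER_TYPES, isContainerType). String(elemType) assumes the ElementType enum converts to its name (e.g. "TABLE_CELL"), which is the same assumption the answer's == "TABLE" comparisons rely on.

```javascript
// Container ElementTypes whose children we walk, by name.
const CONTAINER_TYPES = new Set([
  "TABLE", "LIST_ITEM", "PARAGRAPH", "TABLE_ROW", "TABLE_CELL"
]);

// Hypothetical helper; String() normalizes an enum value (assumed) or a plain string.
function isContainerType(elemType) {
  return CONTAINER_TYPES.has(String(elemType));
}

// Inside elemsHaveWhatChildElems the long condition would then read:
//   if (isContainerType(elemType)) { ... }
console.log(isContainerType("TABLE_CELL")); // true
console.log(isContainerType("TEXT"));       // false
```

With this layout, supporting another container type later means adding one string to the Set rather than editing every copy of the if statement.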


Time to see what happens when we've dealt with Table ElementTypes.

function selectionHasWhichtypeChains() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements();

  rangeElems.forEach(function(rangeElem){
    var elem = rangeElem.getElement();

    elemsHaveWhatChildElems(elem, elem.getType());

  });
}

function elemsHaveWhatChildElems(elem, typeChain){
  var elemType = elem.getType();
  if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW" || elemType == "TABLE_CELL"){ //Let's see if the element is one of our basic 5. If so, it could have children.
    var numChildren = elem.getNumChildren(); //How many children are there?
    if(numChildren > 0){
      for(var i = 0; i < numChildren; i++){ //Let's go through them.
        var child = elem.getChild(i);
        elemsHaveWhatChildElems(child, typeChain + "." + child.getType()); //Recursion step to look for more children.
      }
    }else{
      Logger.log(typeChain); //Let's log the chain of Parent to Child elements.
    }
  }else{
    Logger.log("*" + typeChain); //Let's mark the new elemTypeChains we have not seen.
  }
}

//Logger OUTPUT:
*PARAGRAPH.TEXT
PARAGRAPH
*PARAGRAPH.HORIZONTAL_RULE
PARAGRAPH
*PARAGRAPH.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
*LIST_ITEM.TEXT
PARAGRAPH
*PARAGRAPH.TEXT
PARAGRAPH
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.HORIZONTAL_RULE
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
*TABLE.TABLE_ROW.TABLE_CELL.PARAGRAPH.TEXT
PARAGRAPH

This is great! We've reached into the depths of every parent element and ended at either a Text Element, a blank paragraph, or a horizontal rule! From here we can slightly modify our code to perform whatever operation we want on the text while maintaining the structure of the document:

function myFunction() {
  var doc = DocumentApp.getActiveDocument();
  var selection = doc.getSelection();
  var rangeElems = selection.getRangeElements(); //Get main Elements of selection

  rangeElems.forEach(function(rangeElem){ //Let's run through each to find ALL of their children.
    var elem = rangeElem.getElement(); //Each item is a RangeElement wrapper. Let's get the full Element inside it.
    getNestedTextElements(elem, elem.getType()); //Time to go down the rabbit hole.
  });
}

function getNestedTextElements(elem, typeChain){
  var elemType = elem.getType();
  if(elemType == "TABLE" || elemType == "LIST_ITEM" || elemType == "PARAGRAPH" || elemType == "TABLE_ROW" || elemType == "TABLE_CELL"){ //Let's see if the element is one of our basic 5. If so, it could have children.
    var numChildren = elem.getNumChildren(); //How many children are there?
    if(numChildren > 0){
      for(var i = 0; i < numChildren; i++){ //Let's go through them.
        var child = elem.getChild(i);
        getNestedTextElements(child, typeChain + "." + child.getType()); //Recursion step to look for more children.
      }
    }
  }else if(elemType == "TEXT"){
    //THIS IS WHERE WE CAN PERFORM OUR OPERATIONS ON THE TEXT ELEMENT
    var text = elem.getText();


  }else{
    Logger.log("*" + typeChain); //Let's log any element type we don't handle yet, for future-proofing.
  }
}
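A variation on getNestedTextElements is to collect the matching elements into an array first and process them afterwards. The sketch below is hypothetical (collectElements and the mock helpers are not from the answer): pass ["TEXT"] to gather Text elements as myFunction does, or ["PARAGRAPH", "LIST_ITEM"] to get something like body.getParagraphs() (which includes ListItems) restricted to the selection. The mocks only imitate getType(), getNumChildren(), getChild() and getText().

```javascript
// Same five container types the answer ends up walking.
const CONTAINER_TYPES = new Set(["TABLE", "LIST_ITEM", "PARAGRAPH", "TABLE_ROW", "TABLE_CELL"]);

// Hypothetical collector: depth-first, so results come out in document order.
function collectElements(elem, wanted, out) {
  const type = String(elem.getType());
  if (wanted.indexOf(type) !== -1) {
    out.push(elem); // Matched: keep it and do not descend any further.
  } else if (CONTAINER_TYPES.has(type)) {
    for (let i = 0; i < elem.getNumChildren(); i++) {
      collectElements(elem.getChild(i), wanted, out);
    }
  }
  // Anything else (e.g. HORIZONTAL_RULE) is skipped.
  return out;
}

// Hypothetical stand-in for Docs elements; NOT the real Apps Script objects.
function mock(type, children, text) {
  const kids = children || [];
  return {
    getType: function () { return type; },
    getNumChildren: function () { return kids.length; },
    getChild: function (i) { return kids[i]; },
    getText: function () { return text; }
  };
}
// A paragraph with one Text child, or a blank paragraph when text is "".
function para(text) { return mock("PARAGRAPH", text ? [mock("TEXT", [], text)] : []); }
function cell(text) { return mock("TABLE_CELL", [para(text)]); }

const roots = [
  para("Intro"),
  mock("LIST_ITEM", [mock("TEXT", [], "Item")]),
  mock("TABLE", [mock("TABLE_ROW", [cell("A"), cell(""), cell("B")])])
];

const texts = [];
roots.forEach(function (root) { collectElements(root, ["TEXT"], texts); });
console.log(texts.map(function (t) { return t.getText(); }).join(", ")); // Intro, Item, A, B

const paragraphs = [];
roots.forEach(function (root) { collectElements(root, ["PARAGRAPH", "LIST_ITEM"], paragraphs); });
console.log(paragraphs.length); // 5 (one per paragraph, including the blank cell)
```

In Apps Script, the roots would be the elements unwrapped from selection.getRangeElements(). Collecting first and editing afterwards also means that, if your operation adds or removes elements, you are not changing the tree while still indexing into it with getChild(i).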

BOOM! Done. I know this is a really long post, but I've broken down each section of the solution into parts to help new Apps Script coders understand the structure of a Selection (and Document Body, I guess) and how to modify it when the structure is very complicated (many nested Elements). I really hope this was helpful. If anybody sees a piece that can be improved, let me know.


A note to the OP: be warned that this doesn't handle partial selections of an Element (a partially selected Text element is processed in full), but that can be dealt with by modifying the first function a little to check isPartial() on each RangeElement.
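The isPartial() check mentioned above could look roughly like this. It is a sketch, not the answer's code: partialSelectionText and mockRange are hypothetical names. isPartial(), getStartOffset() and getEndOffsetInclusive() are RangeElement methods, and getStartOffset() returns -1 when the element is not partially selected; the mock only imitates them (plus asText() and getText()) so it can run outside Docs.

```javascript
// Hypothetical helper: return only the selected characters of a partial range,
// or null when the whole element is selected (hand those to the recursive walk).
function partialSelectionText(rangeElement) {
  if (!rangeElement.isPartial()) {
    return null;
  }
  const text = rangeElement.getElement().asText().getText();
  // getEndOffsetInclusive() is inclusive, while substring()'s end is exclusive: hence +1.
  return text.substring(rangeElement.getStartOffset(), rangeElement.getEndOffsetInclusive() + 1);
}

// Hypothetical mock of a RangeElement wrapping a Text element; NOT the real API object.
function mockRange(text, start, end) {
  const textElem = { getText: function () { return text; } };
  textElem.asText = function () { return textElem; };
  return {
    getElement: function () { return textElem; },
    isPartial: function () { return start !== -1; },
    getStartOffset: function () { return start; },
    getEndOffsetInclusive: function () { return end; }
  };
}

console.log(partialSelectionText(mockRange("Hello world", 6, 10)));    // world
console.log(partialSelectionText(mockRange("Whole paragraph", -1, -1))); // null
```

To replace only that span in a real document, Text.deleteText(start, endInclusive) followed by Text.insertText(start, newText) is one option; how the inserted text picks up formatting is worth verifying on a test document first.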