google doc script, attributes (bold, italics, unde

Posted 2019-08-08 19:46

Question:

I want to create a script that capitalizes sentences in a Google Doc without changing the existing attributes of any words. For example, a Google Doc might contain several paragraphs, each with several sentences, along with hyperlinks and words in boldface, italics, underline, etc. I want all of these attributes to stay intact; the script should only capitalize the sentences, without removing the existing formatting.

Below is the code in which I tried to print out the attributes:

function cap12b() {

  // define function "replacement" to change the matched pattern to uppercase
  function replacement(match) { return match.toUpperCase(); }

  // define regex "period, followed by zero or any number 
  // of blank spaces, followed by any lowercase character"
  var regex1 = "(^|\.)(\s*)[a-z]";
  var regex2 = /(^|\.)(\s*)[a-z]/;
  Logger.log('regex1 -> %s, regex2 -> %s', regex1, regex2);

  // var body = DocumentApp.getActiveDocument().getBody().editAsText();
  var doc = DocumentApp.getActiveDocument();
  var body = doc.getBody();
  Logger.log('doc -> %s', doc);
  Logger.log('body -> %s', body);
  var atBody = body.getAttributes();
  Logger.log('atBody -> %s', atBody);

  // get body to edit as text
  // var body = body.editAsText();
  var idBody = body.editAsText().getTextAttributeIndices();
  Logger.log("idBody -> %s", idBody);

  // get element text matching pattern "regex"
  var foundElement = body.findText(regex1);
  Logger.log('foundElement -> %s', foundElement);

  // get attributes of foundElement
  var atFE = foundElement.getElement().getAttributes();
  Logger.log("atFE -> %s", atFE);

  while (foundElement != null) {

    var foundText = foundElement.getElement();

    // get attributes of foundText
    // var atFT = foundText.getAttributes();
    // Logger.log("atFT -> %s", atFT);

    // capitalize the character after the period   
    var str1 = foundText.getText();
    var str2 = str1.replace(regex2, replacement);
    foundText.setText(str2); // running, but removed attribute

    // Find the next match
    foundElement = body.findText(regex1, foundElement);

  }

  // try to set attributes in variable "atBody" to body 
  var body = body.setAttributes(atBody);
  Logger.log("body -> %s", body);
  return body;

}

The document contained the following paragraph, with one word in boldface, one in italics, and one underlined:

capitalize sentences. this is one example with ONE blank space after the period. here is another example with TWO blank spaces after the period. this is yet another example with MORE THAN THREE blank spaces. but the boldface italics underline [this word was underlined in my google doc, but the current text area does not have the underline font] formats were removed after the text was replaced. how do i keep these attributes ?

After running the script above, the sentences were all capitalized correctly, but the boldface, italics, and underline attributes were all removed. Below is the transformed Google Doc:

Capitalize sentences. This is one example with ONE blank space after the period. Here is another example with TWO blank spaces after the period. This is yet another example with MORE THAN THREE blank spaces. But the boldface italics underline formats were removed after the text was replaced. How do i keep these attributes ?

To show that the code capitalized the sentences correctly regardless of the number of blank spaces, here is the original text again, formatted as "code" so that the spacing is preserved:

capitalize sentences.  this is one example with ONE blank space after the period.  here is another example with TWO blank spaces after the period.          this is yet another example with MORE THAN THREE blank spaces.      but the **boldface** _italics_ underline [this word was underlined in my google doc, but the current text area does not have the underline font] formats were removed after the text was replaced.         how do i keep these attributes ?

I obtained the following log:

[15-10-24 13:51:35:055 EDT] regex1 -> (^|.)(s*)[a-z], regex2 -> /(^|.)(\s*)[a-z]/
[15-10-24 13:51:35:128 EDT] doc -> Document
[15-10-24 13:51:35:129 EDT] body -> DocumentBodySection
[15-10-24 13:51:35:130 EDT] atBody -> {FONT_SIZE=null, ITALIC=null, PAGE_WIDTH=612.0, LINK_URL=null, UNDERLINE=null, BACKGROUND_COLOR=null, MARGIN_BOTTOM=72.0, PAGE_HEIGHT=792.0, MARGIN_RIGHT=72.0, STRIKETHROUGH=null, MARGIN_LEFT=72.0, FOREGROUND_COLOR=null, BOLD=null, FONT_FAMILY=null, MARGIN_TOP=72.0}
[15-10-24 13:51:35:131 EDT] idBody -> [0, 232, 240, 241, 248, 249, 258]
[15-10-24 13:51:35:132 EDT] foundElement -> RangeElement
[15-10-24 13:51:35:133 EDT] atFE -> {FONT_SIZE=null, ITALIC=null, STRIKETHROUGH=null, FOREGROUND_COLOR=null, BOLD=null, LINK_URL=null, UNDERLINE=null, FONT_FAMILY=null, BACKGROUND_COLOR=null}
[15-10-24 13:51:35:635 EDT] body -> DocumentBodySection
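
One detail in the log may be worth a look: regex1 prints as (^|.)(s*)[a-z], without its backslashes. In a JavaScript string literal, `\.` and `\s` are not recognized escape sequences, so the backslash is simply dropped, and the pattern handed to findText no longer means "period, then whitespace". A minimal sketch of the difference in plain JavaScript (the variable names are made up for illustration):

```javascript
// In an ordinary string literal the backslashes in "\." and "\s" are dropped,
// which matches the value logged for regex1.
var lossy = "(^|\.)(\s*)[a-z]";

// Doubling each backslash keeps it in the resulting pattern string.
var escaped = "(^|\\.)(\\s*)[a-z]";

// The .source of a regex literal yields the same pattern without manual escaping.
var fromLiteral = /(^|\.)(\s*)[a-z]/.source;

console.log(lossy);                    // (^|.)(s*)[a-z]
console.log(escaped);                  // (^|\.)(\s*)[a-z]
console.log(fromLiteral === escaped);  // true
```

Passing the doubled-backslash string to findText should preserve the intended pattern, assuming findText supports these constructs; its documentation notes that only a subset of JavaScript regular expression features is fully supported.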

Note that the logged attributes of the document body contain no boldface, italics, or underline values, only the page dimensions and margins. I don't know where these formatting attributes are kept in the document or how to access them.
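
A hedged pointer on where the formatting probably lives: the idBody line in the log ([0, 232, 240, 241, 248, 249, 258]) comes from getTextAttributeIndices(), which returns the offsets where the text formatting changes, so the bold, italic, and underlined words appear to be stored per character range on the Text element rather than on the body as a whole. The sketch below lists those runs; describeRuns and logRuns are hypothetical names, while getTextAttributeIndices(), isBold(offset), isItalic(offset), and isUnderline(offset) are methods of the Document service's Text class.

```javascript
// Hypothetical helper: returns one entry per formatting run of a Text element.
function describeRuns(text) {
  var str = text.getText();
  var starts = text.getTextAttributeIndices(); // offsets where formatting changes
  return starts.map(function (start, k) {
    // A run ends where the next one starts, or at the end of the text.
    var end = k + 1 < starts.length ? starts[k + 1] : str.length;
    return {
      start: start,
      snippet: str.substring(start, end),
      bold: text.isBold(start),
      italic: text.isItalic(start),
      underline: text.isUnderline(start)
    };
  });
}

// Apps Script entry point; DocumentApp is only available inside Apps Script.
function logRuns() {
  var text = DocumentApp.getActiveDocument().getBody().editAsText();
  describeRuns(text).forEach(function (run) {
    Logger.log('%s -> %s', run.start, JSON.stringify(run));
  });
}
```

The per-offset form getAttributes(offset) on the same Text element should return the full attribute map for a single character, which is likely a better place to look than body.getAttributes().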

Did I do something wrong in my code? I would appreciate it if someone could point out my errors. Thanks.
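
A possible direction, offered as a sketch under assumptions rather than a confirmed fix: setText replaces the entire contents of the element, which fits the observation that the formatting disappears with it. An alternative is to change only the single character at each sentence start and leave the rest of the text alone. The helper names sentenceStartOffsets, capitalizeTextElement, and capitalizeKeepingFormat are hypothetical; getAttributes(offset), insertText, deleteText, and setAttributes(start, end, attributes) are Text methods from the Document service. Re-applying the saved attributes is a precaution in case the inserted character does not inherit them.

```javascript
// Hypothetical helper: offsets of lowercase letters that begin a sentence,
// using the same pattern as regex2 in the question, plus the global flag.
function sentenceStartOffsets(str) {
  var re = /(^|\.)(\s*)[a-z]/g;
  var offsets = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    offsets.push(m.index + m[0].length - 1); // the letter is the last matched char
  }
  return offsets;
}

// Hypothetical helper: capitalizes in place, one character at a time.
// `text` is expected to be an Apps Script Text element.
function capitalizeTextElement(text) {
  var str = text.getText();
  var offsets = sentenceStartOffsets(str);
  offsets.forEach(function (i) {
    var attrs = text.getAttributes(i);               // formatting of the old letter
    text.insertText(i, str.charAt(i).toUpperCase()); // add the uppercase letter
    text.deleteText(i + 1, i + 1);                   // remove the old lowercase letter
    text.setAttributes(i, i, attrs);                 // precaution: restore formatting
  });
  return offsets.length;
}

// Apps Script entry point; DocumentApp only exists inside Apps Script.
function capitalizeKeepingFormat() {
  var paragraphs = DocumentApp.getActiveDocument().getBody().getParagraphs();
  paragraphs.forEach(function (p) {
    capitalizeTextElement(p.editAsText());
  });
}
```

Each replacement keeps the text length unchanged, so the offsets computed up front stay valid. Because this version works paragraph by paragraph, `^` matches at the start of every paragraph, which differs slightly from running the pattern once over the whole body.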

See also my post "google doc script, capitalize sentences without removing other attributes" for the context of the above question.