Eliminate newlines in Google Apps Script using regex

Posted 2019-01-15 16:20

Question:

I'm trying to write part of an add-on for Google Docs that eliminates newlines within selected text using replaceText. The obvious text.replaceText("\n",""); gives the error Invalid argument: searchPattern, and I get the same error with text.replaceText("\r","");. The following attempts do nothing: text.replaceText("/\n/",""); and text.replaceText("/\r/","");. I don't understand why Google Apps Script does not recognise newlines in regex.

I am aware that there is an add-on that does this already, but I want to incorporate this function into my add-on.

This error occurs even with the basic

DocumentApp.getActiveDocument().getBody().replaceText("\n","");

My full function:

function removeLineBreaks() {
    var selection = DocumentApp.getActiveDocument().getSelection();
    if (selection) {
        var elements = selection.getRangeElements();
        for (var i = 0; i < elements.length; i++) {
            var element = elements[i];

            // Only deal with text elements
            if (element.getElement().editAsText) {
                var text = element.getElement().editAsText();

                if (element.isPartial()) {
                    text.replaceText("\n","");
                }
                // Deal with fully selected text
                else {
                    text.replaceText("\n","");
                }
            }
        }
    }
    // No text selected
    else {
        DocumentApp.getUi().alert('No text selected. Please select some text and try again.');
    }
}

Answer 1:

It seems that in replaceText, to remove soft returns entered with Shift-ENTER, you can use \v:

.replaceText("\\v+", "")

If you want to remove all "other" control characters (C0, DEL and C1 control codes), you may use

.replaceText("\\p{Cc}+", "")

Note that \v is a construct supported by the JavaScript regex engine, and the RE2 regex library used in most Google products treats it as matching a vertical tab character (≡ \013).
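The double backslash matters because the pattern reaches replaceText as an ordinary JavaScript string. Below is a minimal sketch in plain JavaScript, runnable outside Apps Script, of what each string literal from the question and from this answer contains once JavaScript has processed its escapes. The helper name describePattern is hypothetical, and the final check uses JavaScript's own RegExp only to illustrate the escaping; it is not a claim about how replaceText's engine behaves.

```javascript
// Hypothetical helper: list the character codes in a pattern string,
// i.e. what a regex engine receives after JavaScript has processed
// the escapes in the string literal.
function describePattern(pattern) {
  return pattern.split('').map(function (ch) {
    return ch.charCodeAt(0);
  });
}

var rawNewline = describePattern("\n");   // [10]: one real line-feed character
var slashed = describePattern("/\n/");    // [47, 10, 47]: slash, line feed, slash
var escapedV = describePattern("\\v+");   // [92, 118, 43]: backslash, 'v', '+'

// In JavaScript's own engine, the pattern \v+ matches a vertical tab (U+000B).
var matchesVerticalTab = new RegExp("\\v+").test("a\u000Bb"); // true
```

Seen this way, "/\n/" is a string, not a regex literal, so the slashes become part of the search text; that is probably why those attempts found nothing to replace.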



Answer 2:

I have now found out through much trial and error -- and some much-needed help from Wiktor Stribiżew (see other answer) -- that there is a solution to this, but it relies on the fact that Google Apps Script does not recognise \n or \r in regex searches. The solution is as follows:

function removeLineBreaks() {
  var selection = DocumentApp.getActiveDocument()
    .getSelection();
  if (selection) {
    var elements = selection.getRangeElements();
    for (var i = 0; i < elements.length; i++) {
      var element = elements[i];
      // Only deal with text elements
      if (element.getElement()
        .editAsText) {
        var text = element.getElement()
          .editAsText();
        if (element.isPartial()) {
          var start = element.getStartOffset();
          var finish = element.getEndOffsetInclusive();
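          // slice() excludes its end index, so add 1 to include the
          // last selected character (finish is an inclusive offset)
          var oldText = text.getText()
            .slice(start, finish + 1);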
          if (oldText.match(/\r/)) {
            var number = oldText.match(/\r/g)
              .length;
            for (var j = 0; j < number; j++) {
              var location = oldText.search(/\r/);
              text.deleteText(start + location, start + location);
              text.insertText(start + location, ' ');
              oldText = oldText.replace(/\r/, ' ');
            }
          }
        }
        // Deal with fully selected text
        else {
          text.replaceText("\\v+", " ");
        }
      }
    }
  }
  // No text selected
  else {
    DocumentApp.getUi()
      .alert('No text selected. Please select some text and try again.');
  }
}

Explanation

In Google Docs, replaceText can search for vertical tabs (\v), and that pattern matches these newlines.

Partial text is a whole other problem. The solution for partially selected text above finds the location of newlines by extracting the selected substring from the text element and searching in that string. It then uses each location to delete the newline character and insert a space in its place, repeating once for every newline found in the selected text.
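The offset bookkeeping for the partial case can be sketched on a plain string, without any Docs objects. This is an illustration under assumptions rather than the code above: the helper names findBreakOffsets and replaceBreaksInRange are hypothetical, the line breaks are assumed to appear as "\r" in the element's text (as the code above assumes), and the end offset is treated as inclusive.

```javascript
// Hypothetical helper: absolute offsets of every "\r" inside the
// selected range [start, endInclusive] of an element's full text.
function findBreakOffsets(fullText, start, endInclusive) {
  var offsets = [];
  // slice() excludes its end index, so add 1 to cover an inclusive end.
  var selected = fullText.slice(start, endInclusive + 1);
  for (var i = 0; i < selected.length; i++) {
    if (selected.charAt(i) === '\r') {
      offsets.push(start + i);
    }
  }
  return offsets;
}

// Plain-string stand-in for deleteText + insertText at each offset.
// Each one-character break is swapped for a one-character space,
// so the remaining offsets stay valid.
function replaceBreaksInRange(fullText, start, endInclusive) {
  var chars = fullText.split('');
  findBreakOffsets(fullText, start, endInclusive).forEach(function (pos) {
    chars[pos] = ' ';
  });
  return chars.join('');
}

var sample = 'ab\rcd\ref\rgh';
var offsets = findBreakOffsets(sample, 1, 6);    // [2, 5]
var result = replaceBreaksInRange(sample, 1, 6); // 'ab cd ef\rgh'
```

The break at offset 8 lies outside the selected range and is left alone. Inside Apps Script, each returned offset could presumably be passed to deleteText(pos, pos) followed by insertText(pos, ' '), as the function above does.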



Answer 3:

The Google Apps Script function replaceText() still doesn't accept escape characters, but I was able to get around this by using getText(), then the generic JavaScript replace(), then setText():

var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();

var bodyText = body.getText();

//DocumentApp.getUi().alert( "Does document contain \\t? " + /\t/.test( bodyText ) ); // \n true, \r false, \t true

bodyText = bodyText.replace( /\n/g, "" );
bodyText = bodyText.replace( /\t/g, "" );

body.setText( bodyText );

This worked within a Doc. Not sure if the same is possible within a Sheet (and, even if it were, you'd probably have to run this one cell at a time).
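The same getText / replace / setText round trip can be wrapped in a small helper and tried outside Apps Script. This is only a sketch: stripFromHolder and makeFakeHolder are hypothetical names, and the fake holder is a stand-in for any object that exposes getText() and setText(), such as the body used above.

```javascript
// Hypothetical helper: read the text, apply a standard JavaScript
// replace(), and write the result back only if something changed.
function stripFromHolder(holder, pattern, replacement) {
  var before = holder.getText();
  var after = before.replace(pattern, replacement);
  if (after !== before) {
    holder.setText(after);
  }
  return after;
}

// Stand-in object so the sketch runs without DocumentApp.
function makeFakeHolder(initial) {
  var value = initial;
  return {
    getText: function () { return value; },
    setText: function (t) { value = t; return this; }
  };
}

var fake = makeFakeHolder('one\ntwo\tthree');
var joined = stripFromHolder(fake, /[\n\t]/g, '');                     // 'onetwothree'
var spaced = stripFromHolder(makeFakeHolder('one\ntwo'), /\n/g, ' ');  // 'one two'
```

Replacing with an empty string joins the words on either side of each break, so a space may be the better replacement, as in Answer 2. As far as I know, setText writes the content back as plain text, so it is worth checking on a copy whether formatting survives.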