Regex for quoted string with escaping quotes

2019-01-01 04:00发布

How do I get the substring " It's big \"problem " using a regular expression?

s = ' function(){  return " It\'s big \"problem  ";  }';     

15条回答
流年柔荑漫光年
2楼-- · 2019-01-01 04:03

This one comes from nanorc.sample available in many linux distros. It is used for syntax highlighting of C style strings

\"(\\.|[^\"])*\"
查看更多
零度萤火
3楼-- · 2019-01-01 04:03

One has to remember that regexps aren't a silver bullet for everything string-y. Some stuff are simpler to do with a cursor and linear, manual, seeking. A CFL would do the trick pretty trivially, but there aren't many CFL implementations (afaik).

查看更多
无色无味的生活
4楼-- · 2019-01-01 04:03

I faced a similar problem trying to remove quoted strings that may interfere with parsing of some files.

I ended up with a two-step solution that beats any convoluted regex you can come up with:

 line = line.replace("\\\"","\'"); // Replace escaped quotes with something easier to handle
 line = line.replaceAll("\"([^\"]*)\"","\"x\""); // Simple is beautiful

Easier to read and probably more efficient.

查看更多
谁念西风独自凉
5楼-- · 2019-01-01 04:05

Most of the solutions provided here use alternative repetition paths i.e. (A|B)*.

You may encounter stack overflows on large inputs since some pattern compiler implements this using recursion.

Java for instance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6337993

Something like this: "(?:[^"\\]*(?:\\.)?)*", or the one provided by Guy Bedford will reduce the amount of parsing steps avoiding most stack overflows.

查看更多
若你有天会懂
6楼-- · 2019-01-01 04:08

An option that has not been touched on before is:

  1. Reverse the string.
  2. Perform the matching on the reversed string.
  3. Re-reverse the matched strings.

This has the added bonus of being able to correctly match escaped open tags.

Lets say you had the following string; String \"this "should" NOT match\" and "this \"should\" match" Here, \"this "should" NOT match\" should not be matched and "should" should be. On top of that this \"should\" match should be matched and \"should\" should not.

First an example.

// The input string.
const myString = 'String \\"this "should" NOT match\\" and "this \\"should\\" match"';

// The RegExp.
const regExp = new RegExp(
    // Match close
    '([\'"])(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))' +
    '((?:' +
        // Match escaped close quote
        '(?:\\1(?=(?:[\\\\]{2})*[\\\\](?![\\\\])))|' +
        // Match everything thats not the close quote
        '(?:(?!\\1).)' +
    '){0,})' +
    // Match open
    '(\\1)(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))',
    'g'
);

// Reverse the matched strings.
matches = myString
    // Reverse the string.
    .split('').reverse().join('')
    // '"hctam "\dluohs"\ siht" dna "\hctam TON "dluohs" siht"\ gnirtS'

    // Match the quoted
    .match(regExp)
    // ['"hctam "\dluohs"\ siht"', '"dluohs"']

    // Reverse the matches
    .map(x => x.split('').reverse().join(''))
    // ['"this \"should\" match"', '"should"']

    // Re order the matches
    .reverse();
    // ['"should"', '"this \"should\" match"']

Okay, now to explain the RegExp. This is the regexp can be easily broken into three pieces. As follows:

# Part 1
(['"])         # Match a closing quotation mark " or '
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)
# Part 2
((?:          # Match inside the quotes
(?:           # Match option 1:
  \1          # Match the closing quote
  (?=         # As long as it's followed by
    (?:\\\\)* # A pair of escape characters
    \\        # 
    (?![\\])  # As long as that's not followed by an escape
  )           # and a single escape
)|            # OR
(?:           # Match option 2:
  (?!\1).     # Any character that isn't the closing quote
)
)*)           # Match the group 0 or more times
# Part 3
(\1)           # Match an open quotation mark that is the same as the closing one
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)

This is probably a lot clearer in image form: generated using Jex's Regulex

Image on github (JavaScript Regular Expression Visualizer.) Sorry, I don't have a high enough reputation to include images, so, it's just a link for now.

Here is a gist of an example function using this concept that's a little more advanced: https://gist.github.com/scagood/bd99371c072d49a4fee29d193252f5fc#file-matchquotes-js

查看更多
查无此人
7楼-- · 2019-01-01 04:13
/(["\']).*?(?<!\\)(\\\\)*\1/is

should work with any quoted string

查看更多
登录 后发表回答