Parsing quotes in string in scala

Posted 2019-08-13 04:37

Question:

I am trying to parse the following string

val s1 = """ "foo","bar", "foo,bar" """

The output I am hoping for is:

 List[String] ["foo","bar","foo,bar"] length 3

I am able to parse the following

val s2 = """ "foo","bar", 'foo,bar' """

By using the following pattern

val pattern = "(('[^']*')|([^,]+))".r

pattern.findAllMatchIn(s2).map(_.toString).toList
// gives ["foo","bar", 'foo,bar'], length 3

But I am not able to figure out a pattern for s1. Note that I need to parse both s1 and s2 successfully.
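A minimal sketch of why s1 is the hard case (the object name QuestionPatternOnS1 is only for illustration): nothing in the pattern (('[^']*')|([^,]+)) recognizes double quotes, so the [^,]+ branch splits "foo,bar" at its inner comma and s1 comes back as four pieces rather than three.

```scala
object QuestionPatternOnS1 {
  // The pattern from the question: a single-quoted field, or else any run of non-comma characters
  val pattern = "(('[^']*')|([^,]+))".r

  val s1 = """ "foo","bar", "foo,bar" """

  // Double quotes get no special treatment, so [^,]+ stops at every comma, including the one inside "foo,bar"
  val parts: List[String] = pattern.findAllMatchIn(s1).map(_.matched).toList

  def main(args: Array[String]): Unit = {
    // Brackets make the leading/trailing spaces in each piece visible
    parts.foreach(p => println(s"[$p]"))
    println(s"length ${parts.length}")
  }
}
```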

EDIT: Currently I am able to parse:

"foo,bar,foo bar" => [foo,bar,foo bar]
"foo,bar, 'foo bar' " => [foo, bar, 'foo bar'] // len 3

I want to parse these lines as well, along with the following line:

""" foo, bar, "foo,bar" """ // should give [foo,bar,"foo,bar"], len 3

Answer 1:

The following works for your s1 and s2 examples:

(["']).*?\1

["'] matches a double or single quote, which is captured as group 1. We then match any characters up to a closing quote that is the same as the opening one, using the backreference \1. The non-greedy .*? makes the match stop at the first such closing quote rather than running on to the last quote in the line, so each quoted field becomes its own match.
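To illustrate why the non-greedy quantifier matters, here is a small sketch (the object and value names are illustrative) that compares a greedy .* with the non-greedy .*? on the input "foo","bar". The greedy version backtracks only as far as the last matching quote, so it swallows both fields as a single match.

```scala
object GreedyVsNonGreedy {
  val input = "\"foo\",\"bar\""

  // Greedy: .* first runs to the end of the input, then backtracks only as far as the LAST matching quote
  val greedy = """(["']).*\1""".r
  // Non-greedy: .*? grows one character at a time and stops at the FIRST matching quote
  val nonGreedy = """(["']).*?\1""".r

  val greedyMatches: List[String] = greedy.findAllIn(input).toList
  val nonGreedyMatches: List[String] = nonGreedy.findAllIn(input).toList

  def main(args: Array[String]): Unit = {
    println(greedyMatches)    // one match spanning both fields
    println(nonGreedyMatches) // one match per field
  }
}
```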

Note that in Scala you'll need triple quotes, since the pattern contains a double quote (inside triple quotes the backslash in \1 also needs no escaping):

val pattern = """(["']).*?\1""".r
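A minimal sketch applying this pattern to the s1 and s2 strings from the question (the object name QuotedFields and the parse helper are hypothetical). findAllIn returns each quoted field as a separate match, whichever kind of quote it uses.

```scala
object QuotedFields {
  // Opening quote captured as group 1, then as little as possible, then the same quote again
  val pattern = """(["']).*?\1""".r

  val s1 = """ "foo","bar", "foo,bar" """
  val s2 = """ "foo","bar", 'foo,bar' """

  // Hypothetical helper: collect every quoted field in the line
  def parse(s: String): List[String] = pattern.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(parse(s1))
    println(parse(s2))
  }
}
```

Each match keeps its surrounding quotes, and the comma inside "foo,bar" or 'foo,bar' no longer splits the field.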

Update to handle the further cases added to the question:

To also handle your comma-separated examples, you need a second alternative that matches a word character \w followed by any run of word characters or whitespace (\w|\s)*, terminated by either a comma or the end of the line. The terminator is excluded from the match by using a lookahead (?=(,|$)):

(["']).*?\1|\w(\w|\s)*(?=(,|$))
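As a sketch (the object name CombinedPattern and the parse helper are hypothetical), here is the updated pattern applied to every input from the question. It assumes the double quotes around "foo,bar,foo bar" and "foo,bar, 'foo bar' " in the question are string delimiters rather than part of the input.

```scala
object CombinedPattern {
  // Alternative 1: a quoted field. Alternative 2: a bare run of words/spaces ending just before a comma or end of input
  val pattern = """(["']).*?\1|\w(\w|\s)*(?=(,|$))""".r

  // Hypothetical helper: every field the pattern finds in one line
  def parse(s: String): List[String] = pattern.findAllIn(s).toList

  // Inputs from the question (assumed: outer quotes on the two plain examples are only delimiters)
  val inputs: List[String] = List(
    """ "foo","bar", "foo,bar" """,
    """ "foo","bar", 'foo,bar' """,
    "foo,bar,foo bar",
    "foo,bar, 'foo bar' ",
    """ foo, bar, "foo,bar" """
  )

  def main(args: Array[String]): Unit =
    inputs.foreach(s => println(s"[$s] => ${parse(s)}"))
}
```

Every input yields three fields. Quoted fields keep their quotes, while bare fields come back without the comma because the lookahead only checks for it.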


Tags: regex scala