Question:

I'm trying to parse a text file using parser combinators. I want to capture the index and text in a class called Example. Here's a test showing the form on an input file:

object Test extends ParsComb with App {
  val input = """
0)
blah1
blah2
blah3
1)
blah4
blah5
END
"""
  println(parseAll(examples, input))
}

And here's my attempt that doesn't work:

import scala.util.parsing.combinator.RegexParsers

case class Example(index: Int, text: String)

class ParsComb extends RegexParsers {
  def examples: Parser[List[Example]] = rep(divider~example) ^^ 
                                          {_ map {case d ~ e => Example(d,e)}}
  def divider:  Parser[Int]           = "[0-9]+".r <~ ")"    ^^ (_.toInt)
  def example:  Parser[String]        = ".*".r <~ (divider | "END") 
}

It fails with:

[4.1] failure: `END' expected but `b' found

blah2

^

I'm just starting out with these so I don't have much clue what I'm doing. I think the problem could be with the ".*".r regex not doing multi-line. How can I change this so that it parses correctly?

Answer 1:

What does the error message mean?

Your grammar definition, ".*".r <~ (divider | "END"), tells the parser that an example must be followed by either a divider or END. After parsing blah1, the parser tried divider and failed, then tried "END" and failed again. There were no other options left, and "END" was the last alternative it tried, so that is the one it reports: from the parser's perspective it expected END, but the next input it found was the b of blah2 at the start of line 4.
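The failure can be reproduced without the combinator library. The sketch below uses only scala.util.matching.Regex and assumes that RegexParsers applies each regex as a prefix match at the current position and skips whitespace before each token (its default). The object and value names are made up for illustration.

```scala
import scala.util.matching.Regex

object DotDemo {
  // The text the example parser sees once the divider "0)" has been consumed.
  val rest: String = "blah1\nblah2\nblah3\n1)\n"

  // Without flags, '.' does not match a line terminator,
  // so ".*" stops at the end of the first line.
  val anyChars: Regex = ".*".r
  val firstLine: Option[String] = anyChars.findPrefixOf(rest)

  // What remains after "blah1", with leading whitespace skipped
  // (mimicking the default whitespace handling of RegexParsers).
  val remaining: String =
    rest.drop(firstLine.map(_.length).getOrElse(0)).dropWhile(_.isWhitespace)

  def main(args: Array[String]): Unit = {
    println(firstLine)                           // Some(blah1)
    println("[0-9]+".r.findPrefixOf(remaining))  // None: no divider here
    println(remaining.startsWith("END"))         // false: no END either
  }
}
```

Both alternatives of (divider | "END") fail at blah2, which is the position the error message points to.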

How to fix it?

Staying close to your implementation, the grammar in your case should be:

examples ::= {divider example}
divider  ::= Integer")"
example  ::= {literal ["END"]}

I also think parsing "example" into a List[String] makes more sense, but that's up to you.

The problem is your example parser: it should be a repeated literal.

So:

class ParsComb extends RegexParsers {
  def examples: Parser[List[Example]] = rep(divider ~ example) ^^ { _ map { case d ~ e => Example(d, e) } }
  def divider: Parser[Int] = "[0-9]+".r <~ ")" ^^ (_.toInt)
  def example: Parser[List[String]] = rep("[\\w]*(?=[\\r\\n])".r <~ opt("END"))
}

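The (?=[\\r\\n]) part of the regex is a positive lookahead: the preceding word characters match only if they are followed by \r or \n, and the line terminator itself is not consumed. Note that [\\w] covers only letters, digits and underscores, so a line containing spaces or punctuation would not match.

The parse result is: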
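To see what the lookahead pattern accepts, it can be tried on its own with scala.util.matching.Regex. This is a small sketch, not part of the parser; the object name and sample strings are invented for illustration.

```scala
object LookaheadDemo {
  // Same pattern as in the example parser above.
  val line = "[\\w]*(?=[\\r\\n])".r

  // A text line: the word characters match, the newline is left unconsumed.
  val onText: Option[String] = line.findPrefixOf("blah1\nblah2\n")

  // A divider line: ')' is neither a word character nor a line terminator.
  val onDivider: Option[String] = line.findPrefixOf("1)\nblah4\n")

  // A line with a space: \w does not cover spaces, so there is no match.
  val onSpaces: Option[String] = line.findPrefixOf("two words\n")

  def main(args: Array[String]): Unit = {
    println(onText)     // Some(blah1)
    println(onDivider)  // None
    println(onSpaces)   // None
  }
}
```

Because divider lines never match, rep stops exactly where the next divider begins.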

[10.1] parsed: List(Example(0,List(blah1, blah2, blah3)), Example(1,List(blah4, blah5)))

If you want to parse it into a String (instead of a List[String]), just add a transform function to example: ^^ {_ mkString "\n"}

Answer 2:

Your parser doesn't handle the newline character, your example parser consumes the next divider, and your example regex also matches divider lines and the "END" string.

Try this:

object ParsComb extends RegexParsers { 
  def examples: Parser[List[Example]] = rep(divider~example) <~ """END\n?""".r ^^ {_ map {case d ~ e => Example(d,e)}} 
  def divider: Parser[Int] = "[0-9]+".r <~ ")\n" ^^ (_.toInt) 
  def example: Parser[String] = rep(str) ^^ {_.mkString}
  def str: Parser[String] = """.*\n""".r ^? { case s if simpleLine(s) => s}

  val div = """[0-9]+\)\n""".r
  def simpleLine(s: String) = s match {
    case div() => false
    case "END\n" => false
    case _ => true
  }

  def apply(s: String) = parseAll(examples, s)
}

Result:

scala> ParsComb(input)
res3: ParsComb.ParseResult[List[Example]] =
[10.1] parsed: List(Example(0,blah1
blah2
blah3
), Example(1,blah4
blah5
))
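The ^? filter in str depends entirely on simpleLine, so it may help to look at that function in isolation. The sketch below copies div and simpleLine from the code above and applies them to a few sample lines. The sample lines and the object name are made up for illustration.

```scala
object SimpleLineDemo {
  // Used as an extractor, a Regex pattern must match the whole string.
  val div = """[0-9]+\)\n""".r

  def simpleLine(s: String): Boolean = s match {
    case div()   => false // a divider line such as "0)\n"
    case "END\n" => false // the terminator line
    case _       => true  // anything else is ordinary text
  }

  val lines: List[String] =
    List("0)\n", "blah1\n", "12) not a divider\n", "END\n")

  // Only the ordinary text lines survive the filter.
  val kept: List[String] = lines.filter(simpleLine)

  def main(args: Array[String]): Unit =
    kept.foreach(print)
}
```

A line that merely starts with a number and a parenthesis is kept, because div() only matches when the entire line is a divider.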

Answer 3:

I think the problem could be with the ".*".r regex not doing multi-line.

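Exactly: by default, . does not match line terminators. Use the dotall modifier (strangely called "s"):

def example:  Parser[String]        = "(?s).*".r <~ (divider | "END")

Note that a greedy "(?s).*" then runs through to the end of the input, leaving nothing for the trailing (divider | "END") to match, so the pattern still has to be made to stop before the next divider or END.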
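The effect of the dotall flag can be checked with scala.util.matching.Regex alone. The sketch below compares a greedy dotall match with a hypothetical bounded variant (reluctant quantifier plus lookahead) that is not from the answer above. It assumes every block is terminated by a newline followed by a divider line or END.

```scala
object DotallDemo {
  // Everything after the first divider "0)" and its newline.
  val rest: String = "blah1\nblah2\nblah3\n1)\nblah4\nblah5\nEND\n"

  // Greedy dotall: '.' now matches newlines too, so the prefix match
  // runs all the way to the end of the input.
  val greedy: Option[String] = "(?s).*".r.findPrefixOf(rest)

  // Hypothetical bounded variant: take as little as possible, stopping
  // just before a newline that is followed by a divider line or END.
  val bounded: Option[String] =
    """(?s).*?(?=\n(?:[0-9]+\)|END)\n)""".r.findPrefixOf(rest)

  def main(args: Array[String]): Unit = {
    println(greedy == Some(rest)) // true: nothing is left to parse
    println(bounded)              // the first three lines only
  }
}
```

With a bounded pattern of this kind, example would presumably no longer need the trailing <~ (divider | "END"), and END could be consumed once after rep(divider ~ example), as in Answer 2.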

Using parser combinators to collate lines of text