可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Is there a convenient way to use Scala's parser combinators to parse languages where indentation is significant? (e.g. Python)

回答1:

Let's assume we have a very simple language where this is a valid program

block
  inside
  the
  block

and we want to parse this into a List[String] with each line inside the block as one String.

We first define a method that takes a minimum indentation level and returns a parser for a line with that indentation level.

def line(minIndent:Int):Parser[String] = 
  repN(minIndent + 1,"\\s".r) ~ ".*".r ^^ {case s ~ r => s.mkString + r}

Then we define a block with a minimum indentation level by repeating the line parser with a suitable separator between lines.

def lines(minIndent:Int):Parser[List[String]] =
  rep1sep(line(minIndent), "[\n\r]|(\n\r)".r)

Now we can define a parser for our little language like this:

val block:Parser[List[String]] =
  (("\\s*".r <~ "block\\n".r) ^^ { _.size }) >> lines

It first determines the current indentation level and then passes that as the minimum to the lines parser. Let's test it:

val s =
"""block
    inside
    the
    block
outside
the
block"""

println(block(new CharSequenceReader(s)))

And we get

[4.10] parsed: List(    inside,     the,     block)

For all of this to compile, you need these imports

import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader

And you need to put everything into an object that extends RegexParsers like so

object MyParsers extends RegexParsers {
  override def skipWhitespace = false
  ....

回答2:

From what I know, no, the Scala parser combinators don't have support for this kind of thing out of the box. You can certainly do it by parsing white space in a meaningful way, but you will encounter some problems since you need some form of state machine to keep track of the indentation stack.

I would recommend doing a preprocessing step. Here is a little preprocessor which adds markers to separate indented blocks:

object Preprocessor {

    val BlockStartToken = "{"
    val BlockEndToken = "}"

    val TabSize = 4 //how many spaces does a tab take

    def preProcess(text: String): String = {
        val lines = text.split('\n').toList.filterNot(_.forall(isWhiteChar))
        val processedLines = BlockStartToken :: insertTokens(lines, List(0))
        processedLines.mkString("\n")
    }

    def insertTokens(lines: List[String], stack: List[Int]): List[String] = lines match {
        case List() => List.fill(stack.length) { BlockEndToken } //closing all opened blocks
        case line :: rest => {
            (computeIndentation(line), stack) match {
                case (indentation, top :: stackRest) if indentation > top => {
                    BlockStartToken :: line :: insertTokens(rest,  indentation :: stack)
                }
                case (indentation, top :: stackRest) if indentation == top =>
                    line :: insertTokens(rest, stack)
                case (indentation, top :: stackRest) if indentation < top => {
                    BlockEndToken :: insertTokens(lines, stackRest)
                }
                case _ => throw new IllegalStateException("Invalid algorithm")
            }
        }
    }


    private def computeIndentation(line: String): Int = {
        val whiteSpace = line takeWhile isWhiteChar
        (whiteSpace map {
            case ' ' => 1
            case '\t' => TabSize
        }).sum
    }

    private def isWhiteChar(ch: Char) = ch == ' ' || ch == '\t'
}

Execution for this text gives:

val text =
    """
      |line1
      |line2
      |    line3
      |    line4
      |    line5
      |        line6
      |        line7
      |  line8
      |  line9
      |line10
      |   line11
      |   line12
      |   line13
    """.stripMargin
println(Preprocessor.preProcess(text))

... the following result

{
line1
line2
{
    line3
    line4
    line5
{
        line6
        line7
}
}
{
  line8
  line9
}
line10
{
   line11
   line12
   line13
}
}

And afterwords you can use the combinator library to do the parsing in a simpler fashion.

Hope this helps

Parsing an indentation based language using scala

问题:

回答1:

回答2:

收藏的人(0)

Parsing an indentation based language using scala

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮