Learning Treetop

Posted 2020-05-16 00:38

Question:

I'm trying to teach myself Ruby's Treetop grammar generator. I am finding that not only is the documentation woefully sparse for the "best" one out there, but that it doesn't seem to work as intuitively as I'd hoped.

On a high level, I'd really love a better tutorial than the on-site docs or the video, if there is one.

On a lower level, here's a grammar I cannot get to work at all:

grammar SimpleTest

  rule num
    (float / integer)
  end

  rule float
   (
    (( '+' / '-')? plain_digits '.' plain_digits) /
    (( '+' / '-')? plain_digits ('E' / 'e') plain_digits ) /
    (( '+' / '-')? plain_digits '.') / 
    (( '+' / '-')? '.' plain_digits) 
   ) {
      def eval
        text_value.to_f
      end
   }
  end

  rule integer
    (( '+' / '-' )? plain_digits) {
      def eval
        text_value.to_i
      end
    }
  end

  rule plain_digits
    [0-9] [0-9]*      
  end

end

When I load it and run some assertions in a very simple test case, I find that

assert_equal 3.14, @parser.parse('3.14').eval

Works fine, while

assert_equal 3, @parser.parse('3').eval

raises the error: NoMethodError: private method `eval' called for #

If I swap integer and float in the num rule, both integers and floats give me this error. I think this may be related to limited lookahead, but I can't find anything in the docs that even covers the idea of evaluating in the "or" context.
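A plausible explanation for the swapped-order behavior is PEG ordered choice rather than limited lookahead: Treetop's `/` tries alternatives left to right and commits to the first one that succeeds. The toy sketch here is hand-written and is not Treetop's API (the `*_rule`, `choice`, and `parse_all` names are hypothetical, and signs and exponents are left out). It models `rule num (float / integer)`. With integer first, "3.14" matches integer as "3", the choice commits, and ".14" is left over. As far as I know, Treetop's `parse` returns nil by default when input is left unconsumed, so the following `.eval` would then be called on nil, which also only has the private `Kernel#eval`.

```ruby
# Toy model of PEG ordered choice (hypothetical helpers, NOT Treetop's API).
# Each rule takes the input and a start position and returns the end
# position of its match, or nil on failure.

def digits_rule(input, pos)
  m = /\A[0-9]+/.match(input[pos..-1])
  m && pos + m[0].length
end

def integer_rule(input, pos)
  digits_rule(input, pos) # optional sign omitted to keep the sketch short
end

def float_rule(input, pos)
  # Only the "digits '.' digits" alternative, for brevity.
  p1 = digits_rule(input, pos)
  return nil unless p1 && input[p1] == '.'
  digits_rule(input, p1 + 1)
end

# Ordered choice: try alternatives left to right and commit to the first
# one that succeeds; later alternatives are never tried.
def choice(input, pos, *alternatives)
  alternatives.each do |alt|
    result = alt.call(input, pos)
    return result if result
  end
  nil
end

# Require the whole input to be consumed, as Treetop does by default.
def parse_all(input, *alternatives)
  finish = choice(input, 0, *alternatives)
  finish == input.length ? finish : nil
end

FLOAT_FIRST   = [method(:float_rule), method(:integer_rule)]
INTEGER_FIRST = [method(:integer_rule), method(:float_rule)]

p parse_all("3.14", *FLOAT_FIRST)   # 4: float consumed everything
p parse_all("3",    *FLOAT_FIRST)   # 1: float failed, integer matched
p parse_all("3.14", *INTEGER_FIRST) # nil: integer took "3", ".14" left over
```

This is why the more specific alternative (float) has to come first in the choice.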

A bit more info that may help: here is the pp output for both of those parse() calls.

The float:

SyntaxNode+Float4+Float0 offset=0, "3.14" (eval,plain_digits):
  SyntaxNode offset=0, ""
  SyntaxNode+PlainDigits0 offset=0, "3":
    SyntaxNode offset=0, "3"
    SyntaxNode offset=1, ""
  SyntaxNode offset=1, "."
  SyntaxNode+PlainDigits0 offset=2, "14":
    SyntaxNode offset=2, "1"
    SyntaxNode offset=3, "4":
      SyntaxNode offset=3, "4"

The integer (note that the node seems to have been built by the integer rule, but it did not pick up the eval() method):

SyntaxNode+Integer0 offset=0, "3" (plain_digits):
  SyntaxNode offset=0, ""
  SyntaxNode+PlainDigits0 offset=0, "3":
    SyntaxNode offset=0, "3"
    SyntaxNode offset=1, ""

Update:

I got my particular problem working, but I have no clue why:

  rule integer
    ( '+' / '-' )? plain_digits
     {
      def eval
        text_value.to_i
      end
    }
  end

This makes no sense given the docs that exist, but simply removing the extra parentheses made the match include the Integer1 module as well as Integer0. Integer1 is apparently the module holding the eval() method. I have no idea why this is the case.

I'm still looking for more info about treetop.

Answer 1:

Sadly, Treetop's documentation sucks. A lot. And the examples on the website aren't helpful. I found that DZone has a pretty large collection of Treetop grammars:

Treetop grammars



Answer 2:

You might appreciate Paul Battley's nice, simple tutorial, Getting started with Treetop.

Starting with a minimal grammar, he shows how to create a parser and then through a couple of iterations adds just a bit of functionality. It was just enough to get me out of the starting blocks.



Answer 3:

Roland Swingler gave a presentation on Treetop to LRUG (http://skillsmatter.com/podcast/ajax-ria/treetop) that I found useful when getting started.



Answer 4:

Citrus is a much lighter alternative to Treetop: http://github.com/mjijackson/citrus



Answer 5:

I followed this Treetop Introductory Tutorial a couple of years ago to understand the basics of Treetop.

Then A quick intro to writing a parser with Treetop, which was useful to me because it explains how to map syntax tree nodes to Ruby class instances.



Answer 6:

I've just started experimenting with Treetop.

I tried changing

 rule num
      (float / integer)
 end

to

 rule num
      (float / integer)
      {
       def eval
            text_value.to_f
       end
      }
 end
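This likely works because the inline module on `num` is mixed into whichever node the choice returns, so it no longer matters which alternative's module was lost. One side effect is that `to_f` turns integers into floats as well. A minimal sketch, with a hypothetical `NumNode` struct and `NumEval` module standing in for Treetop's node and the module it generates for the `num` block:

```ruby
# Hypothetical stand-in for a node produced by the `integer` alternative.
NumNode = Struct.new(:text_value)

# Plays the role of the module generated for the inline block on `rule num`;
# it is mixed into whichever alternative's node matched.
module NumEval
  def eval
    text_value.to_f
  end
end

node  = NumNode.new("3").extend(NumEval)
value = node.eval

puts value        # 3.0
puts value.class  # Float
puts value == 3   # true
```

An `assert_equal 3, ...` check still passes because it compares with `==` and `3 == 3.0` is true in Ruby, but the returned value is a Float, not an Integer.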

And it seems to work.



Answer 7:

This is a bug. The unnecessary parentheses around the rule for an integer causes construction of an extra module to contain the definition of eval, and this module doesn't get mixed in to the node, so 'eval' is not available. You can see this clearly if you compare the Ruby code (generated using the tt command) for versions with and without these extra parentheses.
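This also explains why the error says "private method" rather than "undefined method": every Ruby object already inherits a private `Kernel#eval`, so when the module holding the rule's `def eval` is never mixed in, the call with an explicit receiver hits that private method. Treetop's generated parser attaches rule modules to nodes with `Object#extend` (worth confirming in the `tt` output). The sketch uses a hypothetical `FakeNode` class and `IntegerEval` module, not real Treetop classes:

```ruby
# Minimal stand-in for a syntax node (hypothetical; the real Treetop node
# class carries much more state).
class FakeNode
  attr_reader :text_value

  def initialize(text_value)
    @text_value = text_value
  end
end

# Plays the role of the generated module holding the inline `def eval`
# (Integer1 in the question's pp output).
module IntegerEval
  def eval
    text_value.to_i
  end
end

orphan = FakeNode.new("3")                     # module never mixed in
mixed  = FakeNode.new("3").extend(IntegerEval) # module mixed in

begin
  orphan.eval
rescue NoMethodError => e
  # Exact wording varies by Ruby version, but it reports a private method.
  puts e.message
end

puts mixed.eval # 3
```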



Answer 8:

The Treetop docs seem to assume you already know a fair amount about parsing expression grammars (PEGs). Treetop is based entirely on PEGs. PEGs are not specific to Treetop, though; other parsing libraries use them as well. In learning Treetop, I found it very helpful to study up on PEGs in general. That helped fill in a lot of the gaps in the documentation.
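One example of PEG behavior the docs take for granted: repetition (`*`, `+`) is greedy and never gives characters back, unlike a backtracking regex. A toy sketch (hypothetical helper names, not Treetop's API) of the PEG expression `'a'* 'a'`, which can never match:

```ruby
# Toy PEG matchers (hypothetical, not Treetop's API): each returns the end
# position of a match, or nil on failure.

# PEG `'a'*`: consume as many 'a' as possible; always succeeds.
def peg_star_a(input, pos)
  pos += 1 while input[pos] == 'a'
  pos
end

# PEG `'a'`: match exactly one 'a'.
def peg_a(input, pos)
  input[pos] == 'a' ? pos + 1 : nil
end

# PEG sequence `'a'* 'a'`: the star never gives an 'a' back.
def peg_star_then_a(input, pos)
  peg_a(input, peg_star_a(input, pos))
end

p peg_star_then_a("aaa", 0) # nil: the star already consumed every 'a'
p "aaa".match?(/\Aa*a/)     # true: the regex engine backtracks
```

The same reasoning is why `plain_digits` is written as `[0-9] [0-9]*`; the reversed form `[0-9]* [0-9]` would never match in a PEG.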