I'm trying to build a grammar to parse an Erlang tagged-tuple list and map it to a Dict in pyparsing. I'm having problems when I have a list of Dicts. The grammar works when there is just one Dict, but when I add a second, I can't work out how to get it to parse.
Current simplified grammar code (I removed the bits of the language not necessary in this case):
#!/usr/bin/env python2.7
from pyparsing import *
# Erlang config file definition:
erlangAtom = Word( alphas + '_')
erlangString = dblQuotedString.setParseAction( removeQuotes )
erlangValue = Forward()
erlangList = Forward()
erlangElements = delimitedList( erlangValue )
erlangCSList = Suppress('[') + erlangElements + Suppress(']')
erlangList <<= Group( erlangCSList )
erlangTaggedTuple = Group( Suppress('{') + erlangAtom + Suppress(',') +
                           erlangValue + Suppress('}') )
erlangDict = Dict( Suppress('[') + delimitedList( erlangTaggedTuple ) +
                   Suppress(']') )
erlangValue <<= ( erlangAtom | erlangString |
                  erlangTaggedTuple |
                  erlangDict | erlangList )

if __name__ == "__main__":
    working = """
[{foo,"bar"}, {baz, "bar2"}]
"""
    broken = """
[
  [{foo,"bar"}, {baz, "bar2"}],
  [{foo,"bob"}, {baz, "fez"}]
]
"""
    w = erlangValue.parseString(working)
    print w.dump()
    b = erlangValue.parseString(broken)
    print "b[0]:", b[0].dump()
    print "b[1]:", b[1].dump()
This gives:
[['foo', 'bar'], ['baz', 'bar2']]
- baz: bar2
- foo: bar
b[0]: [['foo', 'bar'], ['baz', 'bar2'], ['foo', 'bob'], ['baz', 'fez']]
- baz: fez
- foo: bob
b[1]:
Traceback (most recent call last):
  File "./erl_testcase.py", line 39, in <module>
    print "b[1]:", b[1].dump()
  File "/Library/Python/2.7/site-packages/pyparsing.py", line 317, in __getitem__
    return self.__toklist[i]
IndexError: list index out of range
i.e. the working input parses as expected, but the broken input doesn't parse as two lists.
Any ideas?
Edit: Tweaked testcase to be more explicit about expected output.
OK, I have never worked with pyparsing before, so excuse me if my solution does not make sense. Here we go:
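As far as I understand, you need three main structures. The main mistake you made was in how the delimitedLists are grouped: delimitedList (and Dict) hand back their tokens flat, so when a list contains lists, the tokens of each inner list are merged into the outer list instead of staying separate elements. Here are my definitions: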
For a tagged tuple such as {a,"b"}:
erlangTaggedTuple = Dict(Group(Suppress('{') + erlangAtom + Suppress(',') + erlangValue + Suppress('}') ))
For a list of tagged tuples such as [{a,"b"}, {c,"d"}]:
erlangDict = Suppress('[') + delimitedList( erlangTaggedTuple ) + Suppress(']')
For the rest (lists whose elements are dicts or lists):
erlangList <<= Suppress('[') + delimitedList( Group(erlangDict|erlangList) ) + Suppress(']')
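To see the merging in isolation, here is a minimal sketch in Python 3. It assumes pyparsing is installed and still provides the camelCase names used in the question (pyparsing 3 also offers PEP 8 spellings such as parse_string and as_list). The names atom, tagged_tuple, erl_dict, flat_list and grouped_list are illustrative only. The same nested input is parsed twice: once with the inner Dicts fed straight into the outer delimitedList, and once with each wrapped in Group.

```python
# Sketch: how Group changes the shape of nested results.
# Assumes pyparsing is installed (pip install pyparsing).
from pyparsing import Dict, Group, QuotedString, Suppress, Word, alphas, delimitedList

atom = Word(alphas + "_")
string = QuotedString('"')  # matches "..." and strips the quotes
tagged_tuple = Group(Suppress("{") + atom + Suppress(",") + string + Suppress("}"))
# Dict names each [key, value] group but does not add a grouping level itself
erl_dict = Dict(Suppress("[") + delimitedList(tagged_tuple) + Suppress("]"))

# Inner Dict tokens are spliced straight into the outer list
flat_list = Suppress("[") + delimitedList(erl_dict) + Suppress("]")
# Each inner Dict is kept as a single element of the outer list
grouped_list = Suppress("[") + delimitedList(Group(erl_dict)) + Suppress("]")

text = '[[{foo,"bar"}, {baz,"bar2"}], [{foo,"bob"}, {baz,"fez"}]]'

flat = flat_list.parseString(text)
grouped = grouped_list.parseString(text)

print(flat.asList())
# [['foo', 'bar'], ['baz', 'bar2'], ['foo', 'bob'], ['baz', 'fez']]
print(grouped.asList())
# [[['foo', 'bar'], ['baz', 'bar2']], [['foo', 'bob'], ['baz', 'fez']]]
print(grouped[0]["foo"], grouped[1]["foo"])
# bar bob
```

The flat result has the same shape as the question's b[0]: four pairs in one list, with later keys shadowing earlier ones.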
So my fix for your code is:
#!/usr/bin/env python2.7
from pyparsing import *
# Erlang config file definition:
erlangAtom = Word( alphas + '_')
erlangString = dblQuotedString.setParseAction( removeQuotes )
erlangValue = Forward()
erlangList = Forward()
erlangTaggedTuple = Dict(Group(Suppress('{') + erlangAtom + Suppress(',') +
                               erlangValue + Suppress('}') ))
erlangDict = Suppress('[') + delimitedList( erlangTaggedTuple ) + Suppress(']')
erlangList <<= Suppress('[') + delimitedList( Group(erlangDict|erlangList) ) + Suppress(']')
erlangValue <<= ( erlangAtom | erlangString |
                  erlangTaggedTuple |
                  erlangDict| erlangList )

if __name__ == "__main__":
    working = """
[{foo,"bar"}, {baz, "bar2"}]
"""
    broken = """
[
  [{foo,"bar"}, {baz, "bar2"}],
  [{foo,"bob"}, {baz, "fez"}]
]
"""
    w = erlangValue.parseString(working)
    print w.dump()
    b = erlangValue.parseString(broken)
    print "b[0]:", b[0].dump()
    print "b[1]:", b[1].dump()
Which gives the output:
[['foo', 'bar'], ['baz', 'bar2']]
- baz: bar2
- foo: bar
b[0]: [['foo', 'bar'], ['baz', 'bar2']]
- baz: bar2
- foo: bar
b[1]: [['foo', 'bob'], ['baz', 'fez']]
- baz: fez
- foo: bob
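Since the goal in the question is to end up with Dicts, here is a minimal Python 3 sketch of the grammar above, plus a conversion to plain Python dicts with asDict(). It assumes pyparsing is installed. QuotedString('"') stands in for dblQuotedString.setParseAction(removeQuotes), because setParseAction modifies the shared module-level dblQuotedString in place. The helper to_python is a hypothetical name, not part of pyparsing.

```python
# Sketch: Python 3 version of the fixed grammar, converting results to dicts.
# Assumes pyparsing is installed; to_python is a hypothetical helper.
from pyparsing import (Dict, Forward, Group, QuotedString, Suppress, Word,
                       alphas, delimitedList)

erlangAtom = Word(alphas + "_")
erlangString = QuotedString('"')  # same effect as dblQuotedString + removeQuotes here
erlangValue = Forward()
erlangList = Forward()

# {a,"b"}: Dict over a single Group gives the pair a results name
erlangTaggedTuple = Dict(Group(Suppress("{") + erlangAtom + Suppress(",")
                               + erlangValue + Suppress("}")))
# [{a,"b"}, {c,"d"}]: the names of all the tuples collect on one result
erlangDict = Suppress("[") + delimitedList(erlangTaggedTuple) + Suppress("]")
# [[...], [...]]: Group keeps each inner list as a single element
erlangList <<= (Suppress("[") + delimitedList(Group(erlangDict | erlangList))
                + Suppress("]"))
erlangValue <<= (erlangAtom | erlangString | erlangTaggedTuple
                 | erlangDict | erlangList)


def to_python(results):
    """Convert a parsed list of Dicts into a list of plain Python dicts."""
    return [group.asDict() for group in results]


working = '[{foo,"bar"}, {baz, "bar2"}]'
broken = """
[
  [{foo,"bar"}, {baz, "bar2"}],
  [{foo,"bob"}, {baz, "fez"}]
]
"""

print(erlangValue.parseString(working).asDict())
# {'foo': 'bar', 'baz': 'bar2'}
print(to_python(erlangValue.parseString(broken)))
# [{'foo': 'bar', 'baz': 'bar2'}, {'foo': 'bob', 'baz': 'fez'}]
```

to_python only handles the one-level "list of Dicts" case from the question; deeper nesting would need a recursive walk.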
Hope that helps, cheers!
I can't understand why it's not working, because your code looks very much like the JSON example, which handles nested lists just fine.
But the problem seems to happen at this line:
erlangElements = delimitedList( erlangValue )
where, if the erlangValues are lists, their elements get spliced into the outer list (appended, in the Lisp sense) instead of being cons'd on as single elements. You can kludge around this with
erlangElements = delimitedList( Group(erlangValue) )
which adds an extra layer of list around the top-most element, but keeps your sub-lists from merging.
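Here is a minimal Python 3 sketch of the question's grammar with that one change applied, to show where the two Dicts end up. It assumes pyparsing is installed, and again uses QuotedString('"') in place of dblQuotedString.setParseAction(removeQuotes).

```python
# Sketch: the question's grammar with Group(erlangValue) inside the list.
# Assumes pyparsing is installed.
from pyparsing import (Dict, Forward, Group, QuotedString, Suppress, Word,
                       alphas, delimitedList)

erlangAtom = Word(alphas + "_")
erlangString = QuotedString('"')
erlangValue = Forward()
erlangList = Forward()
erlangElements = delimitedList(Group(erlangValue))  # the kludge: one Group per element
erlangCSList = Suppress("[") + erlangElements + Suppress("]")
erlangList <<= Group(erlangCSList)
erlangTaggedTuple = Group(Suppress("{") + erlangAtom + Suppress(",")
                          + erlangValue + Suppress("}"))
erlangDict = Dict(Suppress("[") + delimitedList(erlangTaggedTuple) + Suppress("]"))
erlangValue <<= (erlangAtom | erlangString | erlangTaggedTuple
                 | erlangDict | erlangList)

broken = """
[
  [{foo,"bar"}, {baz, "bar2"}],
  [{foo,"bob"}, {baz, "fez"}]
]
"""
b = erlangValue.parseString(broken)

# Group(erlangCSList) leaves a single top-level element...
print(len(b))  # 1
# ...whose items are the two Dicts, no longer merged
print(b[0][0]["foo"], b[0][1]["foo"])  # bar bob
print(b[0][1].asList())  # [['foo', 'bob'], ['baz', 'fez']]
```

So with this change the Dicts are reached as b[0][0] and b[0][1] rather than b[0] and b[1].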