I need to parse a file with information separated by curly brackets, for example:
Continent
{
Name Europe
Country
{
Name UK
Dog
{
Name Fiffi
Colour Gray
}
Dog
{
Name Smut
Colour Black
}
}
}
Here is what I have tried in Python:
from io import open
from pyparsing import *
import pprint

def parse(s):
    return nestedExpr('{','}').parseString(s).asList()

def test(strng):
    print strng
    try:
        cfgFile = file(strng)
        cfgData = "".join( cfgFile.readlines() )
        list = parse( cfgData )
        pp = pprint.PrettyPrinter(2)
        pp.pprint(list)
    except ParseException, err:
        print err.line
        print " "*(err.column-1) + "^"
        print err
    cfgFile.close()
    print
    return list

if __name__ == '__main__':
    test('testfile')
But this fails with an error:
testfile
Continent
^
Expected "{" (at char 0), (line:1, col:1)
Traceback (most recent call last):
  File "xxx.py", line 55, in <module>
    test('testfile')
  File "xxx.py", line 40, in test
    return list
UnboundLocalError: local variable 'list' referenced before assignment
What do I need to do to make this work? Would a parser other than pyparsing be better?
Recursion is the key here. Try something along these lines:
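Below is a minimal Python 3 sketch of that recursive idea. It is not necessarily the answerer's own code. It assumes that tokens are whitespace-separated words plus single `{` and `}` characters. The names `tokenize`, `parse_items` and `parse` are hypothetical.

```python
import pprint
import re

# Assumption: a token is a single brace or a run of characters that are
# neither whitespace nor braces.
TOKEN_RE = re.compile(r"[{}]|[^\s{}]+")


def tokenize(text):
    """Split text into '{', '}' and word tokens."""
    return TOKEN_RE.findall(text)


def parse_items(tokens, pos=0, depth=0):
    """Collect items from tokens[pos:] until the '}' that closes this block.

    Each '{' recurses into a nested list. Returns (items, next_pos).
    Raises ValueError on an unmatched '}' or a missing '}'.
    """
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "{":
            child, pos = parse_items(tokens, pos + 1, depth + 1)
            items.append(child)
        elif tok == "}":
            if depth == 0:
                raise ValueError("unmatched '}' at token %d" % pos)
            return items, pos + 1
        else:
            items.append(tok)
            pos += 1
    if depth > 0:
        raise ValueError("missing '}' at end of input")
    return items, pos


def parse(text):
    """Parse a whole document; the top level may hold labels as well as blocks."""
    items, _ = parse_items(tokenize(text))
    return items


SAMPLE = """Continent
{
Name Europe
Country
{
Name UK
Dog
{
Name Fiffi
Colour Gray
}
Dog
{
Name Smut
Colour Black
}
}
}"""

print(parse("A { b c }"))  # ['A', ['b', 'c']]
pprint.pprint(parse(SAMPLE))
```

This sketch parses the top level as a sequence of items rather than as a single braced group. As a result, it also accepts the leading `Continent` label, and it raises `ValueError` on an extra or missing `}`.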
The use case:
... which produces (Python 2.6):
Please take this only as a starting point, and feel free to improve the code as you need (depending on your data, a dictionary might have been a better choice). In addition, the sample code does not properly handle ill-formed data (notably extra or missing }), so I urge you to write full test coverage ;)
EDIT: Discovering pyparsing, I tried the following, which appears to work (much) better and could be (more) easily tailored to special needs:
Producing:
Nested expressions are very common, and they usually require recursive parser definitions, or recursive code if you're not using a parsing library. Such code can be daunting for beginners and error-prone even for experts, which is why I added the nestedExpr helper to pyparsing.
The problem you are having is that your input string contains more than just a nested-braces expression. When I first try out a parser, I keep the testing as simple as possible; for instance, I inline the sample instead of reading it from a file.
And I get the same parsing error that you do:
Looking at the error message (and at your own debugging code), you can see that pyparsing is stumbling on the leading word "Continent". That word is not the beginning of a nested expression in braces; as the exception message shows, pyparsing was looking for an opening '{'.
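Here is a stdlib-only sketch that reproduces the same diagnosis without pyparsing. It inlines the sample and applies one strict rule: "the input must begin with `{`." The functions `expect_open_brace` and `line_col` are hypothetical helpers, not pyparsing API. The message text is modelled on the error shown in the question.

```python
# Inline the sample rather than reading it from a file, so the test is self-contained.
SAMPLE = """Continent
{
Name Europe
Country
{
Name UK
Dog
{
Name Fiffi
Colour Gray
}
Dog
{
Name Smut
Colour Black
}
}
}"""


def line_col(text, offset):
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when offset is on line 1
    return line, offset - line_start + 1


def expect_open_brace(text):
    """Hypothetical strict rule: after leading whitespace, text must start with '{'.

    Returns the offset of the '{', or raises ValueError with a pyparsing-style message.
    """
    offset = len(text) - len(text.lstrip())
    if not text.startswith("{", offset):
        line, col = line_col(text, offset)
        raise ValueError('Expected "{" (at char %d), (line:%d, col:%d)' % (offset, line, col))
    return offset


try:
    expect_open_brace(SAMPLE)
except ValueError as err:
    print(err)  # Expected "{" (at char 0), (line:1, col:1)

# Consuming the leading label first lets the same strict rule succeed.
label = SAMPLE.split(None, 1)[0]
print(label, expect_open_brace(SAMPLE[len(label):]))  # Continent 1
```

The last two lines show the general idea: once the leading word is consumed explicitly, the text that remains does start with a brace group.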
The solution is to slightly modify your parser to handle the introductory "Continent" label, by changing expr to:
Now, printing out the results as a list (using pprint as done in the OP, nice work) looks like:
which should match up with your brace nesting.
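A dictionary may suit this data better than nested lists, as the first answer suggests. The sketch below is one hedged way to fold such a nested list into dicts. `to_dict` is a hypothetical helper, and the `NESTED` literal simply writes out the brace nesting of the sample file. The sketch assumes that every attribute value is a single word; a value such as `New Zealand` would break the key/value pairing.

```python
def to_dict(items):
    """Fold a nested word list into dicts (hypothetical helper).

    Assumes each block alternates key, value: a word value is stored as an
    attribute, and a nested-list value is a child block. Child blocks are kept
    in lists because labels such as 'Dog' can repeat within one block.
    """
    if len(items) % 2:
        raise ValueError("expected key/value pairs, got %r" % (items,))
    result = {}
    for key, value in zip(items[0::2], items[1::2]):
        if isinstance(value, list):
            result.setdefault(key, []).append(to_dict(value))
        else:
            result[key] = value
    return result


# The nesting of the sample file, written out as a nested list of words.
NESTED = ['Continent',
          ['Name', 'Europe', 'Country',
           ['Name', 'UK',
            'Dog', ['Name', 'Fiffi', 'Colour', 'Gray'],
            'Dog', ['Name', 'Smut', 'Colour', 'Black']]]]

tree = to_dict(NESTED)
print(tree['Continent'][0]['Country'][0]['Dog'][1]['Name'])  # Smut
```

Storing child blocks in lists costs an extra `[0]` when a label occurs only once. However, it keeps both `Dog` entries rather than letting the second overwrite the first.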