I've just started using pyparsing this evening and I've built a complex grammar that describes some sources I'm working with very effectively. It was very easy and very powerful. However, I'm having some trouble working with ParseResults. I need to be able to iterate over nested tokens in the order that they're found, and I'm finding it a little frustrating. I've abstracted my problem to a simple case:
import pyparsing as pp
word = pp.Word(pp.alphas + ',.')('word*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))('sentence')
test_string = 'Lorem ipsum “dolor sit” amet, consectetur.'
r = sentence.parseString(test_string)
print r.asXML('div')
print ''
for name, item in r.sentence.items():
    print name, item
print ''
for item in r.sentence:
    print item.getName(), item.asList()
As far as I can see, this ought to work. Here is the output:
<div>
  <sentence>
    <word>Lorem</word>
    <word>ipsum</word>
    <direct_speech>
      <word>dolor</word>
      <word>sit</word>
    </direct_speech>
    <word>amet,</word>
    <word>consectetur.</word>
  </sentence>
</div>
word ['Lorem', 'ipsum', 'amet,', 'consectetur.']
direct_speech [['dolor', 'sit']]
Traceback (most recent call last):
  File "./test.py", line 27, in <module>
    print item.getName(), item.asList()
AttributeError: 'str' object has no attribute 'getName'
The XML output seems to indicate that the string is parsed exactly as I would wish, but I can't iterate over the sentence, for example, to reconstruct it.
Is there a way to do what I need to?
Thanks!
edit:
I've been using this:
for item in r.sentence:
    if isinstance(item, basestring):
        print item
    else:
        print item.getName(), item
but it doesn't help me all that much, because I can't distinguish different types of string. Here is a slightly expanded example:
word = pp.Word(pp.alphas + ',.')('word*')
number = pp.Word(pp.nums + ',.')('number*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))('sentence')
test_string = 'Lorem 14 ipsum “dolor 22 sit” amet, consectetur.'
r = sentence.parseString(test_string)
for i, item in enumerate(r.sentence):
    if isinstance(item, basestring):
        print i, item
    else:
        print i, item.getName(), item
the output is:
0 Lorem
1 14
2 ipsum
3 word ['dolor', '22', 'sit']
4 amet,
5 consectetur.
Not too helpful. I can't distinguish between word and number, and the direct_speech element is labelled word?!
I'm obviously missing something. All I want to do is:
for item in r.sentence:
    if (item is a number):
        do something
    elif (item is a word):
        do something else
    etc. ...
Should I be approaching this differently?
Well, I've tried a number of different approaches now and I can't get what I need, so (absurd though it seems) I'm using .asXML() and parsing the resulting XML. Here's my example:
which outputs:
It seems like a long way around the houses, but there doesn't seem to be a better way.
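The workaround described above can be sketched roughly as follows. This is not the code from the original post (which did not survive); it is a minimal illustration that assumes the XML has the shape shown in the first example's output and uses the standard library's xml.etree.ElementTree. The helper name ordered_items is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML copied from the asXML('div') output of the first example above.
xml_text = """<div>
  <sentence>
    <word>Lorem</word>
    <word>ipsum</word>
    <direct_speech>
      <word>dolor</word>
      <word>sit</word>
    </direct_speech>
    <word>amet,</word>
    <word>consectetur.</word>
  </sentence>
</div>"""


def ordered_items(parent):
    """Return (tag, value) pairs for the children of `parent`, in document order.

    Leaf elements yield their text; elements with children yield a nested
    list built the same way. (Hypothetical helper, for illustration only.)
    """
    items = []
    for child in parent:
        if len(child):  # the element has sub-elements, so recurse
            items.append((child.tag, ordered_items(child)))
        else:
            items.append((child.tag, child.text))
    return items


root = ET.fromstring(xml_text)
items = ordered_items(root.find('sentence'))
for tag, value in items:
    print(tag, value)
```

Because every element carries its tag, each item can be dispatched on its name (word, direct_speech, and so on) while the original order is preserved.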
r.sentence contains a mix of strings and ParseResults, and only ParseResults support getName(). Have you tried just iterating over r.sentence? If I print it out using asList(), I get:
Or this snippet:
Gives:
I'm not sure I answered your question, but does that shed any light on where to go next?
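To illustrate walking that mix of strings and nested groups in order, here is a small sketch in plain Python. It does not call pyparsing: the nested list is an assumption that mirrors the items printed by the expanded example in the question, and the rebuild function is a hypothetical helper. It also assumes every nested group is direct speech, which holds for this grammar but is not something the list itself records.

```python
# Assumed shape of r.sentence.asList() for the expanded example,
# taken from the items printed in the question.
nested = ['Lorem', '14', 'ipsum', ['dolor', '22', 'sit'], 'amet,', 'consectetur.']


def rebuild(tokens):
    """Rebuild the sentence text from a nested token list, in order.

    Strings are kept as-is; nested lists are assumed to be direct speech
    and are wrapped in the curly quotes that the grammar suppressed.
    (Under Python 2, test against basestring instead of str.)
    """
    parts = []
    for token in tokens:
        if isinstance(token, str):
            parts.append(token)
        else:
            parts.append('“' + rebuild(token) + '”')
    return ' '.join(parts)


sentence_text = rebuild(nested)
print(sentence_text)
```

This only recovers order and nesting; it cannot tell a word from a number, since plain strings in the list carry no results name.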
(Welcome to Pyparsing)