Pyparsing OR operation: use the shortest string when more than one alternative matches

Posted 2019-07-30 22:29

I need to parse some statements, but I want the flexibility of using several different tokens to signal the end of a statement.

For example:

string = """
start some statement end
other stuff in between
start some other statement.
other stuff in between
start another statement
"""

In this case 'end', '.' and the end of the line are the tokens that signal the end of the statement I am looking for.

I tried the following:

from pyparsing import restOfLine, SkipTo

skip_to_end_of_line = restOfLine
skip_to_dot = SkipTo('.', include=False)
skip_to_end = SkipTo('end', include=False)

statement = 'start' + skip_to_end_of_line^skip_to_dot^skip_to_end

statement.searchString(string)

([(['start some statement end\nother stuff in between\nstart some other statement'], {}), (['start', ' another statement'], {})], {})

Because OR (the ^ operator) returns the longest match when more than one alternative matches, I get the longest string. I would like OR to return the shortest string instead, resulting in:

([(['start', ' some statement end'], {}), (['start', ' some other statement.'], {}), (['start', ' another statement'], {})], {})
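For comparison, here is a minimal sketch of the requested "shortest match" behaviour using Python's standard re module instead of pyparsing (a swapped-in technique, not pyparsing's API). A lazy quantifier stops at the first terminator it reaches, which is exactly the earliest, and therefore shortest, of 'end', '.' and the end of the line. The names STATEMENT and find_statements are hypothetical.

```python
import re

# Sample input from the question.
text = """
start some statement end
other stuff in between
start some other statement.
other stuff in between
start another statement
"""

# 'start', then the shortest run of characters (lazy .*?) that reaches a
# terminator: the letters 'end', a literal '.', or the end of a line
# ($ under re.MULTILINE). Because '.' in a regex does not match '\n',
# a statement can never run past the end of its own line.
STATEMENT = re.compile(r"start(.*?(?:end|\.|$))", re.MULTILINE)


def find_statements(s):
    """Return (keyword, body) pairs shaped like the desired output above."""
    return [("start", m.group(1)) for m in STATEMENT.finditer(s)]


print(find_statements(text))
```

Like SkipTo('end'), this pattern also stops at 'end' inside a longer word such as "legend"; writing \bend\b instead would restrict it to the whole word.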

1 answer
萌系小妹纸
#2 · 2019-07-30 22:34

SkipTo is one of the less predictable features of pyparsing: depending on the input data, it can easily skip more or less text than you intended.

Try this instead:

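from pyparsing import LineEnd, OneOrMore, Word, alphas

# A statement ends at the end of a line, at a '.', or at the word 'end'
term = LineEnd().suppress() | '.' | 'end'
statement = 'start' + OneOrMore(~term + Word(alphas)) + term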

Instead of skipping blindly, this expression reads one word at a time and stops as soon as the next token is one of your terminating conditions.
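To make that control flow concrete, here is a rough plain-Python analogue of the grammar. It is not pyparsing code and not how pyparsing works internally. The key step is that the terminator check happens before each word is accepted, which is the job of ~term; without it, Word(alphas) would consume 'end' as just another word. The name parse_statement and its line-at-a-time simplification are assumptions made for illustration.

```python
import re

# Hypothetical plain-Python analogue of
#     'start' + OneOrMore(~term + Word(alphas)) + term
# Simplifications: one line at a time, 'start' must be the first word,
# and characters other than ASCII letters and '.' are ignored.
TERMINATORS = {".", "end"}


def parse_statement(line):
    """Return ['start', word, ..., terminator] for a matching line, else None."""
    # Split into alphabetic words and '.' characters; blanks are dropped.
    tokens = re.findall(r"[A-Za-z]+|\.", line)
    if not tokens or tokens[0] != "start":
        return None
    words = []
    for tok in tokens[1:]:
        # ~term: look for a terminator *before* accepting another word.
        if tok in TERMINATORS:
            # OneOrMore needs at least one word before the terminator.
            return (["start"] + words + [tok]) if words else None
        words.append(tok)  # plays the role of Word(alphas)
    # No explicit terminator: the end of the line ends the statement and,
    # like LineEnd().suppress(), adds no token.
    return (["start"] + words) if words else None


sample = """
start some statement end
other stuff in between
start some other statement.
other stuff in between
start another statement
"""
print([r for r in map(parse_statement, sample.splitlines()) if r])
```

For the sample input this should give the same token lists as the pyparsing grammar: 'start', the words, and then 'end' or '.' when one is present.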

If you want the actual body string instead of the collection of words, you can use originalTextFor:

from pyparsing import originalTextFor
statement = 'start' + originalTextFor(OneOrMore(~term + Word(alphas))) + term