Difficulty of this particular job using pyparsing?

Posted 2019-06-26 11:32

I have a task to do that I'm sure Python and pyparsing could really help with, but I'm still too much of a novice with programming to make a smart choice about how challenging the complete implementation will be and whether it's worth trying or is certain to be a fruitless time-sink.

The task is to translate strings of arbitrary length and nesting depth with a structure following the general grammar of this one:

item12345 'topic(subtopic(sub-subtopic), subtopic2), topic2'

into an item in a dictionary like this one:

{'item12345': 'topic, topic:subtopic, topic:subtopic:sub-subtopic, topic:subtopic2, topic2'}

In other words, the logic works like distribution in mathematics: the item immediately to the left of a pair of parentheses is distributed over everything inside them, and ',' separates the terms inside the parentheses, much as '+' separates the terms of a binomial.

I've either discovered for myself or found and understood examples of some of the seemingly necessary elements for creating this solution so far.

Parsing nested expressions in Python:

def parenthetic_contents(string):
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack = []  # indices of the '(' characters that are still open
    for i, c in enumerate(string):
        if c == '(':
            stack.append(i)
        elif c == ')' and stack:
            start = stack.pop()
            # len(stack) is the nesting depth of the group just closed
            yield (len(stack), string[start + 1: i])
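
As a rough illustration of what this generator gives on the example string above (only a sketch; the function is repeated so the snippet runs on its own, and the names `example` and `pairs` are just for illustration):

```python
def parenthetic_contents(string):
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack = []  # indices of the '(' characters that are still open
    for i, c in enumerate(string):
        if c == '(':
            stack.append(i)
        elif c == ')' and stack:
            start = stack.pop()
            yield (len(stack), string[start + 1: i])


example = 'topic(subtopic(sub-subtopic), subtopic2), topic2'
pairs = list(parenthetic_contents(example))

# Inner groups are yielded first, because they close first.
for level, contents in pairs:
    print(level, contents)
# 1 sub-subtopic
# 0 subtopic(sub-subtopic), subtopic2
```

This returns what is inside each pair of parentheses, but not the topic to the left of the opening parenthesis, so the prefix to distribute would still have to be recovered some other way.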

Distributing one string to others:

from pyparsing import Suppress,Word,ZeroOrMore,alphas,nums,delimitedList

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
DDE 1400, 4030, 5000
'''

def memorize(t):
    # Remember the department code so later tokens on the line can use it.
    memorize.dept = t[0]

def token(t):
    # Prefix each course number with the most recently seen department.
    return "Course: %s %s" % (memorize.dept, int(t[0]))

course = Suppress(Word(alphas).setParseAction(memorize))
number = Word(nums).setParseAction(token)
line = course + delimitedList(number)
lines = ZeroOrMore(line)

final = lines.parseString(data)

for i in final:
    print(i)

There are some others too, but these methods won't directly apply to my ultimate solution, and I've still got a ways to go before I understand Python and pyparsing well enough to combine the ideas or find new ones.

I've been hammering away at it by looking for examples, looking for things that work similarly, and learning more Python and more of pyparsing's classes and methods, but I'm not sure how far I am from knowing enough to build something that works for the general case rather than just intermediate exercises.

So my questions are these. How complex a solution will I ultimately need in order to do what I'm looking for? What suggestions do you have that might help me get closer?

Thanks in advance! (PS - first post on StackOverflow, let me know if I need to do anything differently with regard to this post)

1 answer
爷、活的狠高调
Reply #2 · 2019-06-26 12:10

In pyparsing, your example would look something like:

from pyparsing import Word,alphanums,Forward,Optional,nestedExpr,delimitedList

topicString = Word(alphanums+'-')
expr = Forward()
expr << topicString + Optional(nestedExpr(content=delimitedList(expr)))

test = 'topic(subtopic(sub-subtopic), subtopic2), topic2'

print(delimitedList(expr).parseString(test).asList())

Prints

['topic', ['subtopic', ['sub-subtopic'], 'subtopic2'], 'topic2']

Converting to topic:subtopic, etc. is left as an exercise for the OP.
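
For anyone who wants a starting point, here is one possible sketch of that conversion. It is not part of pyparsing: `flatten` is a hypothetical helper name, and it assumes the nested list has exactly the shape printed above, where a sub-list always holds the children of the string immediately before it. The list literal is pasted in so the snippet runs without pyparsing installed.

```python
def flatten(nested, prefix=()):
    """Yield 'a:b:c' style paths from a nested list shaped like the output above."""
    parent = None
    for item in nested:
        if isinstance(item, list):
            # A sub-list belongs to the string that came just before it.
            if parent is None:
                raise ValueError("sub-list without a preceding topic")
            yield from flatten(item, prefix + (parent,))
        else:
            parent = item
            yield ":".join(prefix + (item,))


# The nested list printed by the parser above.
parsed = ['topic', ['subtopic', ['sub-subtopic'], 'subtopic2'], 'topic2']

result = {'item12345': ", ".join(flatten(parsed))}
print(result)
# {'item12345': 'topic, topic:subtopic, topic:subtopic:sub-subtopic, topic:subtopic2, topic2'}
```

The recursion carries the chain of parent topics down as a tuple, so it copes with any nesting depth; the `item12345` key is just the example identifier from the question and would need to be parsed from the input separately.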
