“Deparsing” a list using pyparsing

2019-04-12 15:45发布

Is it possible to give pyparsing a parsed list and have it return the original string?

1条回答
手持菜刀,她持情操
2楼-- · 2019-04-12 16:32

Yes, you can if you've instructed the parser not to throw away any input. You do it with the Combine combinator.

Let's say your input is:

>>> s = 'abc,def,  ghi'

Here's a parser that grabs the exact text of the list:

>>> from pyparsing import *
>>> myList = Word(alphas) + ZeroOrMore(',' + Optional(White()) + Word(alphas))
>>> myList.leaveWhitespace()
>>> myList.parseString(s)
(['abc', ',', 'def', ',', '  ', 'ghi'], {})

To "deparse":

>>> reconstitutedList = Combine(myList)
>>> reconstitutedList.parseString(s)
(['abc,def,  ghi'], {})

which gives you the initial input back.

But this comes at a cost: having all that extra whitespace floating around as tokens is usually not convenient, and you'll note that we had to explicitly turn whitespace skipping off in myList. Here's a version that strips whitespace:

>>> myList = Word(alphas) + ZeroOrMore(',' + Word(alphas))
>>> myList.parseString(s)
(['abc', ',', 'def', ',', 'ghi'], {})
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abc,def,ghi'], {})

Note you're not getting the literal input back at this point, but this may be good enough for you. Also note we had to explicitly tell Combine to allow the skipping of whitespace.

Really, though, in many cases you don't even care about the delimiters; you want the parser to focus on the items themselves. There's a function called commaSeparatedList that conveniently strips both delimiters and whitespace for you:

>>> myList = commaSeparatedList
>>> myList.parseString(s)
(['abc', 'def', 'ghi'], {})

In this case, though, the "deparsing" step doesn't have enough information for the reconstituted string to make sense:

>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abcdefghi'], {})
查看更多
登录 后发表回答