I am trying to parse a partially standardized street address into its components using pyparsing. I want to non-greedily match a street name that may be N tokens long.
For example:
444 PARK GARDEN LN
Should be parsed into:
number: 444
street: PARK GARDEN
suffix: LN
How would I do this with PyParsing? Here's my initial code:
from pyparsing import *

def main():
    street_number = Word(nums).setResultsName('street_number')
    street_suffix = oneOf("ST RD DR LN AVE WAY").setResultsName('street_suffix')
    street_name = OneOrMore(Word(alphas)).setResultsName('street_name')
    address = street_number + street_name + street_suffix
    result = address.parseString("444 PARK GARDEN LN")
    print(result.dump())

if __name__ == '__main__':
    main()
But when I try parsing it, the street suffix gets gobbled up by the default greedy parsing behavior.
Use the negation, ~, to check to see if the upcoming street_name is actually a street_suffix. In addition, you don't have to use setResultsName, you can simply use the syntax above. IMHO it leads to a much cleaner grammar definition.
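A minimal sketch of the grammar this describes, reusing the variable names from the question. It assumes the shorter syntax being referred to is pyparsing's callable shorthand, expr("name"), which is equivalent to calling setResultsName on the expression:

```python
from pyparsing import OneOrMore, Word, alphas, nums, oneOf

# Calling an expression with a string is shorthand for setResultsName.
street_number = Word(nums)("street_number")
street_suffix = oneOf("ST RD DR LN AVE WAY")("street_suffix")

# ~street_suffix is a negative lookahead: a word is accepted as part of
# the street name only if a street suffix does not match at that position.
street_name = OneOrMore(~street_suffix + Word(alphas))("street_name")

address = street_number + street_name + street_suffix

result = address.parseString("444 PARK GARDEN LN")
print(result.dump())
print(" ".join(result["street_name"]))
```

One caveat with this sketch: oneOf matches the suffixes as plain literals, so the lookahead also rejects a name word that merely starts with a suffix, such as STONE. If that matters for your data, matching the suffixes as whole words (for example with Keyword) should avoid it.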