I am trying to parse a log file that contains multiple entries with the following format:
ITEM_BEGIN item_name
some_text
some_text may optionally contain an expression matched by my_expr anywhere within itself. I am only interested in item_name and my_expr (or None if it is missing). Ideally, what I want is a list of (item_name, my_expr) pairs. What is the best way to extract this information using pyparsing?
If you are not trying to define a parser for the entire input text, but only some pieces of it, look into using pyparsing's searchString or scanString methods - something along these lines:
import pyparsing as pp

ident = pp.Word(pp.alphas, pp.alphanums + '_')
item_header = pp.Keyword("ITEM_BEGIN") + ident("name")
other_expr = ... whatever ...
search_expr = item_header | other_expr

found = {}
current_name = ''
# searchString yields one ParseResults per match, so index each result directly
for result in search_expr.searchString(input_text):
    if result[0] == "ITEM_BEGIN":
        print("found an item header with name {name}".format_map(result))
        current_name = result.name
        found[result.name] = []
    else:
        # found an other expr
        found[current_name].append(result.asList())
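
To get the list of (item_name, my_expr) pairs the question asks for, the same loop can be adapted roughly as below. This is only a sketch under assumptions: the sample log text, the extract_pairs helper, and the "code=<digits>" shape standing in for my_expr are all made up for illustration, since the real my_expr is not shown. It keeps only the first my_expr match per item and leaves None when there is none.

```python
import pyparsing as pp

# Hypothetical sample log; the real file format may differ.
input_text = """\
ITEM_BEGIN alpha
some text with code=42 inside
more text
ITEM_BEGIN beta
nothing interesting here
ITEM_BEGIN gamma
code=7 at the start
"""

ident = pp.Word(pp.alphas, pp.alphanums + "_")
item_header = pp.Keyword("ITEM_BEGIN") + ident("name")
# Assumed stand-in for my_expr: the literal "code=" followed by digits.
my_expr = pp.Literal("code=") + pp.Word(pp.nums)("value")
search_expr = item_header | my_expr


def extract_pairs(text):
    """Return a list of (item_name, my_expr value or None) tuples."""
    pairs = []
    for tokens in search_expr.searchString(text):
        if tokens[0] == "ITEM_BEGIN":
            # New item: no my_expr seen for it yet.
            pairs.append((tokens["name"], None))
        elif pairs and pairs[-1][1] is None:
            # First my_expr match inside the current item.
            name, _ = pairs[-1]
            pairs[-1] = (name, tokens["value"])
    return pairs


pairs = extract_pairs(input_text)
print(pairs)  # [('alpha', '42'), ('beta', None), ('gamma', '7')]
```

Matches of my_expr that appear before the first ITEM_BEGIN are ignored here. If you need the position of each match in the text, scanString yields (tokens, start, end) tuples instead of just the tokens.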