Parsing TCL lists in Python

I need to split space-delimited TCL lists on double braces... for instance...

OUTPUT = """{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}} {{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}"""

This should parse into...

OUTPUT = ["""{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}}""", 
    """{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}"""]

I have tried...

import re
splitter = re.compile(r'}}\s+{{')
splitter.split(OUTPUT)

However, that trims the braces in the center...

['{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}',
'172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}']

I can't figure out how to only split on the spaces between }} {{. I know I can cheat and insert missing braces manually, but I would rather find a simple way to parse this out efficiently.

Is there a way to parse OUTPUT with re.split (or some other python parsing framework) for an arbitrary number of space-delimited rows containing {{content here}}?

Tags: python regex parsing tcl

3 Answers

Bombasti

#2 · 2019-04-09 17:08

You could modify your regex to use positive lookahead/behind assertions, which don't consume any of the string:

re.compile(r'(?<=}})\s+(?={{)')
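
As a minimal sketch of how that pattern behaves with re.split, using the sample string from the question (the braces are escaped here only for readability, and the name rows is just illustrative):

```python
import re

# Sample data copied from the question.
OUTPUT = ("{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}} "
          "{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}")

# The lookbehind and lookahead only assert that "}}" precedes and "{{" follows;
# the match itself consumes nothing but the whitespace in between.
splitter = re.compile(r'(?<=\}\})\s+(?=\{\{)')

rows = splitter.split(OUTPUT)
for row in rows:
    print(row)
```

Because only the whitespace is consumed, joining the pieces with a single space reproduces the original string.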


Animai°情兽

#3 · 2019-04-09 17:09

You can use a regular expression to extract, instead of split off, the list item values…

re.findall(r'({{.*?}})(?:\Z|\s+)', OUTPUT)

For example:

In [30]: regex = re.compile(r'({{.*?}})(?:\Z|\s+)')

In [31]: regex.findall(OUTPUT)
Out[31]: 
['{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}}',
 '{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}']
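
Note that the non-greedy pattern stops at the first }} that is followed by whitespace, so it assumes that sequence never occurs inside an item. If that assumption might not hold, a rough alternative is to count brace depth instead of using a regex. The sketch below is not from this answer; split_top_level is a hypothetical helper, and it ignores Tcl backslash escapes, quoting, and bare words outside braces:

```python
def split_top_level(text):
    """Return the top-level {...} items of a brace-delimited string.

    Hypothetical helper: only tracks brace depth, so Tcl escapes,
    quotes, and unbraced words are not handled.
    """
    items = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == 0:
                start = i          # a new top-level item begins here
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0 and start is not None:
                items.append(text[start:i + 1])
                start = None
    return items


sample = '{{a {b c}} {d}} {{e}}'
for item in split_top_level(sample):
    print(item)
```

Here the first item contains }} followed by a space, which is the case the non-greedy regex would cut short.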


Animai°情兽

#4 · 2019-04-09 17:13

Pyparsing has improved since that comp.lang.python discussion, and I think even Cameron Laird would not complain about a solution using pyparsing's nestedExpr helper:

OUTPUT = """{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}} {{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic "Item 1"}}}"""

from pyparsing import nestedExpr, originalTextFor

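# Form 1: each match comes back as nested lists of tokens
nestedBraces1 = nestedExpr('{', '}')
for nb in nestedBraces1.searchString(OUTPUT):
    print(nb)

# Form 2: each match comes back as the original text, braces included
nestedBraces2 = originalTextFor(nestedExpr('{', '}'))
for nb in nestedBraces2.searchString(OUTPUT):
    print(nb)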

Prints:

[[['172.25.50.10:01:01-Ethernet', '172.25.50.10:01:02-Ethernet', ['Traffic', 'Item', '1']]]]
[[['172.25.50.10:01:02-Ethernet', '172.25.50.10:01:01-Ethernet', ['Traffic', '"Item 1"']]]]
['{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}}']
['{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic "Item 1"}}}']

If you will have to re-split the data anyway to get at the individual items inside the nested braces, the nested list output from nestedExpr is probably more useful (note that a quoted string in the list is kept as a single item). But if you really do want the string containing the nested braces, use the originalTextFor form shown in nestedBraces2.
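
For readers without pyparsing installed, a similar nested-list shape can be approximated with the standard library alone. This is only a sketch and not pyparsing's implementation: parse_nested is a hypothetical helper, and unlike nestedExpr it does not keep quoted strings together or handle backslash escapes.

```python
import re

# A token is either a single brace or a run of non-space, non-brace characters.
TOKEN = re.compile(r'[{}]|[^\s{}]+')


def parse_nested(text):
    """Turn brace-nested text into nested Python lists (hypothetical helper)."""
    root = []
    stack = [root]                  # stack[-1] is the list currently being filled
    for tok in TOKEN.findall(text):
        if tok == '{':
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif tok == '}':
            if len(stack) == 1:
                raise ValueError('unbalanced closing brace')
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError('unbalanced opening brace')
    return root


rows = parse_nested('{{a b {Traffic Item 1}}} {{b a {Traffic Item 1}}}')
for row in rows:
    src, dst, name = row[0]         # the outer brace pair wraps a single inner list
    print(src, dst, ' '.join(name))
```

Each top-level item becomes one element of the returned list, so the fields can be unpacked directly instead of re-splitting a string.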
