Optimizing regex to find key=value pairs, space de

Shortened URL with my current regex in regexpal: http://bit.ly/1jbOFGd

I have a line of space-delimited key=value pairs. Some values contain spaces and punctuation, so I use a positive lookahead to check for the existence of another key.

I want to tokenize each key and value, which I later convert to a dict in Python.

My guess is that I can speed this up by getting rid of .*? but how? In Python I convert 10,000 of these lines in 4.3 seconds. I'd like to double or triple that speed by making this regex match more efficiently.

Tags: regex python-2.7 optimization key

2 answers

看我几分像从前

Reply #2 · 2019-08-04 11:56

Update:

(?<=\s|\A)([^\s=]+)=(.*?)(?=(?:\s[^\s=]+=|$))

I would expect this one to be more efficient than yours, but you will need to test it. It still uses .*? for the value, but its lookahead is nowhere near as complex and does not itself contain a lazy quantifier. It does the same job as my original expression but handles values differently: a lazy .*? match followed by a lookahead that accepts either a space, then a key, then a =, OR the end of the string. Notice that I always define a key as [^\s=]+, so keys cannot contain an equals sign or whitespace (being this specific helps us avoid lazy matches).
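Since the question is tagged python-2.7, here is a minimal sketch of how the updated pattern might be used from Python. One adaptation is assumed: Python's re module rejects (?<=\s|\A) because its lookbehinds must be fixed-width, so the sketch writes the same condition as (?:^|(?<=\s)). The names PAIR_RE and parse_line and the sample line are made up for illustration.

```python
import re

# Adapted from the updated pattern above. The lookbehind (?<=\s|\A) is
# rewritten as (?:^|(?<=\s)) because Python's re needs fixed-width lookbehinds.
# Compiled once at import time so the pattern is not re-parsed for every line.
PAIR_RE = re.compile(r'(?:^|(?<=\s))([^\s=]+)=(.*?)(?=\s[^\s=]+=|$)')


def parse_line(line):
    """Return a dict of the key=value pairs found in one space-delimited line."""
    # findall returns a list of (key, value) tuples, which dict() accepts directly.
    return dict(PAIR_RE.findall(line))


sample = "name=John Smith city=New York, NY id=42"  # hypothetical sample line
print(parse_line(sample))
```

Whether this is actually faster than the regex in the question has to be measured on real data, for example with the standard timeit module.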


Original:

Are there rules I am missing that prevent you from using something as simple as this?

(?<=\s|\A)([^=]+)=([\S]+)

This starts with a lookbehind for either a whitespace character (\s) or the beginning of the string (\A). Then we match one or more characters other than = as the key, followed by a =, and then one or more non-whitespace characters (\S) as the value.
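To see why this simpler pattern does not fit values that contain spaces, here is a small sketch in Python. It assumes the same fixed-width adaptation of the lookbehind, (?:^|(?<=\s)), and the names ORIGINAL_RE and original_pairs and the sample line are hypothetical.

```python
import re

# The original pattern, with the lookbehind adapted for Python's re module.
ORIGINAL_RE = re.compile(r'(?:^|(?<=\s))([^=]+)=(\S+)')


def original_pairs(line):
    """Return the (key, value) tuples the original pattern finds in a line."""
    return ORIGINAL_RE.findall(line)


# The value stops at the first space, and the leftover word is absorbed
# into the next key because [^=]+ allows whitespace.
print(original_pairs("name=John Smith city=New York"))
# [('name', 'John'), ('Smith city', 'New')]
```

When no value contains a space, the pattern behaves as expected, which is presumably the simple case this answer first had in mind.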


霸刀☆藐视天下

Reply #3 · 2019-08-04 12:09

"Lookbehind" (related to "lookahead" and "lookaround") is the key regular expression concept to read up on here: it lets you match and skip individual components of the string.

Good examples here: http://www.rexegg.com/regex-lookarounds.html.
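As one illustration of lookaround in this setting, the sketch below uses a lookahead to find only the whitespace that precedes a key, splits the line there, and then separates each chunk at its first =. This is a different technique from the patterns in the other answer, it avoids .*? entirely, and whether it is faster is an assumption that would need measuring. BOUNDARY_RE and split_pairs are hypothetical names.

```python
import re

# Whitespace counts as a boundary only when a key and an = follow it.
# The lookahead checks for the key without consuming it.
BOUNDARY_RE = re.compile(r'\s+(?=[^\s=]+=)')


def split_pairs(line):
    """Split a space-delimited key=value line into a dict."""
    pairs = {}
    for chunk in BOUNDARY_RE.split(line.strip()):
        # partition splits at the first "=" only, so values may contain "=".
        key, sep, value = chunk.partition('=')
        if sep:  # skip chunks that contain no "=" at all
            pairs[key] = value
    return pairs


print(split_pairs("name=John Smith city=New York, NY id=42"))
```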

