Creating fuzzy matching exceptions with Python

I'm testing the new Python regex module, which supports fuzzy string matching, and I have been impressed with its capabilities so far. However, I've been having trouble carving out exceptions to a fuzzy match. Here is a case in point. I want ST LOUIS, and every variation of ST LOUIS within an edit distance of 1, to match ref, with one exception: the edit must not be an insertion of N, S, E or W to the left of the leftmost character. In the example below I want inputs 1 to 3 to match ref and input 4 to fail, but the ref shown matches all four inputs. Does anyone familiar with the new regex module know of a workaround?

>>> import regex
>>> input1 = 'ST LOUIS'
>>> input2 = 'AST LOUIS'
>>> input3 = 'ST LOUS'
>>> input4 = 'NST LOUIS'

>>> ref = r'([^NSEW]|(?<=^))(ST LOUIS){e<=1}'
>>> match = regex.fullmatch(ref, input1)
>>> match
<_regex.Match object at 0x1006c6030>
>>> match = regex.fullmatch(ref, input2)
>>> match
<_regex.Match object at 0x1006c6120>
>>> match = regex.fullmatch(ref, input3)
>>> match
<_regex.Match object at 0x1006c6030>
>>> match = regex.fullmatch(ref, input4)  # should fail, but it matches
>>> match
<_regex.Match object at 0x1006c6120>
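Input 4 matches because the (?<=^) alternative consumes nothing, which leaves the fuzzy group free to spend its single permitted error on treating the leading N as an insertion. To pin down the intended behaviour independently of the regex module, here is a stdlib-only sketch of the rule as stated. `levenshtein` and `matches_rule` are hypothetical helper names, and the sketch assumes the one forbidden edit is exactly a single N, S, E or W prepended to `ST LOUIS` (so `SST LOUIS`, which could also be read as an S inserted after the first S, is treated as forbidden).

```python
# Stdlib-only sketch of the question's rule; it does NOT use the regex module.
# Helper names are hypothetical.

def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def matches_rule(text: str, target: str = "ST LOUIS") -> bool:
    """Accept text within one edit of target, except when that edit is a
    single N, S, E or W inserted in front of the first character."""
    if text == target:
        return True
    if levenshtein(text, target) > 1:
        return False
    # The forbidden case: one compass letter prepended to the target.
    return not (len(text) == len(target) + 1
                and text[0] in "NSEW"
                and text[1:] == target)


for s in ["ST LOUIS", "AST LOUIS", "ST LOUS", "NST LOUIS"]:
    print(s, matches_rule(s))
# ST LOUIS True
# AST LOUIS True
# ST LOUS True
# NST LOUIS False
```

Note that this predicate still accepts substitutions such as `NT LOUIS`, since the stated exception only concerns insertions.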

Tags: python regex pypi-regex

1 Answer

ら.Afraid

Post #2 · 2020-03-24 04:51

Try a negative lookahead instead:

(?![NEW]|SS)(ST LOUIS){e<=1}

(ST LOUIS){e<=1} matches any string within one edit of ST LOUIS. You want to prevent the match from starting with N, S, E or W, and a negative lookahead such as (?![NSEW]) does exactly that. However, your desired string already starts with S, so you only want to exclude strings where an extra S has been added to the beginning. Such a string starts with SS, which is why the lookahead checks for SS instead of a lone S: (?![NEW]|SS).
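Because the PyPI regex module may not be installed everywhere, the sketch below models what the pattern above accepts under `regex.fullmatch` using only the standard library. Stdlib `re` supports the same negative lookahead, and the fuzzy group `(ST LOUIS){e<=1}` is modelled as "the whole string is within one insertion, deletion or substitution of ST LOUIS". That modelling is an assumption about how `{e<=1}` behaves under `fullmatch`, and `within_one_edit` and `answer_pattern_accepts` are hypothetical names.

```python
import re

TARGET = "ST LOUIS"
# The same negative lookahead as in the answer; stdlib re supports it.
GUARD = re.compile(r"(?![NEW]|SS)")


def within_one_edit(a: str, b: str) -> bool:
    """True if a can be turned into b with at most one insertion,
    deletion or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one substituted character.
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make a the shorter string
    # Lengths differ by one: dropping some character of b must give a.
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def answer_pattern_accepts(text: str) -> bool:
    """Assumed model of regex.fullmatch(r'(?![NEW]|SS)(ST LOUIS){e<=1}', text):
    the lookahead must succeed at position 0 and the whole text must be
    within one edit of TARGET."""
    return GUARD.match(text) is not None and within_one_edit(text, TARGET)


for s in ["ST LOUIS", "AST LOUIS", "ST LOUS", "NST LOUIS", "SST LOUIS", "NT LOUIS"]:
    print(f"{s!r}: {answer_pattern_accepts(s)}")
# 'ST LOUIS': True
# 'AST LOUIS': True
# 'ST LOUS': True
# 'NST LOUIS': False
# 'SST LOUIS': False
# 'NT LOUIS': False
```

The last case shows a side effect worth knowing about: NT LOUIS is a single substitution away from ST LOUIS, yet the lookahead rejects it simply because it starts with N (the same goes for ET LOUIS and WT LOUIS). If only insertions should be excluded, as the question states, the pattern is somewhat stricter than asked for.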

Note that the lookahead only inspects the first one or two characters, so if you allow more than one error (for example {e<=2}), this probably won't work as desired.
