Regex - Matching Abbreviations of a Word

Posted 2019-08-14 21:05

Question:

I was thinking of providing the following regex as an answer to this question, but I can't seem to write the regular expression I was looking for:

w?o?r?d?p?r?e?s?s?

This should match an ordered abbreviation of the word wordpress, but it can also match nothing at all.

How can I modify the above regex so that it matches at least 4 characters, in order? For example:

  • word
  • wrdp
  • press
  • wordp
  • wpress
  • wordpress

I'd like to know what the best way to do this is... =)

Answer 1:

You could use a lookahead assertion:

^(?=.{4})w?o?r?d?p?r?e?s?s?$
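
As a quick way to try the lookahead, here is a minimal sketch in Python rather than PHP (the pattern syntax is the same in Python's re module). The name is_abbreviation is made up for illustration and is not part of the original answer.

```python
import re

# (?=.{4}) is a lookahead: it requires at least four characters from the
# start of the string, without consuming them. ^ and $ anchor the match
# so that no stray characters are allowed before or after.
PATTERN = re.compile(r"^(?=.{4})w?o?r?d?p?r?e?s?s?$")

def is_abbreviation(candidate):
    """Hypothetical helper: True if candidate is an in-order
    abbreviation of 'wordpress' with at least 4 characters."""
    return PATTERN.match(candidate) is not None

# Examples from the question, plus a few that should be rejected.
for s in ["word", "wrdp", "press", "wordp", "wpress", "wordpress",
          "wrd", "", "sserp"]:
    print(repr(s), is_abbreviation(s))
```

Without the lookahead, the same pattern also accepts the empty string, which is the problem described in the question.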


Answer 2:

What about PHP's string-similarity functions?

  • levenshtein
  • similar_text
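
To see how an edit-distance check behaves here, below is a rough sketch in Python. The levenshtein function is a hand-written illustration, not PHP's built-in, and is_abbreviation is a hypothetical helper. It relies on one observation: a candidate is an in-order abbreviation exactly when the distance to the full word equals the difference in length, because then insertions alone turn the candidate into the full word. A small distance by itself is not enough: "wordpresx" is at distance 1 but is not an abbreviation.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

FULLWORD = "wordpress"

def is_abbreviation(candidate, min_len=4):
    # Assumption for this sketch: "abbreviation" means an in-order
    # subsequence of FULLWORD with at least min_len characters.
    return (len(candidate) >= min_len
            and levenshtein(candidate, FULLWORD) == len(FULLWORD) - len(candidate))

print(levenshtein("word", FULLWORD), is_abbreviation("word"))
print(levenshtein("wordpresx", FULLWORD), is_abbreviation("wordpresx"))
```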


Answer 3:

if ( strlen($string) >= 4 && preg_match('#^w?o?r?d?p?r?e?s?s?$#', $string) ) {
    // abbreviation ok
}

This won't even run the regexp unless the string is at least 4 chars long.
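
The same two-step idea carries over to other words. As a sketch in Python (the helper name abbreviation_checker is hypothetical), the pattern can be generated from any word, with the length test done first so the regex only runs on input that is long enough:

```python
import re

def abbreviation_checker(full_word, min_len=4):
    """Build a checker for in-order abbreviations of full_word (illustrative helper)."""
    # Make every letter optional: "wordpress" -> "w?o?r?d?p?r?e?s?s?".
    # re.escape guards against letters that are regex metacharacters.
    pattern = re.compile("".join(re.escape(ch) + "?" for ch in full_word))

    def check(candidate):
        # Cheap length test first; the regex is tried only if it passes.
        return len(candidate) >= min_len and pattern.fullmatch(candidate) is not None

    return check

is_wordpress_abbrev = abbreviation_checker("wordpress")
print(is_wordpress_abbrev("wpress"))
print(is_wordpress_abbrev("wrd"))
```

Using fullmatch plays the role of the ^ and $ anchors in the PHP version.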



Answer 4:

I know this is not a regex; it's just for fun...

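#!/usr/bin/env python3

FULLWORD = "wordprocess"  # word being abbreviated (the question uses "wordpress")
MIN_LEN = 4               # minimum abbreviation length asked for in the question

def check_word(word):
    # i indexes the candidate, j indexes FULLWORD.
    i, j = 0, 0
    while i < len(word) and j < len(FULLWORD):
        if word[i] == FULLWORD[j]:
            i += 1  # matched: move on in the candidate
        j += 1      # always move on in the full word

    # Valid only if every candidate character was matched in order
    # and the candidate is long enough.
    if i < len(word) or len(word) < MIN_LEN:
        return "%s: FAIL" % word
    return "%s: SUCC" % word

print(check_word("wd"))            # FAIL: too short
print(check_word("wdps"))          # SUCC
print(check_word("wsdp"))          # FAIL: letters out of order
print(check_word("wordprocessr"))  # FAIL: extra trailing character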