I need to compare names which can be written in several ways. For example, a name like St. Thomas is sometimes written like St-Thomas or Sant Thomas. Preferably, I'm looking to build a function that gives a percentage of 'equalness' to a comparison, like some forums do (this post is 5% edited for example).
You can use different approaches.
You can use the similar_text() function to check for similarity, or you can use the levenshtein() function to measure how different the two strings are. Then compare the result against a reasonable threshold for your data.
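
As a rough sketch of the threshold idea, the snippet below wraps similar_text() in a small check. The helper name namesLookAlike and the 80% default are assumptions made up for illustration, not anything PHP provides, so pick a threshold that fits your own names.

```php
<?php
// Hypothetical helper: true when two names are "similar enough".
// The 80% default threshold is an arbitrary assumption; tune it on real data.
function namesLookAlike(string $a, string $b, float $threshold = 80.0): bool
{
    // Lower-case both inputs so case differences do not count against the score.
    $a = strtolower($a);
    $b = strtolower($b);

    // similar_text() writes the similarity percentage into its third argument.
    similar_text($a, $b, $percent);

    return $percent >= $threshold;
}

var_dump(namesLookAlike('St. Thomas', 'St-Thomas'));
var_dump(namesLookAlike('St. Thomas', 'London'));
```

With this threshold the first pair is accepted and the second is rejected.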

Check out levenshtein(), which does what you want and is comparatively efficient (but not extremely efficient): http://www.php.net/manual/en/function.levenshtein.php

PHP has two (main) built-in functions for this: levenshtein(), which counts how many changes (removals, additions, replacements) are needed to produce string2 from string1 (lower is better), and similar_text(), which returns the number of matching characters (higher is better). Note that you can pass a variable by reference as the third parameter and it'll give you a percentage.

The edit distance between two strings of characters generally refers to the Levenshtein distance.
http://php.net/manual/en/function.levenshtein.php
So either levenshtein($v1, $v2) or similar_text($v1, $v2, $percent) will do it for you, but there is a tradeoff. The complexity of the levenshtein() algorithm is O(m*n), where n and m are the lengths of $v1 and $v2. That is rather good compared to similar_text(), which is O(max(n,m)**3), but it is still expensive.
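
Since levenshtein() returns a raw distance rather than a percentage, one possible way to get the 'equalness' figure asked for is to divide by the length of the longer string. This is only a sketch: levenshteinPercent is a hypothetical name, and scaling by the longer length is one normalisation among several.

```php
<?php
// Hypothetical helper: turns an edit distance into a 0-100 similarity score.
function levenshteinPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are treated as identical
    }

    // The distance never exceeds the length of the longer string,
    // so the ratio below always stays between 0 and 1.
    return (1 - levenshtein($a, $b) / $longest) * 100;
}

printf("%.1f%%\n", levenshteinPercent('St. Thomas', 'St-Thomas'));
```

Keep in mind that levenshtein() compares bytes, so a multibyte character can count as more than one change, and before PHP 8.0 it only accepted strings of up to 255 characters.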