PHP Compare whether strings are (almost) equal

Posted 2020-07-23 03:22

I need to compare names which can be written in several ways. For example, a name like St. Thomas is sometimes written like St-Thomas or Sant Thomas. Preferably, I'm looking to build a function that gives a percentage of 'equalness' to a comparison, like some forums do (this post is 5% edited for example).

Tags: php
5 answers
SAY GOODBYE
#2 · 2020-07-23 03:49

You can use different approaches.

You can use the similar_text() function to check for similarity.

OR

You can use the levenshtein() function to find out...

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2.

Then compare the result against a threshold that is reasonable for your data.
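
A minimal sketch of the threshold idea, assuming you want a 0-100 score as in the question. The helper names levenshteinPercent() and almostEqual() and the 75 percent cutoff are made up for illustration and are not part of PHP. Note that levenshtein() and strlen() work on bytes, so multibyte names would need extra handling.

```php
<?php
// Hypothetical helper: turns the Levenshtein distance into a 0-100 score.
function levenshteinPercent(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are identical
    }
    // The distance can never exceed the longer length, so this stays in 0..100.
    return 100 * ($maxLen - levenshtein($a, $b)) / $maxLen;
}

// Hypothetical helper: the 75 percent cutoff is an arbitrary starting point.
function almostEqual(string $a, string $b, float $threshold = 75.0): bool
{
    return levenshteinPercent(strtolower($a), strtolower($b)) >= $threshold;
}

var_dump(almostEqual('St. Thomas', 'St-Thomas'));   // bool(true): distance 2 over 10 characters gives 80
var_dump(almostEqual('St. Thomas', 'Sant Thomas')); // bool(false): distance 3 over 11 characters gives about 72.7
```

As the second call shows, a single cutoff will not catch every spelling variant, so the threshold has to be tuned against real data.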

欢心
#3 · 2020-07-23 03:52

Check out levenshtein(), which does what you want and is comparatively efficient (but not extremely efficient): http://www.php.net/manual/en/function.levenshtein.php

叛逆
#4 · 2020-07-23 03:57

PHP has two (main) built-in functions for this.

levenshtein(), which counts how many single-character changes (removals, additions, replacements) are needed to produce string2 from string1 (lower is better)

and

similar_text(), which returns the number of matching characters (higher is better). Note that if you pass a variable as the third parameter, it is filled by reference with the similarity as a percentage.

<?php
    $originalPost = "Here's my question to stack overflou. Thanks /h2ooooooo";
    $editedPost = "Question to stack overflow.";
    $matchingCharacters = similar_text($originalPost, $editedPost, $matchingPercentage);
    var_dump($matchingCharacters); //int(25) 
    var_dump($matchingPercentage); //float(60.975609756098) (hence edited 40%)
?>
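
For the names in the question, it may help to normalize punctuation and case before calling similar_text(). This is only a sketch: normalizeName() and nameSimilarity() are hypothetical helpers, and the regular expression assumes plain ASCII names.

```php
<?php
// Hypothetical helper: lowercases the name and collapses every run of
// punctuation or whitespace into a single space (ASCII names assumed).
function normalizeName(string $name): string
{
    $name = strtolower($name);
    $name = preg_replace('/[^a-z0-9]+/', ' ', $name);
    return trim($name);
}

// Hypothetical helper: similar_text() percentage on the normalized names.
function nameSimilarity(string $a, string $b): float
{
    similar_text(normalizeName($a), normalizeName($b), $percent);
    return $percent;
}

var_dump(nameSimilarity('St. Thomas', 'St-Thomas'));   // float(100): both normalize to "st thomas"
var_dump(nameSimilarity('St. Thomas', 'Sant Thomas')); // below 100, since "st" and "sant" differ
```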
地球回转人心会变
#5 · 2020-07-23 04:09

The edit distance between two strings of characters generally refers to the Levenshtein distance.

http://php.net/manual/en/function.levenshtein.php

手持菜刀,她持情操
#6 · 2020-07-23 04:10
$v1 = 'pupil';
$v2 = 'people';
# TRUE if $v1 and $v2 have a similar pronunciation
soundex($v1) == soundex($v2);
# Same idea, but metaphone() uses a more accurate algorithm
metaphone($v1) == metaphone($v2);
# Count the characters the two strings have in common;
# $percent receives the similarity as a percentage
$common = similar_text($v1, $v2, $percent);
# Compute the edit distance between the two strings
$diff = levenshtein($v1, $v2);

So either levenshtein($v1, $v2) or similar_text($v1, $v2, $percent) will do the job, but there is a trade-off. The complexity of levenshtein() is O(m*n), where m and n are the lengths of $v1 and $v2. That is rather good compared to similar_text(), which is O(max(n,m)**3), but it is still expensive.
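
To see where the O(m*n) cost comes from, here is a sketch of the textbook dynamic-programming approach. editDistance() is a hypothetical re-implementation for illustration only, not how PHP necessarily implements it internally; in real code the built-in levenshtein() is the better choice.

```php
<?php
// Hypothetical re-implementation for illustration; prefer the built-in levenshtein().
function editDistance(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);
    // $prev[$j] is the distance between the first $i - 1 bytes of $a
    // and the first $j bytes of $b (initially: the empty prefix of $a).
    $prev = range(0, $n);
    for ($i = 1; $i <= $m; $i++) {          // m iterations ...
        $curr = [$i];
        for ($j = 1; $j <= $n; $j++) {      // ... times n iterations
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // delete a byte from $a
                $curr[$j - 1] + 1,     // insert a byte from $b
                $prev[$j - 1] + $cost  // replace, or keep when the bytes match
            );
        }
        $prev = $curr;
    }
    return $prev[$n];
}

var_dump(editDistance('pupil', 'people') === levenshtein('pupil', 'people')); // bool(true)
```

Every cell of the m-by-n table is filled exactly once with constant work, which is where the O(m*n) figure comes from.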
