How to use PHP similar_text() with Arabic text

Posted 2020-07-24 13:24

Question:

I'm trying to use PHP's similar_text() with Arabic text, but it doesn't work, although it works fine with English.

<?php
// The third argument is passed by reference, so it must be a variable, not the string "$per".
$var = similar_text("ياسر", "عمار", $per);
echo $var;
?>
Output: 5

That result is wrong; it should be 2. Is there a version of similar_text() that works with Arabic letters?

Answer 1:

Arabic text is stored as a multibyte string, so PHP's byte-oriented string functions (such as similar_text()) do not count its characters correctly.

echo(strlen("عمار"));

The above code outputs 8, the number of bytes in the UTF-8 encoding of the string.

echo(mb_strlen("عمار", "UTF-8"));

Using the mb_strlen function with the UTF-8 encoding specified, the output is: 4 (the correct number of characters).

You can use the mb_ functions to make your own version of the similar_text function: http://php.net/manual/en/ref.mbstring.php
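
As a rough illustration of such a hand-rolled version, here is a minimal sketch. It assumes UTF-8 input and follows a longest-common-substring recursion similar to the one similar_text() applies to bytes, but works on whole characters. The names utf8_chars, similar_chars and utf8_similar_text are hypothetical helpers, not part of PHP or mbstring.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
// preg_split() returns false on invalid UTF-8; that is treated as an empty string here.
function utf8_chars(string $s): array {
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Hypothetical helper: find the longest common run of characters,
// then recurse on the parts to the left and to the right of that run.
function similar_chars(array $a, array $b): int {
    $max = 0; $posA = 0; $posB = 0;
    $lenA = count($a); $lenB = count($b);
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $l = 0;
            while ($i + $l < $lenA && $j + $l < $lenB && $a[$i + $l] === $b[$j + $l]) {
                $l++;
            }
            if ($l > $max) { $max = $l; $posA = $i; $posB = $j; }
        }
    }
    if ($max === 0) {
        return 0;
    }
    return $max
        + similar_chars(array_slice($a, 0, $posA), array_slice($b, 0, $posB))
        + similar_chars(array_slice($a, $posA + $max), array_slice($b, $posB + $max));
}

// Character-based counterpart of similar_text(); $percent is filled by reference.
function utf8_similar_text(string $s1, string $s2, &$percent = null): int {
    $a = utf8_chars($s1);
    $b = utf8_chars($s2);
    $sim = similar_chars($a, $b);
    $total = count($a) + count($b);
    $percent = $total > 0 ? $sim * 200 / $total : 0;
    return $sim;
}

echo utf8_similar_text("ياسر", "عمار", $per), "\n"; // 2
echo $per, "\n";                                    // 50
```

The nested loops make this slow on long strings, so it is only meant for short inputs such as names or single words.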



Answer 2:

For the record, here is how similar_text() behaves when it is given multibyte strings, including Arabic ones.

The function treats each byte of the input as an individual character, which means it supports neither multibyte characters nor Unicode.

Written in hexadecimal, the strings عمار and ياسر look as follows. The values shown are the UTF-16BE encoding, which for these letters is simply the code point of each character; bytes are separated by . and characters by :.

06.39:06.45:06.27:06.31   <-- Byte stream for عمار
||    ||    ||    || ||
06.4A:06.27:06.33:06.31   <-- Byte stream for ياسر

Five bytes match, which is why the function returns 5 (every two hexadecimal digits represent one byte). In UTF-8, the usual encoding of PHP source files, the byte values differ (ع is D8.B9, for example), but each letter still occupies two bytes and similar_text() again finds five bytes in common.
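
To check the bytes on a particular system, a small sketch like the following may help. It assumes the script is saved as UTF-8 and that the mbstring extension is enabled; hex_bytes is a hypothetical helper name.

```php
<?php
// Hypothetical helper: hex dump of a string with a space between bytes.
function hex_bytes(string $s): string {
    return implode(' ', str_split(bin2hex($s), 2));
}

$a = "عمار";
$b = "ياسر";

// UTF-8 bytes, as PHP sees them when the source file is UTF-8.
echo hex_bytes($a), "\n"; // d8 b9 d9 85 d8 a7 d8 b1
echo hex_bytes($b), "\n"; // d9 8a d8 a7 d8 b3 d8 b1

// The same string converted to UTF-16BE: one 16-bit code point per letter.
echo hex_bytes(mb_convert_encoding($a, 'UTF-16BE', 'UTF-8')), "\n"; // 06 39 06 45 06 27 06 31

echo strlen($a), "\n";           // 8 (bytes, not characters)
echo similar_text($b, $a), "\n"; // 5 (common bytes, not common letters)
```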



Answer 3:

Here's the one I'm using:

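//from http://www.phperz.com/article/14/1029/31806.html
//Splits a UTF-8 string into an array of characters.
function mb_split_str($str) {
    //"u" treats the pattern and subject as UTF-8; "s" lets "." match newlines too.
    preg_match_all("/./us", $str, $arr);
    return $arr[0];
}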

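//based on http://www.phperz.com/article/14/1029/31806.html, added percent
//Counts the distinct characters the two strings share; order and repeats are ignored.
function mb_similar_text($str1, $str2, &$percent) {
    $arr_1 = array_unique(mb_split_str($str1));
    $arr_2 = array_unique(mb_split_str($str2));
    $similarity = count($arr_2) - count(array_diff($arr_2, $arr_1));
    //Use character counts (not strlen() byte counts) so the percentage matches $similarity.
    $total = mb_strlen($str1, "UTF-8") + mb_strlen($str2, "UTF-8");
    $percent = $total > 0 ? ($similarity * 200) / $total : 0;
    return $similarity;
}

So

$var = mb_similar_text('عمار', 'ياسر', $per);
Output: $var = 2, $per = 50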