Brief
Help me write a new function, or change the existing correct() function,
so that it handles the input text in a case-insensitive
manner.
Example
Usage
Example usage of the correct() function:
$text = "Точик ТОЧИК точик ТоЧиК тоЧИК";
$text = correct($text, $base_words);
echo "$text";
Expected Result
Input: Точик ТОЧИК точик ТоЧиК тоЧИК
Output: Тоҷик ТОҶИК тоҷик ТоҶиК тоҶИК
Code
All the arrays and functions are listed below so you can easily copy them:
$default_words = array
(
    'бур',
    'кори',
    'давлати',
    'забони',
    'фанни'
);
$base_words = array
(
    "точик" => "тоҷик",
    "точики" => "тоҷики",
    "точикон" => "тоҷикон",
    "чахонгир" => "ҷаҳонгир",
    "галат" => "ғалат",
    "уктам" => "ӯктам",
);
$base_special_words = array
(
    "кори хатти" => "кори хаттӣ",
    "хатти аз" => "хаттӣ аз",
    "забони точики" => "забони тоҷикӣ",
    "точики барои" => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ",
    "давлати дар" => "давлатӣ дар",
    "микёси чахони" => "миқёси ҷаҳонӣ",
);
function correct($request, $dictionary)
{
    $search = array("ғ","ӣ","ҷ","ҳ","қ","ӯ","Ғ","Ӣ","Ҷ","Ҳ","Қ","Ӯ");
    $replace = array("г","и","ч","х","к","у","Г","И","Ч","Х","К","У");
    $request = str_replace($search, $replace, $request); // replace special letters with plain Cyrillic letters
    $result = preg_replace_callback("/\pL+/u", function ($m) use ($dictionary) {
        $word = mb_strtolower($m[0]);
        if (isset($dictionary[$word])) {
            $repl = $dictionary[$word];
            // Check for some common upper/lower case patterns
            // 1. all lower case
            if ($word === $m[0]) return $repl;
            // 2. all upper case
            if (mb_strtoupper($word) === $m[0]) return mb_strtoupper($repl);
            // 3. only the first letter is upper case
            if (mb_convert_case($word, MB_CASE_TITLE) === $m[0]) return mb_convert_case($repl, MB_CASE_TITLE);
            // Otherwise: check each character to see whether it should be upper or lower case
            $mixed = array();
            for ($i = 0, $len = mb_strlen($word); $i < $len; ++$i) {
                $mixed[] = mb_substr($word, $i, 1) === mb_substr($m[0], $i, 1)
                    ? mb_substr($repl, $i, 1)
                    : mb_strtoupper(mb_substr($repl, $i, 1));
            }
            return implode("", $mixed);
        }
        return $m[0]; // Nothing changes
    }, $request);
    return $result;
}
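The case handling inside correct() could be pulled out into a small standalone helper, which might make it easier to reuse for two-word phrases. The following is only a sketch: the name transfer_case() is hypothetical and not part of the original code, and it applies the per-character check to every word instead of testing the three common patterns first.

```php
<?php
// Hypothetical helper (not in the original code): copy the upper/lower case
// pattern of $source onto $replacement, character by character.
function transfer_case($source, $replacement)
{
    $out = '';
    $len = mb_strlen($replacement, 'UTF-8');
    for ($i = 0; $i < $len; ++$i) {
        $src = mb_substr($source, $i, 1, 'UTF-8');      // empty when $source is shorter
        $dst = mb_substr($replacement, $i, 1, 'UTF-8');
        // A source character counts as upper case if lowering it changes it.
        $isUpper = $src !== '' && mb_strtolower($src, 'UTF-8') !== $src;
        $out .= $isUpper ? mb_strtoupper($dst, 'UTF-8') : $dst;
    }
    return $out;
}

echo transfer_case("ТоЧиК", "тоҷик"), "\n";
echo transfer_case("ТОЧИК", "тоҷик"), "\n";
echo transfer_case("Точик", "тоҷик"), "\n";
```

Because the check is done per character, characters such as spaces pass through unchanged, so the same helper should also work on a whole phrase as long as the replacement has the same length as the source.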
Questions
How do I properly correct the input text?
Input: Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони.
Output: Кори хаттӣ аз фанни забони тоҷикӣ барои забони давлатӣ дар миқёси ҷаҳонӣ.
Most likely the text has to be fixed step by step using the 3 arrays. My algorithm did not give suitable results, so I created an additional array of two-word phrases ($base_special_words).
My algorithm corrects the sentence using the words from the dictionaries:
Step 1.
Create a temporary array from those elements of the $base_special_words array that occur in the sentence. The temporary array looks like this:
$temp_for_base_special_words = array
(
    "кори хатти",
    "хатти аз",
    "забони точики",
    "точики барои",
    "забони давлати",
    "давлати дар",
    "микёси чахони",
);
All of these phrases occur in the sentence. Next, the phrases in the temporary array are cut out of the sentence. Before and after cutting, the sentence looks like this:
Full sentence before cutting:
Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони. Точик мард аст.
Cut part of the sentence:
Кори хатти аз забони точики барои забони давлати дар микёси чахони
Sentence after cutting:
фанни. Точик мард аст.
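Step 1 could be sketched roughly as follows. This assumes that "cutting" means marking every word that belongs to a matching two-word phrase, and it compares neighbouring words in lower case. The variable names $found, $covered and $remaining are made up for this illustration, and only the phrase keys are needed here.

```php
<?php
$base_special_words = array(
    "кори хатти" => "кори хаттӣ",
    "хатти аз" => "хаттӣ аз",
    "забони точики" => "забони тоҷикӣ",
    "точики барои" => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ",
    "давлати дар" => "давлатӣ дар",
    "микёси чахони" => "миқёси ҷаҳонӣ",
);
$sentence = "Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони. Точик мард аст.";

// Split the sentence into words (runs of letters); punctuation is dropped here,
// so this sketch does not notice a full stop between two neighbouring words.
preg_match_all('/\pL+/u', $sentence, $m);
$words = $m[0];

$found = array();   // phrases that occur in the sentence
$covered = array(); // indexes of words that belong to a found phrase
for ($i = 0, $n = count($words); $i + 1 < $n; ++$i) {
    $pair = mb_strtolower($words[$i] . ' ' . $words[$i + 1], 'UTF-8');
    if (isset($base_special_words[$pair])) {
        $found[$pair] = true;
        $covered[$i] = true;
        $covered[$i + 1] = true;
    }
}

// Words that were not cut out in this step.
$remaining = array();
foreach ($words as $i => $w) {
    if (!isset($covered[$i])) {
        $remaining[] = $w;
    }
}

$temp_for_base_special_words = array_keys($found);
print_r($temp_for_base_special_words);
echo implode(' ', $remaining), "\n";
```

Working on word indexes rather than on substrings avoids problems with overlapping phrases such as "кори хатти" and "хатти аз", which share a word.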
Step 2.
The remaining part of the sentence is then checked against the $default_words array, and the words found in that array are cut out.
Sentence before cutting in step 2:
фанни. Точик мард аст.
Cut part:
фанни
Sentence after cutting:
. Точик мард аст.
Array with the cut words:
$temp_for_default_words = array("фанни");
Step 3.
From the rest of the sentence, cut the words that are present in the $base_words array.
Sentence before cutting in step 3:
. Точик мард аст.
Cut part:
Точик
Sentence after cutting:
. мард аст.
Array with the cut words:
$temp_for_base_words = array ("точик");
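Steps 2 and 3 do the same kind of filtering with different arrays, so they might share one helper. The sketch below assumes the words left over from step 1 are already available as a list. split_by_dictionary() and $hidden are hypothetical names introduced only for this example.

```php
<?php
// Hypothetical helper: split a word list into the words found as keys in
// $lookup (compared in lower case) and all other words.
function split_by_dictionary(array $words, array $lookup)
{
    $found = array();
    $rest = array();
    foreach ($words as $w) {
        $lower = mb_strtolower($w, 'UTF-8');
        if (isset($lookup[$lower])) {
            $found[] = $lower;
        } else {
            $rest[] = $w;
        }
    }
    return array($found, $rest);
}

$default_words = array('бур', 'кори', 'давлати', 'забони', 'фанни');
$base_words = array(
    "точик" => "тоҷик",
    "точики" => "тоҷики",
    "точикон" => "тоҷикон",
    "чахонгир" => "ҷаҳонгир",
    "галат" => "ғалат",
    "уктам" => "ӯктам",
);

// Assumed input: the words left after step 1.
$remaining = array("фанни", "Точик", "мард", "аст");

// Step 2: $default_words is a plain list, so flip it to turn the words into keys.
list($temp_for_default_words, $remaining) =
    split_by_dictionary($remaining, array_flip($default_words));

// Step 3: $base_words already has the words as keys.
list($temp_for_base_words, $hidden) =
    split_by_dictionary($remaining, $base_words);

print_r($temp_for_default_words);
print_r($temp_for_base_words);
print_r($hidden);
```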
The rest of the sentence must be temporarily cut out and hidden so that it is not processed.
Hidden part of the sentence:
. мард аст.
Finally, the replacements are made using the three new arrays and the dictionaries, and the hidden part is put back.
Correcting steps
Step 1.
Using `$temp_for_base_special_words`:
For each value in $temp_for_base_special_words, look it up as a key in $base_special_words ($base_special_words[$value]) and replace that key with its value in the input text.
Step 2.
Using `$temp_for_default_words`:
For each value in $temp_for_default_words, look it up in $default_words. Since $default_words is a plain list without replacement values, these words are put back into the input text unchanged.
Step 3.
Using `$temp_for_base_words`:
For each value in $temp_for_base_words, look it up as a key in $base_words ($base_words[$value]) and replace that key with its value in the input text.
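The whole procedure might be condensed into a single pass over the words, without physically cutting and hiding parts of the sentence. The following is a sketch under several assumptions: the input uses plain Cyrillic letters, every phrase consists of exactly two words separated by one space, and each replacement has the same length as its key. correct_sentence() and transfer_case() are hypothetical names, not part of the original code.

```php
<?php
$default_words = array('бур', 'кори', 'давлати', 'забони', 'фанни');
$base_words = array(
    "точик" => "тоҷик", "точики" => "тоҷики", "точикон" => "тоҷикон",
    "чахонгир" => "ҷаҳонгир", "галат" => "ғалат", "уктам" => "ӯктам",
);
$base_special_words = array(
    "кори хатти" => "кори хаттӣ", "хатти аз" => "хаттӣ аз",
    "забони точики" => "забони тоҷикӣ", "точики барои" => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ", "давлати дар" => "давлатӣ дар",
    "микёси чахони" => "миқёси ҷаҳонӣ",
);

// Copy the upper/lower case pattern of $source onto $replacement.
function transfer_case($source, $replacement)
{
    $out = '';
    for ($i = 0, $len = mb_strlen($replacement, 'UTF-8'); $i < $len; ++$i) {
        $src = mb_substr($source, $i, 1, 'UTF-8');
        $dst = mb_substr($replacement, $i, 1, 'UTF-8');
        $isUpper = $src !== '' && mb_strtolower($src, 'UTF-8') !== $src;
        $out .= $isUpper ? mb_strtoupper($dst, 'UTF-8') : $dst;
    }
    return $out;
}

function correct_sentence($text, array $phrases, array $protected, array $words)
{
    $protected = array_flip($protected);
    // Odd indexes hold words, even indexes hold the separators between them.
    $parts = preg_split('/(\pL+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $n = count($parts);
    $fixed = array();

    // Pass 1: two-word phrases; the first match for a word wins.
    for ($i = 1; $i + 2 < $n; $i += 2) {
        if (trim($parts[$i + 1]) !== '') {
            continue; // punctuation between the two words: not a phrase
        }
        $pair = mb_strtolower($parts[$i] . ' ' . $parts[$i + 2], 'UTF-8');
        if (isset($phrases[$pair])) {
            list($a, $b) = explode(' ', $phrases[$pair], 2);
            if (!isset($fixed[$i])) $fixed[$i] = transfer_case($parts[$i], $a);
            if (!isset($fixed[$i + 2])) $fixed[$i + 2] = transfer_case($parts[$i + 2], $b);
        }
    }

    // Pass 2: single words that were not part of any phrase.
    for ($i = 1; $i < $n; $i += 2) {
        $lower = mb_strtolower($parts[$i], 'UTF-8');
        if (isset($fixed[$i])) {
            $parts[$i] = $fixed[$i];
        } elseif (!isset($protected[$lower]) && isset($words[$lower])) {
            $parts[$i] = transfer_case($parts[$i], $words[$lower]);
        }
        // Protected and unknown words are left unchanged.
    }
    return implode('', $parts);
}

$text = "Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони. Точик мард аст.";
echo correct_sentence($text, $base_special_words, $default_words, $base_words), "\n";
```

Keeping the separators in $parts means that punctuation and spacing are restored automatically, which takes the place of the "hidden part" in the description above.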