How to implement my algorithm text correction for

Posted 2019-04-11 22:41

Brief

Help me write a new function, or change the function correct(), so that the correction works case-insensitively on the input text.


Example

Usage

Example usage for the correct() method:

$text = "Точик ТОЧИК точик ТоЧиК тоЧИК";

$text = correct($text, $base_words);
echo "$text";

Expected Result

Input: Точик ТОЧИК точик ТоЧиК тоЧИК
Output: Тоҷик ТОҶИК тоҷик ТоҶиК тоҶИК


Code

Here are all the arrays and functions below so you can easily copy them:

$default_words = array
(
    'бур',
    'кори',
    'давлати',
    'забони',
    'фанни'
);

$base_words = array
(
    "точик"    => "тоҷик",
    "точики"   => "тоҷики",
    "точикон"  => "тоҷикон",
    "чахонгир" => "ҷаҳонгир",
    "галат"    => "ғалат",
    "уктам"    => "ӯктам",
);

$base_special_words = array
(
    "кори хатти"     => "кори хаттӣ",
    "хатти аз"       => "хаттӣ аз",
    "забони точики"  => "забони тоҷикӣ",
    "точики барои"   => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ",
    "давлати дар"    => "давлатӣ дар",
    "микёси чахони"  => "миқёси ҷаҳонӣ",
);


function correct($request, $dictionary)
{
    $search  = array("ғ","ӣ","ҷ","ҳ","қ","ӯ","Ғ","Ӣ","Ҷ","Ҳ","Қ","Ӯ");
    $replace = array("г","и","ч","х","к","у","Г","И","Ч","Х","К","У");
    $request = str_replace($search, $replace, $request); // replace special Tajik letters with plain Cyrillic letters

    $result = preg_replace_callback("/\pL+/u", function ($m) use ($dictionary) {
        $word = mb_strtolower($m[0]);
        if (isset($dictionary[$word])) {
            $repl = $dictionary[$word];
            // Check for some common ways of upper/lower case
            // 1. all lower case
            if ($word === $m[0]) return $repl;
            // 2. all upper case
            if (mb_strtoupper($word) === $m[0]) return mb_strtoupper($repl);
            // 3. Only first letters are upper case
            if (mb_convert_case($word, MB_CASE_TITLE) === $m[0]) return mb_convert_case($repl, MB_CASE_TITLE);
            // Otherwise: check each character whether it should be upper or lower case
            $mixed = array();
            for ($i = 0, $len = mb_strlen($word); $i < $len; ++$i) {
                $mixed[] = mb_substr($word, $i, 1) === mb_substr($m[0], $i, 1)
                    ? mb_substr($repl, $i, 1)
                    : mb_strtoupper(mb_substr($repl, $i, 1));
            }
            return implode("", $mixed);
        }
        return $m[0]; // Nothing changes
    }, $request);


    return $result;
}

Questions

How do I properly correct the input text?

Input
Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони.
Output
Кори хаттӣ аз фанни забони тоҷикӣ барои забони давлатӣ дар миқёси ҷаҳонӣ.

Most likely the text has to be fixed step by step using the three arrays. My algorithm did not give suitable results, so I created an array of two-word phrases ($base_special_words).

My algorithm corrects a sentence using words from the dictionary:

Step 1.

Create a temp array from those elements of the $base_special_words array that occur in the sentence. The temp array looks like this:

$temp_for_base_special_words = array
(
    "кори хатти",
    "хатти аз",
    "забони точики",
    "точики барои",
    "забони давлати",
    "давлати дар",
    "микёси чахони",   
);

All of these phrases occur in the sentence. Next, cut out of the sentence the phrases that are in the temp array. After cutting, the sentence looks like this:

Full sentence before cutting:
Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони. Точик мард аст.
Cut part of the sentence:
Кори хатти аз забони точики барои забони давлати дар микёси чахони
Sentence after cutting:
фанни. Точик мард аст.
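
Step 1 could be sketched roughly as follows. This is only an illustration: the function name find_phrases_in_text is hypothetical, and it assumes the phrases in the text are separated by single spaces exactly as in the dictionary keys.

```php
<?php
// Hypothetical helper: return the keys of $phrases that occur in $text
// as whole words, ignoring upper/lower case.
function find_phrases_in_text(string $text, array $phrases): array
{
    $found = [];
    foreach (array_keys($phrases) as $phrase) {
        // (?<!\pL) and (?!\pL) reject matches that are part of a longer word.
        $pattern = '/(?<!\pL)' . preg_quote($phrase, '/') . '(?!\pL)/iu';
        if (preg_match($pattern, $text) === 1) {
            $found[] = $phrase;
        }
    }
    return $found;
}

$sample_phrases = array(
    "кори хатти"    => "кори хаттӣ",
    "забони точики" => "забони тоҷикӣ",
    "микёси чахони" => "миқёси ҷаҳонӣ",
);
$temp = find_phrases_in_text("Кори хатти аз фанни забони точики барои забони давлати.", $sample_phrases);
// $temp now holds "кори хатти" and "забони точики"
```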

Step 2.

Then the remaining part of the sentence is checked against the $default_words array, and the words from the sentence that are in this array are cut out.

Sentence before cutting in step 2:
фанни. Точик мард аст.
Cut part:
фанни
Sentence after cutting:
. Точик мард аст.
Array with the cut words:
$temp_for_default_words = array("фанни");
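
Collecting the single words that occur in the text could look something like this. The helper name find_words_in_text is hypothetical. The same helper would also serve Step 3 if it is given array_keys($base_words) as the word list.

```php
<?php
// Hypothetical helper: return the entries of $wordList that occur in $text
// as separate words, ignoring upper/lower case.
function find_words_in_text(string $text, array $wordList): array
{
    // Split the lower-cased text into runs of letters.
    preg_match_all('/\pL+/u', mb_strtolower($text), $matches);
    // Keep the list entries that appear among the tokens, re-indexed from 0.
    return array_values(array_intersect($wordList, $matches[0]));
}

$sample_default_words = array('бур', 'кори', 'давлати', 'забони', 'фанни');
$temp_default = find_words_in_text("фанни. Точик мард аст.", $sample_default_words);
// $temp_default now holds only "фанни"
```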

Step 3.

Cut those words from the rest of the sentence that are available in the $base_words array.

Sentence before cutting in step 3:
. Точик мард аст.
Cut part:
Точик
Sentence after cutting:
. мард аст.
Array with the cut words:
$temp_for_base_words = array ("точик");

The rest of the sentence must be temporarily cut out and hidden so that it is not processed.

Part of the sentence to hide:
. мард аст.

Finally, perform the replacements using the three new arrays and the dictionaries, and put the hidden part back.

Correcting step

Step 1.

Usage of `$temp_for_base_special_words`:

For each value in $temp_for_base_special_words, find the entry with that key in $base_special_words ($base_special_words[$value]) and replace the key with its value in the input text.

Step 2.

Usage of `$temp_for_default_words`:

For each value in $temp_for_default_words, find the entry with that key in $base_default_words ($base_default_words[$value]) and replace the key with its value in the input text.

Step 3.

Usage of `$temp_for_base_words`:

For each value in $temp_for_base_words, find the entry with that key in $base_words ($base_words[$value]) and replace the key with its value in the input text.

Step 4.

Return the hidden part of the text to its original position.
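
The steps above can be sketched as two passes of preg_replace_callback. This is a minimal sketch under some assumptions, not a definitive implementation: the names transfer_case and correct_text are hypothetical, each replacement is assumed to have the same number of characters as its key, and nothing has to be cut out or hidden because text that does not match is returned unchanged.

```php
<?php
// Hypothetical helper: copy the upper/lower-case pattern of $source
// onto $replacement, character by character.
function transfer_case(string $source, string $replacement): string
{
    $out = '';
    for ($i = 0, $len = mb_strlen($replacement); $i < $len; $i++) {
        $src = mb_substr($source, $i, 1);
        $ch  = mb_substr($replacement, $i, 1);
        // A source character counts as upper case if lower-casing changes it.
        $out .= ($src !== '' && $src !== mb_strtolower($src)) ? mb_strtoupper($ch) : $ch;
    }
    return $out;
}

// Hypothetical pipeline: $phrases and $words map wrong => correct,
// $keep lists words that must stay untouched.
function correct_text(string $text, array $phrases, array $words, array $keep): string
{
    // Pass 1: two-word phrases, matched as whole words and ignoring case.
    if ($phrases) {
        $keys = array_keys($phrases);
        // Longest keys first, so a longer phrase wins at the same position.
        usort($keys, function ($a, $b) { return mb_strlen($b) <=> mb_strlen($a); });
        $alternation = implode('|', array_map(function ($k) { return preg_quote($k, '/'); }, $keys));
        $text = preg_replace_callback(
            '/(?<!\pL)(?:' . $alternation . ')(?!\pL)/iu',
            function ($m) use ($phrases) {
                return transfer_case($m[0], $phrases[mb_strtolower($m[0])]);
            },
            $text
        );
    }

    // Pass 2: single words, skipping the ones listed in $keep.
    $keep = array_flip($keep);
    return preg_replace_callback('/\pL+/u', function ($m) use ($words, $keep) {
        $lower = mb_strtolower($m[0]);
        if (isset($keep[$lower]) || !isset($words[$lower])) {
            return $m[0]; // Nothing changes
        }
        return transfer_case($m[0], $words[$lower]);
    }, $text);
}

$fixed = correct_text(
    "Забони точики. ТОЧИК мард аст.",
    array("забони точики" => "забони тоҷикӣ"),
    array("точик" => "тоҷик"),
    array("забони")
);
// $fixed === "Забони тоҷикӣ. ТОҶИК мард аст."
```

Because a phrase match consumes its words, overlapping keys such as "кори хатти" and "хатти аз" are not both applied at the same spot. For the example sentence this still gives the expected output, since both phrases correct the shared word in the same way.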

1 answer
我想做一个坏孩纸
#2 · 2019-04-11 23:11

What @ctwheels wanted to tell you is to use str_ireplace (documentation) if you want to correct words case-insensitively.

<?php
     $test="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
     // Hypothetical sample dictionary (wrong word => correct word); replace it with your own.
     $YourArrayWithCorrectWord=array("dolor"=>"dolour");
     $word=explode(" ",$test); // Split the text into individual words (documentation link below)
     foreach($word as $key=>$value) {
        if (array_key_exists($value,$YourArrayWithCorrectWord)) {
            $word[$key]=$YourArrayWithCorrectWord[$value]; // Replace the wrong word with the correct one
        }
     }

     $TestCorrect=implode(" ",$word);
?>

If there is anything you don't understand, write to me.

I hope I have helped you.

Documentation: see the PHP manual pages for explode, implode, and array_key_exists.

P.S. This method has the problem that it can't correct two or more words together.
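
For comparison, here is a small sketch of the str_ireplace call mentioned at the top of this answer, which can replace a two-word phrase in one go. As far as I know, str_ireplace only ignores case for ASCII letters (explicitly so since PHP 8.2), so for Cyrillic text the sketch falls back to preg_replace with the i and u modifiers. The sample strings are made up for illustration.

```php
<?php
// ASCII text: str_ireplace ignores case and handles a two-word phrase.
$ascii = str_ireplace("hello world", "goodbye world", "HELLO World!");
// $ascii === "goodbye world!"

// Cyrillic text: a regular expression with the i and u modifiers
// matches regardless of case; (?<!\pL) and (?!\pL) keep it to whole words.
$cyr = preg_replace(
    '/(?<!\pL)забони точики(?!\pL)/iu',
    'забони тоҷикӣ',
    'ЗАБОНИ ТОЧИКИ барои'
);
// $cyr === "забони тоҷикӣ барои" (the original upper case is not preserved)
```

Neither call keeps the upper/lower-case pattern of the input, so a callback is still needed if the case must be preserved.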
