How to amend substrings?

Posted 2019-02-21 03:48

Question:

Using the collation xxx_german2_ci, which treats ü and ue as identical, is it possible to have all occurrences of München highlighted as follows?

  • Example input: "München can also be written as Muenchen."

  • Example output: "<b>München</b> can also be written as <b>Muenchen</b>."

Note: It is OK to use some non-SQL programming in addition. The only requirement is that the knowledge about which character sequences are identical is taken from the MySQL collation.

Answer 1:

I found these tables: http://developer.mimer.com/collations/charts/index.tml. They are, of course, language dependent. A collation is just a comparison algorithm. For general utf8 I am not sure how it treats special characters.

You can use them to find the desired symbols and replace them in the output to get the same result as in your example. For that, however, you will need a programming language (PHP or anything else).
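As a rough illustration of that idea, here is a minimal PHP sketch. It assumes the equivalences have been copied by hand from the german2 collation rules (ä = ae, ö = oe, ü = ue, ß = ss), so the table is not read from MySQL at runtime. The names GERMAN2_EQUIVALENTS, collationPattern and highlight are made up for this example. The search term is turned into a regular expression that accepts either spelling, and every match is wrapped in <b> tags:

```php
<?php
// Hypothetical equivalence table, copied by hand from the german2 rules.
// It is an assumption of this sketch and covers only four pairs.
const GERMAN2_EQUIVALENTS = [
    'ä' => 'ae',
    'ö' => 'oe',
    'ü' => 'ue',
    'ß' => 'ss',
];

// Turn $term into a regex fragment that matches both spellings.
function collationPattern(string $term): string {
    $quoted = preg_quote(mb_strtolower($term, 'UTF-8'), '/');
    $map = [];
    foreach (GERMAN2_EQUIVALENTS as $char => $expanded) {
        $alternatives = '(?:' . $char . '|' . $expanded . ')';
        $map[$char] = $alternatives;
        $map[$expanded] = $alternatives;
    }
    // strtr() with an array works in a single pass, so text that was
    // already substituted is not substituted again.
    return strtr($quoted, $map);
}

// Wrap every occurrence of $term in <b>...</b> and HTML-escape the rest.
function highlight(string $text, string $term): string {
    if ($term === '') {
        return htmlspecialchars($text);
    }
    $regex = '/(' . collationPattern($term) . ')/iu';
    // With PREG_SPLIT_DELIM_CAPTURE the matches land at the odd indexes.
    $parts = preg_split($regex, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $html = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part);
        $html .= ($i % 2 === 1) ? '<b>' . $escaped . '</b>' : $escaped;
    }
    return $html;
}

echo highlight('München can also be written as Muenchen.', 'München'), "\n";
// <b>München</b> can also be written as <b>Muenchen</b>.
```

Because the table is hard-coded, it can drift from what the server actually does; anything beyond these four pairs would have to be added from the collation charts.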

Other resources:

http://collation-charts.org/

http://mysql.rjweb.org/doc.php/charcoll (down on the page)

Basically, try googling "collation algorithm mysql utf8_general_ci" or something similar.



Answer 2:

In the end I decided to do it all in PHP, hence my question about which characters are equal under utf8_general_ci.

Below is what I came up with, by example: a label is constructed from a text $description, with substrings matching $term highlighted and special characters converted. The substitution table is not complete, but probably sufficient for the actual use case.

mb_internal_encoding("UTF-8");

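function withoutAccents($s) {
    // Convert to single-byte ISO-8859-1 so that byte offsets equal character
    // offsets (utf8_decode() is deprecated as of PHP 8.2).
    return strtr(mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8'),
                 mb_convert_encoding('àáâãäçèéêëìíîïñòóôõöùúûüýÿß', 'ISO-8859-1', 'UTF-8'),
                 'aaaaaceeeeiiiinooooouuuuyys');
}

function simplified($s) {
    // mb_strtolower() also lowercases non-ASCII letters such as Ü;
    // strtolower() is not reliable for multibyte UTF-8 input.
    return withoutAccents(mb_strtolower($s));
}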

function encodedSubstr($s, $start, $length) {
    return htmlspecialchars(mb_substr($s, $start, $length));
}

function labelFromDescription($description, $term) {
    $simpleTerm = simplified($term);
    $simpleDescription = simplified($description);

    $lastEndPos = $pos = 0;
    // $simpleTerm is single-byte (ISO-8859-1), so strlen() counts characters here.
    $termLen = strlen($simpleTerm);
    $label = ''; // HTML
    while (($pos = strpos($simpleDescription,
                          $simpleTerm, $lastEndPos)) !== false) {
        $label .=
            encodedSubstr($description, $lastEndPos, $pos - $lastEndPos).
            '<strong>'.
            encodedSubstr($description, $pos, $termLen).
            '</strong>';
        $lastEndPos = $pos + $termLen;
    }
    // Append the remainder; mb_strlen() counts characters, matching mb_substr().
    $label .= encodedSubstr($description, $lastEndPos,
                            mb_strlen($description) - $lastEndPos);

    return $label;
}

echo labelFromDescription('São Paulo <SAO>', 'SAO')."\n";
echo labelFromDescription('München <MUC>', 'ünc');

Output:

<strong>São</strong> Paulo &lt;<strong>SAO</strong>&gt;
M<strong>ünc</strong>hen &lt;MUC&gt;