I'm trying to normalize strings with characters like 'áéíóú' to 'aeiou' to simplify searches.
Following the response to this question, I should use the Normalizer class to do it.
The problem is that the normalize function does nothing. For example, this code:
<?php echo 'Pérez, NFC: ' . normalizer_normalize('Pérez', Normalizer::NFC)
. ' NFD: ' .normalizer_normalize('Pérez', Normalizer::NFD)
. ' NFKC: ' .normalizer_normalize('Pérez', Normalizer::NFKC)
. ' NFKD: ' .normalizer_normalize('Pérez', Normalizer::NFKD)?>
<br/>
<?php echo 'aáàä, êëéè,'
. ' FORM_C: ' . normalizer_normalize('aáàä, êëéè', Normalizer::FORM_C )
. ' FORM_D: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_D)
. ' FORM_KC: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KC)
. ' FORM_KD: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KD)?>
shows:
Pérez, NFC: Pérez NFD: Pérez NFKC: Pérez NFKD: Pérez
aáàä, êëéè, FORM_C: aáàä, êëéè FORM_D: aáàä, êëéè FORM_KC: aáàä, êëéè FORM_KD: aáàä, êëéè
What is normalize supposed to do?
---EDITED---
It gets stranger. When I copy and paste the result from the web browser, in my editor and on the original page I see:
FORM_D: aáàä, êëéè
but on the Stack Overflow question page I see (only in Code Sample mode):
FORM_D: aáàä, êëéè
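One way to confirm that the forms really differ, even though they render identically, is to compare byte lengths and hex dumps instead of the displayed text. A minimal sketch, assuming the intl extension is loaded and the script is saved as UTF-8:

```php
<?php
// Assumes the intl extension is enabled and this file is saved as UTF-8.
$nfc = normalizer_normalize('Pérez', Normalizer::FORM_C);
$nfd = normalizer_normalize('Pérez', Normalizer::FORM_D);

// NFC keeps "é" as a single code point (U+00E9, bytes c3 a9).
// NFD splits it into "e" plus U+0301 COMBINING ACUTE ACCENT (bytes 65 cc 81).
echo strlen($nfc), ' bytes: ', bin2hex($nfc), PHP_EOL; // 6 bytes: 50c3a972657a
echo strlen($nfd), ' bytes: ', bin2hex($nfd), PHP_EOL; // 7 bytes: 5065cc8172657a

// The two strings look the same on screen but are not byte-identical.
var_dump($nfc === $nfd); // bool(false)
```

So normalize does change the string; the browser simply draws the base letter and the combining mark as one glyph.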
Found on this page:
So eliminating accents (and similar) is not the purpose of Normalizer.
For a function that actually removes the accents, the best that I have found so far is remove_accents($string) in the WordPress core: https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L1127
(Note: I have filed a bug against it so that they take an updated version that I provided, which documents each character and how it is translated, so it may change in the future.)
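Although Normalizer does not remove accents by itself, a commonly used approach builds on it: decompose to form D, then delete the combining marks with a Unicode-aware regular expression. The sketch below is only an illustration, not the WordPress implementation; strip_combining_marks is a hypothetical helper name, and the intl extension and UTF-8 input are assumed.

```php
<?php
// Hypothetical helper (not part of PHP, intl, or WordPress):
// decompose to NFD, then drop the combining marks.
function strip_combining_marks(string $text): string
{
    $decomposed = normalizer_normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8): return input unchanged
    }

    // \p{Mn} matches nonspacing marks such as U+0301; the /u flag enables UTF-8 mode.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    return $stripped ?? $text; // preg_replace returns null on error
}

echo strip_combining_marks('Pérez'), PHP_EOL;      // Perez
echo strip_combining_marks('aáàä, êëéè'), PHP_EOL; // aaaa, eeee
```

This only covers letters that have a canonical decomposition. Characters such as ø or ß have none and pass through unchanged, which is one reason a lookup table like the one in remove_accents may still be needed.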
What you are looking for is iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text).
See http://php.net/manual/function.iconv.php
Be careful with LC_* settings! Depending on the setting, the transliteration might change.
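A hedged sketch of how this might look in practice. It assumes the en_US.UTF-8 locale is installed (setlocale returns false if it is not). Note that é exists in ISO-8859-1, so with that target it is converted rather than replaced; the ASCII target shown second is an assumption added here to force transliteration, and its exact output depends on the iconv implementation and the locale.

```php
<?php
// Assumption: the en_US.UTF-8 locale exists on this system.
$locale = setlocale(LC_CTYPE, 'en_US.UTF-8');
if ($locale === false) {
    echo "Locale not available; transliteration results may differ", PHP_EOL;
}

$text = 'Pérez';

// ISO-8859-1 contains "é" (byte e9), so the accent survives the conversion.
$latin1 = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $text);
echo bin2hex($latin1), PHP_EOL; // 50e972657a

// ASCII has no accented letters, so //TRANSLIT has to substitute something.
// The substitute varies by platform and locale, so no output is promised here.
$ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $text);
echo $ascii === false ? 'conversion failed' : $ascii, PHP_EOL;
```

Printing the result on each target system is a cheap way to see what the local iconv and LC_CTYPE combination actually produces before relying on it for searches.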