I maintain a music database in mySQL, how do I return results stored under e.g. 'Tiësto' when people search for 'Tiesto'?
All the data is stored under full text indexing, if that makes any difference.
I'm already employing a combination of Levenshtein in PHP and REGEXP in SQL - not in trying to solve this problem, but just for increased searchability in general.
PHP:
function Levenshtein($word) {
$words = array();
for ($i = 0; $i < strlen($word); $i++) {
$words[] = substr($word, 0, $i) . '_' . substr($word, $i);
$words[] = substr($word, 0, $i) . substr($word, $i + 1);
$words[] = substr($word, 0, $i) . '_' . substr($word, $i + 1);
}
$words[] = $word . '_';
return $words;
}
$fuzzyartist = Levenshtein($_POST['searchartist']);
$searchimplode = "'".implode("', '", $fuzzyartist)."'";
mySQL:
SELECT *
FROM new_track_database
WHERE artist REGEXP concat_ws('|', $searchimplode);
To add, I frequently perform character set conversions and string sanitation in PHP, but these have always been the OTHER way - standardising non latin characters. I can't get my head around performing the oppsite process, but only in certain circumstances based on the data I've got stored.