Something better than SOUNDEX [closed]

Posted 2019-08-06 13:05

Does anyone know of an algorithm that is better than SOUNDEX for matching similarly spelt or similarly pronounced words?

EDIT

I'm looking for something for SPANISH.

1 answer
We Are One
#2 · 2019-08-06 13:49

Soundex is a very old and simple phonetic hash for English words. It was originally designed to index names by how they sound, and it is often used to match misspelled words; for example, "Their", "Thier", and "There" all have the same Soundex code (T600).
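To make the hashing concrete, here is a minimal sketch of the common American Soundex variant in Python. The function name `soundex` is just illustrative, and database implementations such as SQL's SOUNDEX may differ in edge cases (for example in how H and W are treated), so take this as an approximation rather than a reference implementation.

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for an ASCII word."""
    # Consonant groups and their digits; vowels, H, W and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Only A-Z survive, so letters such as "ñ" are simply dropped.
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    result = letters[0]                # the first letter is kept as-is
    prev = codes.get(letters[0], "")   # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        digit = codes.get(ch, "")      # vowels and Y map to ""
        if digit and digit != prev:
            result += digit
        prev = digit                   # a vowel resets the "previous" code
    return (result + "000")[:4]        # pad or truncate to 4 characters


for w in ("Their", "Thier", "There"):
    print(w, soundex(w))               # each line ends in T600
```

Only the first letter and up to three consonant classes survive, so unrelated words collide easily: "Robert" and "Rupert" both come out as R163.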

Soundex has two main problems: it is heavily biased toward English, and it discards so much information that it produces many false positives. A better algorithm for English words is Metaphone.

If you are looking to match Spanish misspellings, there is a Double Metaphone algorithm that can accept tables of sound-alikes (e.g. "asta" and "hasta"). You have to create your own tables, and I have heard that Double Metaphone is orders of magnitude slower than single Metaphone.

Another option is to alter the Metaphone algorithm to use Spanish phonemes instead of English ones. Someone has already done this in PHP.
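As a rough illustration of what "using Spanish phonemes" could look like, here is a small Python sketch of a phonetic key. It is not the PHP port mentioned above and not a real Metaphone implementation; the function name `spanish_key` and the rule set are my own assumptions. In particular, it assumes Latin American pronunciation (seseo: z and soft c sound like s, and ll sounds like y).

```python
import unicodedata


def spanish_key(word: str) -> str:
    """Hypothetical, simplified phonetic key for Spanish (illustration only)."""
    # Strip accents (á -> a, ü -> u) but protect ñ, which is a distinct sound.
    w = unicodedata.normalize("NFC", word.lower()).replace("ñ", "\0")
    w = "".join(c for c in unicodedata.normalize("NFD", w)
                if not unicodedata.combining(c))
    w = w.replace("\0", "ñ")

    out = []
    i = 0
    while i < len(w):
        c = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if c == "c" and nxt == "h":
            out.append("X")                 # "ch" is a single sound
            i += 2
        elif c == "l" and nxt == "l":
            out.append("y")                 # assumed: ll sounds like y
            i += 2
        elif c == "q":
            out.append("k")                 # "qu" -> k, the u is silent
            i += 2 if nxt == "u" else 1
        elif c == "g" and nxt == "u" and w[i + 2:i + 3] in ("e", "i"):
            out.append("g")                 # "gue"/"gui": hard g, silent u
            i += 2
        elif c in ("c", "g") and nxt in ("e", "i"):
            out.append("s" if c == "c" else "j")  # soft c (seseo), soft g
            i += 1
        elif c == "h":
            i += 1                          # h is silent
        else:
            out.append({"c": "k", "z": "s", "v": "b"}.get(c, c))
            i += 1
    return "".join(out)


print(spanish_key("hasta"), spanish_key("asta"))  # asta asta
print(spanish_key("vaca"), spanish_key("baca"))   # baka baka
```

Unlike Soundex, this key keeps the vowels, because Spanish vowels are pronounced very regularly; that should reduce false positives, at the cost of not matching misspelled vowels.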
