Sphinx with metaphone and wildcard search

Published 2019-05-21 07:48

Question:

We are an anatomy platform and use Sphinx for our search. We want to make our search fuzzier and have started using metaphone to correct spelling mistakes. For example, it finds phalanges even though the search word is falanges.

That's good, but we want more. We want users to be able to type falange or even falang and still find phalanges. Any ideas how to accomplish this?

If you are interested, you can check out our Sphinx config file here.

Thanks!

Answer 1:

Well, you can enable both metaphone and min_prefix_len on an index at once. It will sort of work.

falange* 

might then just work (to match phalanges).

The problem is that the 'stripped' letters may change the 'sound' of the word (because they change the pronunciation).

E.g. falange becomes FLNJ, but falang actually becomes FLNK, so they are no longer 'substrings' of one another (i.e. phalanges becomes FLNJS, which FLNK* won't match).
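To make the mismatch concrete, here is a minimal Python sketch. It does not compute metaphone itself; it just takes the three codes quoted above as given and checks them the way a `code*` prefix search would. The names `CODES` and `prefix_matches` are made up for illustration.

```python
# Metaphone codes exactly as quoted in the answer above (not recomputed here).
CODES = {"phalanges": "FLNJS", "falange": "FLNJ", "falang": "FLNK"}


def prefix_matches(query: str, indexed: str) -> bool:
    """Mimic a `query*` prefix search performed on the metaphone codes."""
    return CODES[indexed].startswith(CODES[query])


for query in ("falange", "falang"):
    # falange -> True (FLNJ is a prefix of FLNJS)
    # falang  -> False (FLNK is not a prefix of FLNJS)
    print(query, prefix_matches(query, "phalanges"))
```

So the shorter the user's input gets, the more likely a dropped letter flips the last consonant code and breaks the prefix match.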


... to be honest, I don't know a good solution. You could perhaps get better results if you were to apply stemming BEFORE metaphone (so the endings that change the pronunciation of the words are removed).

Alas Sphinx can't do this. If you enable stemming and metaphone together, only ONE of the processors will ever fire.


Two possible solutions: implement stemming outside of Sphinx (or maybe with regexp_filter; I'm not sure whether, say, a Porter stemmer can be implemented purely with regular expressions),

or modify Sphinx so that ALL morphology processors apply (rather than just the first one that changes the word).
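For the first option, a rough Python sketch of what "stemming outside of Sphinx" could look like. This is an assumption-heavy illustration, not a Porter stemmer: the suffix list, the minimum stem length, and the function names `crude_stem` and `normalize` are all invented here. The idea is that the same normalization would be applied to the text before it is indexed and to the user's query before it is sent, with metaphone still enabled in Sphinx.

```python
# Hypothetical, deliberately crude suffix list; a real setup would use a
# proper stemmer. Longer suffixes come first so "es" wins over "s".
SUFFIXES = ("es", "s", "e")
MIN_STEM_LEN = 4  # assumed threshold so short words are left alone


def crude_stem(word: str) -> str:
    """Lowercase the word and strip one trailing suffix, if any applies."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> str:
    """Stem every whitespace-separated word in a document field or query."""
    return " ".join(crude_stem(w) for w in text.split())


for term in ("phalanges", "falanges", "falange", "falang"):
    print(term, "->", crude_stem(term))
```

With this, phalanges is reduced to phalang and the three misspellings all reduce to falang, so metaphone only ever sees forms with the same ending. Whether the resulting codes then line up for real data would need testing against the actual index.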