How can I match variations on words in MySQL, for example a search for accountancy should match accountant, accountants, accounting etc. I'm on shared hosting so can't add any functions to MySQL such as levenshtein.
I want something similar to how Google matches 'accounting course' and 'accountancy courses' when searching for 'accountant courses'. Example.
My server language is php, if it's only possible to implement it there and not in SQL.
The current statement is as follows.
SELECT
pjs.title,
MATCH (pjs.title) AGAINST ('accountancy' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS rel1,
MATCH (pjs.description) AGAINST ('accountancy' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS rel2,
MATCH (
pjs.benefits,
pjs.experienceRequirements,
pjs.incentives,
pjs.qualifications,
pjs.responsibilities,
pjs.skills
) AGAINST ('accountancy' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS rel3
FROM
pxl_jobsearch AS pjs
ORDER BY (rel1 * 5) + (rel2 * 1.5) + (rel3) DESC;
Search engines do this by implementing a text processing technique known as stemming. There are many libraries that implement this for you, personally I have user the Snowball stemmer, it does a good enough job.
I'm not familiar enough with the full text search capabilities of MySql, but you could try to apply the stemming algorithms to the search terms. For your example of "accountancy courses" the Snowball stemmer returns "account cours".
The
SOUNDEX()
function of MySQL comes pretty close. Read more about it here.Example:
If this doesn't solve it, the levenshtein algorithm is the way to go. Talk to your database administrator that he allows you to create functions. If he does, here's the solution (I didn't write the function, credit goes to anonymous):
You may use SQL
SOUNDEX()
, which is pretty useful for your needs: it searches for words that sound the same, not for grammatically close ones. You may use two approaches, which are quite similar.%
in theLIKE
clause: using your example,accountancy
would becomeaccount%
.stristr()
function): so,accountancy
would produce a clause similar toWHERE value='accountancy' or value='accountant' or value='accountancies'
.I don't know much about
MATCH
, when I want to select a column with variations I do the followingI work mostly in SQL Server but do some MySQL. I imagine that this works in MySQL as well.
MySQL isn't very good at full text search and you'd probably want to use other engines. My favorite one is Sphinx (http://sphinxsearch.com/) but there are others as well. Most of these support stemming out of the box.
If you have large tables and are going to use stemming the performance of MySQL will probably be very bad.
If you can't use Sphinx, take a look at this php script http://tartarus.org/~martin/PorterStemmer/php.txt
With this you can use stemming, and the search on the stemmed words.