I would like to add a search field to my site. The site is built on PHP and the Yii framework. The web server assembles several pieces of data (from files and APIs) before serving the resulting page. These pieces of data will eventually come from a MySQL database, but for now they are just files and API results.
Apache Lucene could solve the problem, but I cannot run Java on the server: I am on a shared Linux host.
Google Site Search (or Bing's, etc.) could solve the problem, but I would like a fully customizable search box, and I want to add some results of my own to the ones it returns.
I could create my own search engine, indexing pages and using different weights according to where each piece of data comes from, to get precise results; but I think there must be something out there that would be more efficient and quicker to implement.
What would be a way to add quick search functionality to a PHP-based website without using Java or Google Site Search?
You need to have all the data (page name and URL) in a database, and then you can implement the search function using the LIKE operator in a MySQL query.

There are a lot of search engines. Personally I like Sphinx Search, but you need to be able to compile and run it on your own (or a remote) server. You can also look at PHP-based search engines like seekquarry.
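For the LIKE approach mentioned above, a minimal sketch could look like the following. The table name `pages` and its columns `name` and `url` are assumptions made for illustration, as are the helper names `escapeLike` and `searchPages`; the sketch also assumes MySQL's default LIKE escape character (the backslash).

```php
<?php
// Escape LIKE wildcards so user input is matched literally.
// Assumes MySQL's default escape character, the backslash.
function escapeLike(string $term): string
{
    return strtr($term, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
}

// Hypothetical schema: a table `pages` with columns `name` and `url`.
function searchPages(PDO $pdo, string $term, int $limit = 20): array
{
    $sql = 'SELECT name, url FROM pages'
         . ' WHERE name LIKE :pattern'
         . ' ORDER BY name'
         . ' LIMIT ' . (int) $limit;

    $stmt = $pdo->prepare($sql);
    // A bound parameter keeps the search term out of the SQL text.
    $stmt->execute([':pattern' => '%' . escapeLike($term) . '%']);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Note that a pattern starting with `%` generally cannot use an ordinary index, so this is only reasonable for small tables.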
I use Zend Framework and consequently Zend_Search_Lucene. It's a pure PHP implementation of Lucene-style full-text search. You can define your own "document" (as an aggregate of your data), weight axes, and build indexes relatively straightforwardly. The downside, in my experience, is that it's much slower at indexing and querying than (e.g.) Solr.
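As a rough sketch of what defining documents and weights can look like with Zend_Search_Lucene in ZF 1.x: the function names `buildIndex` and `searchIndex`, the field names, the shape of the `$pages` array and the boost value are all assumptions for illustration, and the code assumes the ZF 1.x autoloader is already registered.

```php
<?php
// Assumes Zend Framework 1.x is installed and its autoloader is registered.

/**
 * Build a fresh index from an array of pages.
 * Each page is assumed to be ['title' => ..., 'url' => ..., 'body' => ...].
 */
function buildIndex($indexPath, array $pages)
{
    $index = Zend_Search_Lucene::create($indexPath);

    foreach ($pages as $page) {
        $doc = new Zend_Search_Lucene_Document();

        // Indexed and stored; boosted so title matches weigh more.
        $title = Zend_Search_Lucene_Field::Text('title', $page['title'], 'utf-8');
        $title->boost = 2.0;
        $doc->addField($title);

        // Stored for display only, not searchable.
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('url', $page['url'], 'utf-8'));

        // Searchable but not stored, which keeps the index smaller.
        $doc->addField(Zend_Search_Lucene_Field::UnStored('body', $page['body'], 'utf-8'));

        $index->addDocument($doc);
    }

    $index->commit();
}

/** Run a query and return title, URL and score for each hit. */
function searchIndex($indexPath, $queryString)
{
    $index = Zend_Search_Lucene::open($indexPath);
    $results = array();

    foreach ($index->find($queryString) as $hit) {
        $results[] = array(
            'title' => $hit->title,
            'url'   => $hit->url,
            'score' => $hit->score,
        );
    }

    return $results;
}
```

Because the files and API results are aggregated into one document per page here, the index does not depend on where each piece of data originally came from.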
Update 1: In response to a comment, here's a link: how I use Zend_Search_Lucene for spatial searches. The code there demonstrates a few things:
Update 2: Responding to the comment on performance. Putting the index on a fast medium (SSD, RAM disk with sync, whatever) speeds it up a bit. Using unstored fields also helps a bit. Both of these reduce the constant in the empirical O(n log n), but the dominant problem is still that n multiplier. What Zend appears to do, upon each add, is re-shuffle most or all of the previous adds to the index. As far as I can tell, this is the algorithm in play during index build, and it can't be modified.
The way I got around that n-multiplier was to use a Zend Page Cache keyed on the stemmed query (so if someone types "blueberries", "blueberry", "blue berry", "blu bary", etc., they all get stemmed and reduced to the soundex phonetic "blue-bear-ee"). Common queries get almost instant results, and since the particular domain was read-heavy and insert-latent, this was an acceptable solution. Obviously, in general it would not be.
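To illustrate the idea of caching on a normalized query, here is a simplified sketch. It is not the setup described above: it swaps in PHP's built-in `soundex()` for the stemming step and a plain array for the Zend page cache, and the names `phoneticKey` and `cachedSearch` are made up for the example.

```php
<?php
// Collapse spelling variants of a query into one coarse phonetic key.
// soundex() skips non-letters, so "blue berry" and "blueberry" agree.
function phoneticKey(string $query): string
{
    return soundex($query);
}

// Run $search only when no result is cached for the query's phonetic key.
// $cache is a plain array standing in for a real cache backend.
function cachedSearch(string $query, callable $search, array &$cache): array
{
    $key = phoneticKey($query);

    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $search($query);
    }

    return $cache[$key];
}

// Usage: the second, misspelled query is served from the cache.
$cache  = [];
$search = function (string $q): array {
    return ['results for ' . $q]; // placeholder for the slow index lookup
};

cachedSearch('blueberries', $search, $cache);
cachedSearch('blu bary', $search, $cache); // same key, B416
```

A soundex key only encodes the first few consonant sounds, so unrelated longer queries can collide ("blueberry pie" and "blueberry muffin" share a key). A real setup would want a finer-grained key, for example a proper stemmer applied per word.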
In other circumstances, there is the setResultSetLimit() method, which, when used with scoring, will return results faster. If you don't care about all possible results, just the top N, then this is the way to go.
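A minimal sketch of using the limit, assuming ZF 1.x with its autoloader registered; the wrapper name `limitedHits` and the default of 10 are assumptions for illustration.

```php
<?php
// Assumes Zend Framework 1.x is installed and its autoloader is registered.

/** Return at most $limit hits for a query. */
function limitedHits($indexPath, $queryString, $limit = 10)
{
    // Static setting; 0 means "no limit".
    Zend_Search_Lucene::setResultSetLimit($limit);

    $index = Zend_Search_Lucene::open($indexPath);

    return $index->find($queryString);
}
```

As far as I recall, the ZF 1.x manual describes this limit as yielding the first N hits found rather than the best N, so it is worth checking that the results are good enough for your data.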
Finally, all this experience is with Zend Framework 1.x. I do not know whether this has been addressed in 2.x.