I'm building a Django site and I am looking for a search engine.
A few candidates:
Lucene/Lucene with Compass/Solr
Sphinx
Postgresql built-in full text search
MySQl built-in full text search
Selection criteria:
- result relevance and ranking
- searching and indexing speed
- ease of use and ease of integration with Django
- resource requirements - site will be hosted on a VPS, so ideally the search engine wouldn't require a lot of RAM and CPU
- scalability
- extra features such as "did you mean?", related searches, etc
Anyone who has had experience with the search engines above, or other engines not in the list -- I would love to hear your opinions.
EDIT: As for indexing needs, as users keep entering data into the site, those data would need to be indexed continuously. It doesn't have to be real time, but ideally new data would show up in index with no more than 15 - 30 minutes delay
I'm looking at PostgreSQL full-text search right now, and it has all the right features of a modern search engine, really good extended character and multilingual support, nice tight integration with text fields in the database.
But it doesn't have user-friendly search operators like + or AND (uses & | !) and I'm not thrilled with how it works on their documentation site. While it has bolding of match terms in the results snippets, the default algorithm for which match terms is not great. Also, if you want to index rtf, PDF, MS Office, you have to find and integrate a file format converter.
OTOH, it's way better than the MySQL text search, which doesn't even index words of three letters or fewer. It's the default for the MediaWiki search, and I really think it's no good for end-users: http://www.searchtools.com/analysis/mediawiki-search/
In all cases I've seen, Lucene/Solr and Sphinx are really great. They're solid code and have evolved with significant improvements in usability, so the tools are all there to make search that satisfies almost everyone.
for SHAILI - SOLR includes the Lucene search code library and has the components to be a nice stand-alone search engine.
Good to see someone's chimed in about Lucene - because I've no idea about that.
Sphinx, on the other hand, I know quite well, so let's see if I can be of some help.
I've no idea how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a port of Lucene for Ruby) and Solr), running some benchmarks. Could be useful, I guess.
I've not plumbed the depths of MySQL's full-text search, but I know it doesn't compete speed-wise nor feature-wise with Sphinx, Lucene or Solr.