I'm looking for a stand-alone full-text search server with the following properties:
- Must operate as a stand-alone server that can serve search requests from multiple clients
- Must be able to do "bulk indexing" by indexing the result of an SQL query: say "SELECT id, text_to_index FROM documents;"
- Must be free software and must run on Linux with MySQL as the database
- Must be fast (rules out MySQL's internal full-text search)
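To make the bulk-indexing requirement concrete, here is a minimal sketch of the idea in Python. It is not how Solr, ElasticSearch, or Sphinx implement indexing; it only shows the shape of "run one SQL query, index every row". An in-memory SQLite database stands in for MySQL, the sample rows are made up, and `tokenize`, `bulk_index`, and `search` are hypothetical names chosen for illustration.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bulk_index(conn, query):
    """Build an inverted index (term -> set of ids) from an SQL result set."""
    index = defaultdict(set)
    for doc_id, text in conn.execute(query):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# SQLite stands in for MySQL here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "Solr is based on Lucene"),
     (2, "Sphinx works well with MySQL"),
     (3, "ElasticSearch is also based on Lucene")],
)
index = bulk_index(conn, "SELECT id, text_to_index FROM documents")
```

With this sample data, `search(index, "lucene")` matches documents 1 and 3. A real search server adds persistence, ranking, and network access on top of this basic structure.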
The alternatives I've found that have these properties are:
- Solr (based on Lucene)
- ElasticSearch (also based on Lucene)
- Sphinx
My questions:
- How do they compare?
- Have I missed any alternatives?
- I know that each use case is different, but are there certain cases where I would definitely not want to use a certain package?
Note: There are many users with the same question in mind.
So, to get to the point:
Which and why?
Use Solr if you intend to use it in a web app (for example, a site search engine). It will serve you very well there, thanks to its API, and you will want that power in a web app.
Use Sphinx if you want to search through tons of documents or files really quickly. It indexes very fast too. I would not recommend it for an app that relies on JSON or XML parsing to get the search results. Use it for direct DB searches. It works great with MySQL.
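As a rough illustration of the API point: Solr answers searches over plain HTTP, so a web app only has to build a URL and parse the response. The sketch below only constructs a request URL for Solr's `/select` handler with a JSON response. The host, port, and the core name `documents` are assumptions about a local setup, and `solr_select_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's /select handler that asks for JSON results."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Assumed local Solr instance and core name; adjust to your installation.
url = solr_select_url("http://localhost:8983/solr", "documents", "text_to_index:lucene")
```

Actually fetching that URL (for example with `urllib.request.urlopen`) needs a running Solr instance, so it is left out here.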
Alternatives
Although these are the giants, there are plenty more, and some projects use these engines to power their own custom frameworks. So I would say that you really haven't missed any, although there is one more, ElasticSearch, that has a good user base.
Unless you need to extend the search functionality in any proprietary way, Sphinx is your best bet.
Sphinx advantages:
Solr advantages:
I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased. However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)
Similarities:
Here are some differences:
java -jar start.jar
). Sphinx has no additional configuration.
Related questions:
I have been using Sphinx for almost a year now, and it has been amazing. I can index 1.5 million documents in about a minute on my MacBook, and even quicker on the server. I am also using Sphinx to limit searches to places within specific latitudes and longitudes, and it is very fast. How results are ranked is also very tweakable. It is easy to install and set up if you read a tutorial or two. It is almost at 1.0 status, but the release candidates have been rock solid.
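For readers unfamiliar with this kind of geographic restriction, the idea can be sketched in a few lines of Python. This is not Sphinx's API or how Sphinx evaluates such filters; it is only an illustration of a latitude/longitude bounding-box check, and the `in_bounding_box` helper and the sample places are made up for the example.

```python
def in_bounding_box(lat, lon, south, north, west, east):
    """Return True if (lat, lon) lies inside the box, in decimal degrees."""
    if not (south <= lat <= north):
        return False
    if west <= east:
        return west <= lon <= east
    # The box crosses the 180th meridian, so longitude wraps around.
    return lon >= west or lon <= east


# Made-up sample data: (name, latitude, longitude).
places = [
    ("Paris", 48.8566, 2.3522),
    ("Berlin", 52.52, 13.405),
    ("Madrid", 40.4168, -3.7038),
]

matches = [name for name, lat, lon in places
           if in_bounding_box(lat, lon, south=45, north=55, west=0, east=15)]
```

A search engine would apply such a filter to stored numeric attributes alongside the full-text match, rather than looping over rows in application code.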
Lucene / Solr appears to be more fully featured, has been around longer, and has a much stronger user community. IMHO, if you can get past the initial setup issues that some people seem to have faced (we did not), then I would say Lucene / Solr is your best bet.