I have a news site where there will be a lot of articles eventually. I need to implement search functionality, and I know that Solr is one of the most popular software solutions for this today.
The site might or might not get heavy traffic, but I have to implement search functionality that is designed for a heavy-traffic site.
What are the benefits of using a search engine like Solr instead of just querying the database (MySQL) for the content and displaying it to the user? Is it just because search engine products like Solr have superior search performance, in addition to (according to what I have read) more flexibility when it comes to searching? I'm not looking for answers like "use Solr"; I'm looking for an explanation of why not to use a database.
They solve different problems. Applications designed for search have a different core feature set than traditional databases (both SQL and NoSQL variants), since the requirements and the way they are used differ.
Databases have picked up some search capabilities these days, so there is some overlap. But if we take standard database interactions as a starting point, "find articles with these three words present" is a task that you'll have to do manual processing to solve. Add all the other things you usually want in order to make search perform well and return relevant results for your users, and you have a very different problem from the one regular databases try to solve.
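To make the "three words present" case concrete, here is a minimal sketch of the structure search engines are built around, an inverted index that maps each word to the documents containing it. The sample articles and the function names are made up for illustration; this is a toy, not how Solr is actually implemented.

```python
from collections import defaultdict

# Hypothetical sample articles (id -> text), invented for this sketch.
ARTICLES = {
    1: "Solr adds faceted search to the news site",
    2: "Database queries slow down the news site",
    3: "Search the news archive with Solr",
}

def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_all(index, words):
    """Return the ids of documents that contain every one of the given words."""
    postings = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*postings)) if postings else []

INDEX = build_index(ARTICLES)
print(search_all(INDEX, ["solr", "news", "search"]))  # prints [1, 3]
```

The lookup is a set intersection over precomputed word lists rather than a scan of every article's text, which is the kind of manual processing you would otherwise have to build and maintain on top of plain database queries.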
A few features that search-oriented services do better:
Term and field weights: If you have a match in "title", it should be weighted more heavily than a hit in "text". But you might also have an "oldness" factor affect the score, so depending on the use case, all these weights between fields and features can be tuned to solve almost any issue you have.
Text normalisation and processing: You might want to expand synonyms while indexing. Searching for ipod and i-pod should probably give the same result, and so should Windows and window. These operations are fundamental to most document search engines. You might want to allow a field to perform phonetic matches (matching on the pronunciation of words rather than their written form), and you might want to score those differently from exact matches. Solr's list of analyzers, tokenizers and filters may give you an idea of some of the available features for text processing.
Faceting / Navigators: Which distinct values does the field xyz have across the documents in my search result, and how many documents have each value? You've probably seen this feature on many sites, such as "filter by file type" or "only show hits for the last 7 days, last 31 days, last 365 days", together with a count of documents for each bin.
Highlighting: Which part of the text was matched, and what is a proper snippet that I can give back to the end user to show? You see this feature each time you do a Google search: the text below each hit shows the actual content from the webpage where your query was found.
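To give a feel for what each of these features involves if you build it yourself, here is a deliberately tiny sketch covering all four. Everything in it is an assumption made for illustration: the documents, the field names, the 3:1 title weight, and the asterisk highlighting. It only handles single-term queries and hyphen/case normalisation (no synonyms, stemming or phonetic matching), and a real engine does far more.

```python
import re
from collections import Counter

# Hypothetical documents and field names, invented for this sketch.
DOCS = [
    {"id": 1, "title": "New i-Pod released", "text": "Apple shows a new music player.", "type": "news"},
    {"id": 2, "title": "Music players compared", "text": "We tested the iPod against two rivals.", "type": "review"},
    {"id": 3, "title": "Window cleaning tips", "text": "Nothing about music here.", "type": "news"},
]

# Assumed weights: a hit in the title counts three times as much as a hit in the text.
FIELD_WEIGHTS = {"title": 3.0, "text": 1.0}

def normalise(value):
    """Lowercase, drop hyphens (i-pod -> ipod) and split into word tokens."""
    return re.findall(r"[a-z0-9]+", value.lower().replace("-", ""))

def score(doc, query):
    """Sum the weight of every field containing the (single, non-empty) query term."""
    term = normalise(query)[0]
    return sum(w for field, w in FIELD_WEIGHTS.items() if term in normalise(doc[field]))

def search(query):
    """Return (score, doc) pairs for matching documents, best match first."""
    hits = [(score(d, query), d) for d in DOCS]
    return sorted((h for h in hits if h[0] > 0), key=lambda h: -h[0])

def facets(hits, field):
    """Count how many hits carry each value of the given field."""
    return Counter(doc[field] for _, doc in hits)

def highlight(text, query):
    """Wrap words that match the normalised query term in asterisks."""
    term = normalise(query)[0]
    return " ".join(f"*{w}*" if normalise(w) == [term] else w for w in text.split())

hits = search("i-pod")
print([(s, d["id"]) for s, d in hits])
print(dict(facets(hits, "type")))
print(highlight(DOCS[1]["text"], "i-pod"))
```

Searching for "i-pod" finds both the "i-Pod" title and the "iPod" text, ranks the title match first, counts the hits per type, and marks the matched word in the snippet. Each of those steps is a few lines here only because the data set is three documents held in memory.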
... and these are just a few of the features that people who work with search consider each day. I'm not saying that these aren't solvable with more traditional DB functionality, but you would have to write and maintain a whole lot of code, and keep the data in sync, to get something you'd get for free with technology already made to solve the problem.
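For comparison, this is roughly what asking Solr for weighting, facets and highlighting in a single request can look like. It is only a sketch that builds the request URL and does not contact a server. The host, the core name "articles" and the field names title, text and category are assumptions, not something taken from the question; the parameters themselves (defType, qf, facet, facet.field, hl, hl.fl) are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumptions: a local Solr instance, a core named "articles", and fields
# called title, text and category. None of these come from the question.
SOLR_SELECT_URL = "http://localhost:8983/solr/articles/select"

def build_search_url(user_query):
    """Build a select URL asking for weighted matching, facets and highlighting."""
    params = {
        "q": user_query,
        "defType": "edismax",       # query parser that supports per-field boosts
        "qf": "title^3 text",       # search both fields, title hits weigh more
        "facet": "true",
        "facet.field": "category",  # document counts per category value
        "hl": "true",
        "hl.fl": "text",            # return highlighted snippets from the text field
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

print(build_search_url("solr search"))
```

The point is less the exact parameters than the fact that the behaviour is configured rather than programmed: changing the title boost or adding another facet is a change to the request, not to your own code.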
Performance depends on a lot of factors, but a dedicated search engine will probably do better than OK. Most of these solutions scale horizontally, so you can add servers as needed while growing. You probably won't have to do that for a while, though, so don't worry about it yet. Premature optimization, etc.