My client has a huge database containing just three fields:
- Primary key (a unsigned number)
- Name (multi-word text)
- Description (up to 1000 varchar)
This database has got over few billion entries. I have no previous experience in handling such large amounts of data.
He wants me to design an interface using AJAX (like Google) to search this database. My queries are as slow as turtle.
What is best way to search text fields in such a large database? If the user is typing wrong spelling on interface, how can I return what he wanted ?
If you are using FULLTEXT indexes, you're correctly writing your queries, and the speed in which the results are returned are not adequate, you are entering a territory where MySQL may simply not be sufficient for you..
You may be able to tweak settings, purchase enough RAM to make sure that your entire data-set fits 100% in memory. It's definitely true that performance gains could be huge there.
I'd definitely recommend looking into tweaks of your mysql configuration. We've had some silly settings in the past. Operating system defaults tend to really suck!
However, if you have trouble at that point, you can:
- Create a separate table containing each word (indexed) along with a record id that it refers to. This will allow you to search on single words.
- Use a different system that's optimized for solving this problem. Unless my information is now outdated, the 2 engines that are the most popular for solving this problem are:
- Sphinx
- Solr / Lucene
If your table is myISAM then you can set the Name and Description fields to FULLTEXT
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
Name VARCHAR(200),
Description TEXT,
FULLTEXT (Name,Description)
);
Then you can use queries like:
SELECT * FROM articles
WHERE MATCH (Name,Description) AGAINST ('database');
Your can find more info at http://docs.oracle.com/cd/E17952_01/refman-5.0-en/fulltext-search.html
Before doing any of the above you might want to backup (or atleast make a copy) of your database.
You can't. The only fast search in your scenario would be on the Primary Key since that's most likely to be the index. Text search is slow as a turtle.
In all seriousness, you have a few solutions:
If you have to stick with NoSQL you'll have to redesign you scheme. It's hard to give you a good recommendation without knowing the requirements. One solution would be to index keywords in a separate table.
Another solution is to switch to a different search engine, you can find suggestions in other questions here such as: Fast SQL Server search on 40M text records