How to search millions of record in SQL table fast

2020-05-26 09:47发布

问题:

I have SQL table with millions of domain name. But now when I search for let's say

SELECT * 
  FROM tblDomainResults 
 WHERE domainName LIKE '%lifeis%'

It takes more than 10 minutes to get the results. I tried indexing but that didn't help.

What is the best way to store this millions of record and easily access these information in short period of time?

There are about 50 million records and 5 column so far.

回答1:

Most likely, you tried a traditional index which cannot be used to optimize LIKE queries unless the pattern begins with a fixed string (e.g. 'lifeis%').

What you need for your query is a full-text index. Most DBMS support it these days.



回答2:

Full-text indexing is the far-and-away best option here - how this is accomplished will depend on the DBMS you're using.

Short of that, ensuring that you have an index on the column being matched with the pattern will help performance, but by the sounds of it, you've tried this and it didn't help a great deal.



回答3:

Assuming that your 50 million row table includes duplicates (perhaps that is part of the problem), and assuming SQL Server (the syntax may change but the concept is similar on most RDBMSes), another option is to store domains in a lookup table, e.g.

CREATE TABLE dbo.Domains
(
    DomainID INT IDENTITY(1,1) PRIMARY KEY,
    DomainName VARCHAR(255) NOT NULL
);
CREATE UNIQUE INDEX dn ON dbo.Domains(DomainName);

When you load new data, check if any of the domain names are new - and insert those into the Domains table. Then in your big table, you just include the DomainID. Not only will this keep your 50 million row table much smaller, it will also make lookups like this much more efficient.

SELECT * -- please specify column names
FROM dbo.tblDomainResults AS dr
INNER JOIN dbo.Domains AS d
ON dr.DomainID = d.DomainID
WHERE d.DomainName LIKE '%lifeis%';

Of course except on the tiniest of tables, it will always help to avoid LIKE clauses with a leading wildcard.



回答4:

Stop using LIKE statement. You could use fulltext search, but it will require MyISAM table and isn't all that good solution.

I would recommend for you to examine available 3rd party solutions - like Lucene and Sphinx.
They will be superior.



回答5:

One thing you might want to consider is having a separate search engine for such lookups. For example, you can use a SOLR (lucene) server to search on and retrieve the ids of entries that match your search, then retrieve the data from the database by id. Even having to make two different calls, its very likely it will wind up being faster.



回答6:

Indexes are slowed down whenever they have to go lookup ("bookmark lookup") data that the index itself doesn't contain. For instance, if your index has 2 columns, ID, and NAME, but you're selecting * (which is 5 columns total) the database has to read the index for the first two columns, then go lookup the other 3 columns in a less efficient data structure somewhere else.

In this case, your index can't be used because of the "like". This is similar to not putting any where filter on the query, it will skip the index altogether since it has to read the whole table anyway it will do just that ("table scan"). There is a threshold (i think around 35-50% where the engine normally flips over to this).

In short, it seems unlikely that you need all 50 million rows from the DB for a production application, but if you do... use a machine with more memory and try methods that keep that data in memory. Maybe a No-SQL DB would be a better option - mongoDB, couch DB, tokyo cabinet. Things like this. Good luck!



回答7:

You could try breaking up the domain into chunks and then searh the chunks themselves. I did some thing like that years ago when I needed to search for words in sentences. I did not have full text searching available so I broke up the sentences into a word list and searched the words. It was really fast to find the results since the words were indexed.