How to optimize database this query in large datab

2019-02-18 04:03发布

问题:

Query

SELECT id FROM `user_tmp` 
WHERE  `code` = '9s5xs1sy' 
  AND  `go` NOT REGEXP 'http://www.xxxx.example.com/aflam/|http://xx.example.com|http://www.xxxxx..example.com/aflam/|http://www.xxxxxx.example.com/v/|http://www.xxxxxx.example.com/vb/'  
  AND check='done'  
  AND  `dataip` <1319992460
ORDER BY id DESC 
LIMIT 50

MySQL returns:

Showing rows 0 - 29 ( 50 total, Query took 21.3102 sec) [id: 2622270 - 2602288]

Query took 21.3102 sec

if i remove

AND dataip <1319992460

MySQL returns

Showing rows 0 - 29 ( 50 total, Query took 0.0859 sec) [id: 3637556 - 3627005]

Query took 0.0859 sec

and if no data, MySQL returns

MySQL returned an empty result set (i.e. zero rows). ( Query took 21.7332 sec )

Query took 21.7332 sec

Explain plan:

  SQL query: Explain SELECT * FROM `user_tmp` WHERE `code` = '93mhco3s5y' AND `too` NOT REGEXP 'http://www.10neen.com/aflam/|http://3ltool.com|http://www.10neen.com/aflam/|http://www.10neen.com/v/|http://www.m1-w3d.com/vb/' and checkopen='2010' and `dataip` <1319992460 ORDER BY id DESC LIMIT 50;
    Rows: 1
    id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
    1   SIMPLE  user_tmp    index   NULL    PRIMARY     4   NULL    50  Using where

Example of the database used

CREATE TABLE IF NOT EXISTS user_tmp ( id int(9) NOT NULL AUTO_INCREMENT, ip text NOT NULL, dataip bigint(20) NOT NULL, ref text NOT NULL, click int(20) NOT NULL, code text NOT NULL, too text NOT NULL, name text NOT NULL, checkopen text NOT NULL, contry text NOT NULL, vOperation text NOT NULL, vBrowser text NOT NULL, iconOperation text NOT NULL,
iconBrowser text NOT NULL,

  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=4653425 ;

--

-- Dumping data for table user_tmp

INSERT INTO `user_tmp` (`id`, `ip`, `dataip`, `ref`, `click`, `code`, `too`, `name`, `checkopen`, `contry`, `vOperation`, `vBrowser`, `iconOperation`, `iconBrowser`) VALUES
(1, '54.125.78.84', 1319506641, 'http://xxxx.example.com/vb/showthread.php%D8%AA%D8%AD%D9%85%D9%8A%D9%84-%D8%A7%D8%BA%D9%86%D9%8A%D8%A9-%D8%A7%D9%84%D8%A8%D9%88%D9%85-giovanni-marradi-lovers-rendezvous-3cd-1999-a-155712.html', 0, '4mxxxxx5', 'http://www.xxx.example.com/aflam/', 'xxxxe', '2010', 'US', 'Linux', 'Chrome 12.0.742 ', 'linux.png', 'chrome.png');

I want the correct way to do the query and optimize database

回答1:

You don't have any indexes besides the primary key. You need to make index on fields that you use in your WHERE statement. If you need to index only 1 field or a combination of several fields depends on the other SELECTs you will be running against that table.

Keep in mind that REGEXP cannot use indexes at all, LIKE can use index only when it does not begin with wildcard (so LIKE 'a%' can use index, but LIKE '%a' cannot), bigger than / smaller than (<>) usually don't use indexes also.

So you are left with the code and check fields. I suppose many rows will have the same value for check, so I would begin the index with code field. Multi-field indexes can be used only in the order in which they are defined...

Imagine index created for fields code, check. This index can be used in your query (where the WHERE clause contains both fields), also in the query with only code field, but not in query with only check field.

Is it important to ORDER BY id? If not, leave it out, it will prevent the sort pass and your query will finish faster.



回答2:

I will assume you are using mysql <= 5.1

The answers above fall into two basic categories: 1. You are using the wrong column type 2. You need indexes

I will deal with each as both are relevant for performance which is ultimately what I take your questions to be about:

  1. Column Types The difference between bigint/int or int/char for the dataip question is basically not relevant to your issue. The fundamental issue has more to do with index strategy. However when considering performance holistically, the fact that you are using MyISAM as your engine for this table leads me to ask if you really need "text" column types. If you have short (less than 255 say) character columns, then making them fixed length columns will most likely increase performance. Keep in mind that if any one column is of variable length (varchar, text, etc) then this is not worth changing any of them.

  2. Vertical Partitioning The fact to keep in mind here is that even though you are only requesting the id column from the standpoint of disk IO and memory you are getting the entire row back. Since so many of the rows are text, this could mean a massive amount of data. Any of these rows that are not used for lookups of users or are not often accessed could be moved into another table where the foreign key has a unique key placed on it keeping the relationship 1:1.

  3. Index Strategy Most likely the problem is simply indexing as is noted above. The reason that your current situation is caused by adding the "AND dataip <1319992460" condition is that it forces a full table scan.

As stated above placing all the columns in the where clause in a single, composite index will help. The order of the columns in the index will no matter so long as all of them appear in the where clause.

However, the order could matter a great deal for other queries. A quick example would be an index made of (colA, colB). A query with "where colA = 'foo'" will use this index. But a query with "where colB = 'bar'" will not because colB is not the left most column in the index definition. So, if you have other queries that use these columns in some combination it is worth minimizing the number of indexes created on the table. This is b/c every index increases the cost of a write and uses disk space. Writes are expensive b/c of necessary disk activity. Don't make them more expensive.



回答3:

You need to add index like this:

ALTER TABLE  `user_tmp` ADD INDEX(`dataip`);

And if your column 'dataip' contains only unique values you can add unique key like this:

ALTER TABLE  `user_tmp` ADD UNIQUE(`dataip`);

Keep in mind, that adding index can take long time on a big table, so don't do it on production server with out testing.



回答4:

You need to create index on fields in the same order that that are using in where clause. Otherwise index is not be used. Index fields of your where clause.



回答5:

does dataip really need to be a bigint? According to mysql The signed range is -9223372036854775808 to 9223372036854775807 ( it is a 64bit number ).

You need to choose the right column type for the job, and add the right type of index too. Else these queries will take forever.