Optimizing MySQL query for integer range search

2019-06-14 09:35发布

问题:

I have over 1.7 million records in a table which contains ip address range(begin and end) both primary key and corresponding details.

The Table structure is

mysql> desc csv;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| begin   | bigint(20)  | NO   | PRI | 0       |       |
| end     | bigint(20)  | NO   | PRI | 0       |       |
| code    | char(2)     | YES  |     | NULL    |       |
| country | varchar(50) | YES  |     | NULL    |       |
| city    | varchar(50) | YES  |     | NULL    |       |
| area    | varchar(50) | YES  |     | NULL    |       |
+---------+-------------+------+-----+---------+-------+

Because of Indexing in Primary Key, the search is fast when an exact match is to be made like this

mysql> SELECT * FROM csv WHERE begin=3338456576;
+------------+------------+------+---------------+----------+---------------+
| begin      | end        | code | country       | city     | area          |
+------------+------------+------+---------------+----------+---------------+
| 3338456576 | 3338456831 | US   | UNITED STATES | NEW YORK | NEW YORK CITY |
+------------+------------+------+---------------+----------+---------------+
1 row in set (0.03 sec)

But when I try to search within a range, It takes longer time.

mysql> SELECT * FROM csv WHERE begin<3338456592 AND end>3338456592;
+------------+------------+------+---------------+----------+---------------+
| begin      | end        | code | country       | city     | area          |
+------------+------------+------+---------------+----------+---------------+
| 3338456576 | 3338456831 | US   | UNITED STATES | NEW YORK | NEW YORK CITY |
+------------+------------+------+---------------+----------+---------------+
1 row in set (1.59 sec)

Is there any way I can optimize my query to search ip address within a range?

EDIT

Create table Statement

CREATE TABLE `csv` (
  `begin` bigint(20) NOT NULL DEFAULT '0',
  `end` bigint(20) NOT NULL DEFAULT '0',
  `code` char(2) DEFAULT NULL,
  `country` varchar(50) DEFAULT NULL,
  `city` varchar(50) DEFAULT NULL,
  `area` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`begin`,`end`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

回答1:

If the IP ranges are not overlapping, so the query is never going to return more than 1 row, you can use this:

SELECT q.*
FROM 
  ( SELECT csv.* 
    FROM csv
    WHERE csv.begin < 3338456592 
    ORDER BY csv.begin DESC
    LIMIT 1
  ) AS q
WHERE 3338456592 < q.end ;

No index needs to be added. The primary index will be used.



回答2:

In case the ranges are overlapping you should:

  • define the ip range as a LineString column
  • define spatial index on that column
  • use a geometric "contains" query

See more in Efficient data model for range queries



回答3:

What score on SELECT begin, end, code, country, city, area FROM csv WHERE begin <> 3338456592 HAVING begin NOT BETWEEN MIN(begin) AND MAX(end)?

UPD: It is my version of table structure.

CREATE TABLE `csv` (
  `begin` INT(10) NOT NULL DEFAULT '0',
  `end` INT(10) NOT NULL DEFAULT '0',
  `code` char(2) DEFAULT NULL,
  `country` varchar(50) DEFAULT NULL,
  `city` varchar(45) DEFAULT NULL,
  `area` varchar(40) DEFAULT NULL,
  KEY `combined` (`begin`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

I think use country and code as ENUM it is faster.