Storing a distance matrix in DB [closed]

2019-07-02 00:19发布

I need do display a distance matrix on my web-page for all the nearby locations for a city.

I would like to fetch all this data from web-service and save in my DB in advance. I am trying to figure out the best relational DB design to save such a data.

I want to avoid redundant data and also a design which gives optimal performance.

I know relation DB is not the best option for this but that is something I can not help at this point.

Question: So what is the best DB schema design to store such info. I would need to query DB providing just one city and I would have to display a matrix of 5 or 10 closest cities.

Travel time is not that important, I am concerned about distance mainly.

A matrix of this kind minus the duration

2条回答
霸刀☆藐视天下
2楼-- · 2019-07-02 01:02

The easiest way would be to store a pair of cities along with the distance and any other data you want to be able to display. I'd store the cities themselves in a separate table, and only store two keys and the distance information in a distance table.

I you're sure you only want to display the 5 or 10 closest at most, you can start with only adding those records. That means for N cities you will only get N*10 records in the database which should be quite scalable.

Even with larger number of records, the performance should be good if you add proper indexes.

查看更多
霸刀☆藐视天下
3楼-- · 2019-07-02 01:15

For the sake of performance, and assuming you are using InnoDB, I'd probably denormalize the data a bit, like this:

CREATE TABLE CITY (
    CITY_ID INT PRIMARY KEY
);

CREATE TABLE CITY_DISTANCE (
    CITY1_ID INT,
    CITY2_ID INT,
    DISTANCE NUMERIC NOT NULL,
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID),
    FOREIGN KEY (CITY1_ID) REFERENCES CITY (CITY_ID),
    FOREIGN KEY (CITY2_ID) REFERENCES CITY (CITY_ID)
);

Each pair of cities has 2 rows in CITY_DISTANCE containing the same DISTANCE (one for each direction). This could obviously make it very big and could lead to data inconsistencies (the database will not defend itself from non-matching DISTANCE values between same cities), and the DISTANCE doesn't logically belong to the PK, but bear with me...

InnoDB tables are clustered, which means that by declaring the PK in this particular way we put the whole table in a B-Tree that is particularly suited for a query like this:

SELECT CITY2_ID, DISTANCE
FROM CITY_DISTANCE
WHERE CITY1_ID = 1
ORDER BY DISTANCE
LIMIT 5

This query returns the closest 5 cities to the city identified by 1, and can be satisfied by a simple range scan on the B-Tree mentioned above:

id  select_type table           type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      CITY_DISTANCE   ref     PRIMARY         PRIMARY 4       const   6       "Using where; Using index"

BTW, the InnoDB will automatically create one more index (on CITY2_ID) because of the second FK, which will also include the CITY1_ID and DISTANCE because secondary indexes in clustered tables must cover PK. You might be able to exploit that to avoid duplicated DISTANCEs (explicitly create index on {CITY2_ID, DISTANCE, CITY1_ID} and let FK reuse it, and CHECK (CITY1_ID < CITY2_ID)), but MySQL query optimizer is probably not smart enough to deal with the query that would be required on such a structure.

查看更多
登录 后发表回答