I need to display a distance matrix on my web page for all the nearby locations of a city.
I would like to fetch all this data from a web service and save it in my DB in advance. I am trying to figure out the best relational DB design for storing such data.
I want to avoid redundant data, and I also want a design that gives optimal performance.
I know a relational DB is not the best option for this, but that is something I cannot change at this point.
Question: What is the best DB schema design to store such info? I need to query the DB providing just one city, and display a matrix of the 5 or 10 closest cities.
Travel time is not that important; I am mainly concerned with distance.
The easiest way would be to store a pair of cities along with the distance and any other data you want to be able to display. I'd store the cities themselves in a separate table, and only store two keys and the distance information in a distance table.
If you're sure you only want to display the 5 or 10 closest cities at most, you can start by adding only those records. That means that for N cities you will have at most N*10 records in the database, which should scale well.
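To illustrate the pruning step, here is a minimal Python sketch that keeps only the k nearest neighbours per city before anything is written to the database. The function name k_nearest_rows, the city labels and the distances are all made up for the example; it assumes you already hold every pairwise distance once, in memory.

```python
from collections import defaultdict


def k_nearest_rows(pair_distances, k=10):
    """Return (city, neighbour, distance) rows, at most k per city.

    pair_distances maps an unordered pair (city_a, city_b) to a distance,
    with each pair listed once. This helper is hypothetical.
    """
    by_city = defaultdict(list)
    for (a, b), dist in pair_distances.items():
        # Every pair is a candidate neighbour for both of its cities.
        by_city[a].append((dist, b))
        by_city[b].append((dist, a))

    rows = []
    for city in sorted(by_city):
        for dist, other in sorted(by_city[city])[:k]:
            rows.append((city, other, dist))
    return rows


# Placeholder data: 4 cities, 6 pairwise distances.
pairs = {
    ("A", "B"): 5.0, ("A", "C"): 12.0, ("A", "D"): 30.0,
    ("B", "C"): 8.0, ("B", "D"): 26.0, ("C", "D"): 20.0,
}
rows = k_nearest_rows(pairs, k=2)
print(len(rows))  # 4 cities * 2 neighbours = 8 rows
```

Note that "is among the k nearest" is not symmetric: in the sample data B is one of D's two nearest cities, but D is not one of B's. The kept rows are therefore directional (city, neighbour), not unordered pairs.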
Even with a larger number of records, performance should be good if you add proper indexes, for example one that covers the city you look up by and the distance you sort by.
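The layout described in this answer could be sketched roughly as follows. SQLite (via Python's sqlite3 module) is used only so that the example runs as-is; the target here is MySQL, where the DDL would differ slightly. The table, column and index names, the helper closest, and the sample cities and distances are all assumptions made for illustration. Rows are stored per direction (city, neighbour) so that a lookup by a single city needs no OR condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Only two keys and the distance live here; city details stay in city.
CREATE TABLE city_distance (
    city1_id    INTEGER NOT NULL REFERENCES city(city_id),
    city2_id    INTEGER NOT NULL REFERENCES city(city_id),
    distance_km REAL    NOT NULL,
    PRIMARY KEY (city1_id, city2_id)
);
-- Index matching the lookup: filter by city1_id, order by distance.
CREATE INDEX idx_city_distance_nearest
    ON city_distance (city1_id, distance_km);
""")

# Placeholder cities and distances, not real data.
conn.executemany(
    "INSERT INTO city (city_id, name) VALUES (?, ?)",
    [(1, "Alpha"), (2, "Beta"), (3, "Gamma"), (4, "Delta")],
)
conn.executemany(
    "INSERT INTO city_distance (city1_id, city2_id, distance_km) VALUES (?, ?, ?)",
    [(1, 2, 5.0), (1, 3, 12.0), (1, 4, 30.0)],
)


def closest(conn, city_id, limit=5):
    """Return (name, distance_km) for the nearest stored neighbours."""
    return conn.execute(
        """
        SELECT c.name, d.distance_km
        FROM city_distance AS d
        JOIN city AS c ON c.city_id = d.city2_id
        WHERE d.city1_id = ?
        ORDER BY d.distance_km
        LIMIT ?
        """,
        (city_id, limit),
    ).fetchall()


print(closest(conn, 1, limit=2))
```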
For the sake of performance, and assuming you are using InnoDB, I'd probably denormalize the data a bit, like this:
Each pair of cities has 2 rows in CITY_DISTANCE containing the same DISTANCE (one for each direction). This could obviously make it very big and could lead to data inconsistencies (the database will not defend itself from non-matching DISTANCE values between same cities), and the DISTANCE doesn't logically belong to the PK, but bear with me...
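As a rough sketch of the table just described, assuming the primary key is {CITY1_ID, DISTANCE, CITY2_ID}: the example below uses SQLite through Python's sqlite3 module so that it can be run directly, with a WITHOUT ROWID table standing in for InnoDB's clustered primary key. The helpers add_pair and find_mismatches are hypothetical; they show one way the application, rather than the database, could keep the two rows of a pair in step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CITY (
    CITY_ID INTEGER PRIMARY KEY,
    NAME    TEXT NOT NULL
);
-- WITHOUT ROWID stores the rows in the primary key's B-Tree,
-- which approximates an InnoDB clustered table.
CREATE TABLE CITY_DISTANCE (
    CITY1_ID INTEGER NOT NULL REFERENCES CITY(CITY_ID),
    DISTANCE REAL    NOT NULL,
    CITY2_ID INTEGER NOT NULL REFERENCES CITY(CITY_ID),
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID)
) WITHOUT ROWID;
""")
conn.executemany(
    "INSERT INTO CITY (CITY_ID, NAME) VALUES (?, ?)",
    [(1, "Alpha"), (2, "Beta"), (3, "Gamma")],  # placeholder cities
)


def add_pair(conn, a, b, distance):
    """Insert both directions of a pair in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO CITY_DISTANCE (CITY1_ID, DISTANCE, CITY2_ID) "
            "VALUES (?, ?, ?)",
            [(a, distance, b), (b, distance, a)],
        )


def find_mismatches(conn):
    """Return (CITY1_ID, CITY2_ID) rows lacking a mirror row with the same DISTANCE."""
    return conn.execute(
        """
        SELECT a.CITY1_ID, a.CITY2_ID
        FROM CITY_DISTANCE AS a
        LEFT JOIN CITY_DISTANCE AS b
          ON b.CITY1_ID = a.CITY2_ID
         AND b.CITY2_ID = a.CITY1_ID
         AND b.DISTANCE = a.DISTANCE
        WHERE b.CITY1_ID IS NULL
        """
    ).fetchall()


add_pair(conn, 1, 2, 5.0)   # placeholder distances
add_pair(conn, 1, 3, 12.0)
print(find_mismatches(conn))  # empty while every pair was written via add_pair
```

Running find_mismatches periodically is only a detection aid; it does not prevent someone from inserting or updating a single direction on its own.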
InnoDB tables are clustered, which means that by declaring the PK in this particular way we put the whole table in a B-Tree that is particularly suited for a query like this:
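A sketch of such a query, again run against SQLite from Python purely so the example is executable; the same SELECT should work unchanged in MySQL. The sample distances are placeholders, and the plan printed at the end is SQLite's, not InnoDB's (in MySQL you would inspect it with EXPLAIN instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CITY_DISTANCE (
    CITY1_ID INTEGER NOT NULL,
    DISTANCE REAL    NOT NULL,
    CITY2_ID INTEGER NOT NULL,
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID)
) WITHOUT ROWID
""")

# Placeholder distances from city 1 to cities 2..8, stored in both directions.
rows = []
for other, dist in [(2, 40.0), (3, 15.0), (4, 72.5), (5, 8.0),
                    (6, 33.0), (7, 51.0), (8, 20.0)]:
    rows += [(1, dist, other), (other, dist, 1)]
conn.executemany("INSERT INTO CITY_DISTANCE VALUES (?, ?, ?)", rows)

# Equality on the leading PK column, ordered by the next PK column:
# the rows needed are adjacent in the B-Tree and already sorted.
NEAREST_SQL = """
SELECT CITY2_ID, DISTANCE
FROM CITY_DISTANCE
WHERE CITY1_ID = ?
ORDER BY DISTANCE
LIMIT 5
"""

nearest = conn.execute(NEAREST_SQL, (1,)).fetchall()
print(nearest)

# Show how SQLite plans the query (output format varies by version).
for step in conn.execute("EXPLAIN QUERY PLAN " + NEAREST_SQL, (1,)):
    print(step)
```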
This query returns the closest 5 cities to the city identified by 1, and can be satisfied by a simple range scan on the B-Tree mentioned above.
BTW, InnoDB will automatically create one more index (on CITY2_ID) because of the second FK, which will also include CITY1_ID and DISTANCE because secondary indexes in clustered tables must cover the PK. You might be able to exploit that to avoid duplicated DISTANCEs (explicitly create an index on {CITY2_ID, DISTANCE, CITY1_ID}, let the FK reuse it, and add CHECK (CITY1_ID < CITY2_ID)), but the MySQL query optimizer is probably not smart enough to deal with the query that would be required on such a structure.