How to store public-transportation data

2020-05-24 11:42发布

问题:

I am currently trying to implement my own public transportation path-finder to find connections by tram/bus etc. with given timetables. All the data is generated by me (by simply adding stops coordinates from google map). Thanks to it, I am free to choose my own way of storing data and processing them. The whole transportation network is represented by a weighted graph. So here comes the question: how to store public transportation data in standard SQL database so that it could be easily processed by some chosen algorithms? How to easily convert it later to the form of time-expanded graph so that simple Dijkstra algorithm would suffice?

回答1:

Since I wrote a bachelor thesis about "how far can you get in X minutes using only public transport", I can share some insight on your problem.

The Problem

First and foremost, forget about traditional algorithms. Go for time-aware ones. What works for regular road networks, totally fails on time restricted graphs. Time awareness is one problem, but there are many more which you would never think about as usual passenger

  1. some trains are guaranteed to wait for other trains
  2. you have some minimum downtime when switching trains (from train a to b)
  3. the downtime depends on the station (big or small stations) and the platform (switching from platform 1 to 2 is faster than 1 to 20)
  4. train schedules differ depending on the date, your data table has a lot of entries which do not apply to your selected date.

Solutions

Judging by your nickname I assume you are able to read german. You can read my analysis of algorithms and my database setup in my thesis document. Page 49 shows the database model and page 50 the inmemory model. Also take a look at the references on page 55-57 (they are mostly english).

Even time-aware-dijkstra is really slow. A good algorithm is RAPTOR (scientific description with example can be found here). RAPTOR makes use of the timetable patterns instead of being hindered by it.

How RAPTOR works: Timetables mainly consist of stations (location), rides (one run of a single train), stops (combo of ride,time,location). The trick of raptor is to organize all rides into routes. A route may only contain rides which have the same sequence of stops and do not overtake each other. This way you can take the first ride of a route which matches your time and omit checking all the other rides on the same route (because they are guaranteed to arrive later). The amount of routes is considerably smaller (factor 1000 in my data) than the amount of rides. In addition, the RAPTOR algorithm runs in "train-hopping-cycles", which enables it to check a single station exactly once (dijkstra cannot) and to follow

It goes like this:

reachableStations = (startStation,timeX)
for i=0; i < maxHops; i++ {
     if( reachableStations contains targetStation)
         finished!
     usableRoutes       = collect all routes leaving from reachableStations 
     reachableStations  = follow all usableRoutes to the end 
                          and collect stations for the next cycle.
                          keep station-time combos for each find.
}   

.

For my application I used a modified version of RAPTOR, which I named REAS. It's optimized for getting data about the complete network (instead of looking up a single route). You should simply stick to RAPTOR. One of the key learnings about algorithms in public transport is that the problem is a lot harder than it looks.

My app can be seen here. It uses the HAFAS data of the swiss railway company (SBB & ZVV) but currently the data is only from 2014. The calculation itself is fast, what takes a lot of time is generating the graphical overlay.

If you have more questions, feel free to contact me directly (contact info is available on ttm.ti8m.ch)