I'm using JanusGraph with Gremlin and this dataset cotaining 2.6k nodes and 6.6k edges (3.3k edges on both sides). I've run the query for 10 minutes without find the shortest path.
Using Gephi the shortest path is almost instantaneous.
Here's my query:
g.V(687).repeat(out().simplePath()).until(hasId(1343)).path().limit(1)
With
simplePath()
your query still processes a lot more paths than necessary. For example, if688
is a direct neighbor of687
, but also a neighbor of1000
, which is 10 hops away on another path, why would you want to follow the path from1000
to688
, if you've already seen this crossroad much earlier?So, you should filter out any crossroad that you've seen before (the first occurrence is always the closest):
Also note, that I swapped
limit(1)
andpath
; that's because it's a waste of resources (CPU and memory) to gather all paths first and then just take the first one.UPDATE:
If others want to give it a try, here's the code to load the dataset into TinkerGraph:
And the query and a little benchmark:
12.5 ms on TinkerGraph looks pretty good to me. Expect it to run a little bit longer on JG, but surely not more than 10 minutes.