How to increase performance of shortest path using

2019-05-30 18:02发布

问题:

I'm using JanusGraph with Gremlin and this dataset cotaining 2.6k nodes and 6.6k edges (3.3k edges on both sides). I've run the query for 10 minutes without find the shortest path.

Using Gephi the shortest path is almost instantaneous.

Here's my query:

g.V(687).repeat(out().simplePath()).until(hasId(1343)).path().limit(1)

回答1:

With simplePath() your query still processes a lot more paths than necessary. For example, if 688 is a direct neighbor of 687, but also a neighbor of 1000, which is 10 hops away on another path, why would you want to follow the path from 1000 to 688, if you've already seen this crossroad much earlier?

So, you should filter out any crossroad that you've seen before (the first occurrence is always the closest):

g.V(687).store('x').
  repeat(out().where(without('x')).aggregate('x')).
   until(hasId(1343)).limit(1).path()

Also note, that I swapped limit(1) and path; that's because it's a waste of resources (CPU and memory) to gather all paths first and then just take the first one.

UPDATE:

If others want to give it a try, here's the code to load the dataset into TinkerGraph:

g = TinkerGraph.open().traversal()
"http://nrvis.com/download/data/road/road-minnesota.zip".toURL().withInputStream {
  new java.util.zip.ZipInputStream(it).with {
    while (entry = it.getNextEntry()) {
      if ("road-minnesota.mtx" == entry.getName()) {
        it.eachLine {
          if (it ==~ /[0-9]+ [0-9]+/) {
            def (a, b) = it.split()*.toInteger()
            g.V(a).fold().
              coalesce(unfold(), addV().property(id, a)).
              addE("road").
                to(V(b).fold().coalesce(unfold(), addV().property(id, b))).inV().
              addE("road").to(V(a)).iterate()
          }
        }
        break
      }
      it.closeEntry()
    }
  }
}

And the query and a little benchmark:

gremlin> g.V(687).store('x').
......1>   repeat(out().where(without('x')).aggregate('x')).
......2>    until(hasId(1343)).limit(1).
......3>   path().by(id)
==>[687,689,686,677,676,675,673,626,610,606,607,608,735,732,733,730,729,734,737,738,739,742,786,816,840,829,815,825,865,895,872,874,968,983,1009,1044,1140,1142,1148,1219,1255,1329,1337,1339,1348,1343]

gremlin> clock (100) {
......1>   g.V(687).store('x').
......2>     repeat(out().where(without('x')).aggregate('x')).
......3>      until(hasId(1343)).limit(1).
......4>     path().iterate()
......5> }
==>12.5362714

12.5 ms on TinkerGraph looks pretty good to me. Expect it to run a little bit longer on JG, but surely not more than 10 minutes.