How to increase performance of shortest path using

I'm using JanusGraph with Gremlin and this dataset cotaining 2.6k nodes and 6.6k edges (3.3k edges on both sides). I've run the query for 10 minutes without find the shortest path.

Using Gephi the shortest path is almost instantaneous.

Here's my query:

g.V(687).repeat(out().simplePath()).until(hasId(1343)).path().limit(1)

标签： graph cassandra shortest-path gremlin janusgraph

1条回答

不美不萌又怎样

2楼-- · 2019-05-30 18:39

With simplePath() your query still processes a lot more paths than necessary. For example, if 688 is a direct neighbor of 687, but also a neighbor of 1000, which is 10 hops away on another path, why would you want to follow the path from 1000 to 688, if you've already seen this crossroad much earlier?

So, you should filter out any crossroad that you've seen before (the first occurrence is always the closest):

g.V(687).store('x').
  repeat(out().where(without('x')).aggregate('x')).
   until(hasId(1343)).limit(1).path()

Also note, that I swapped limit(1) and path; that's because it's a waste of resources (CPU and memory) to gather all paths first and then just take the first one.

UPDATE:

If others want to give it a try, here's the code to load the dataset into TinkerGraph:

g = TinkerGraph.open().traversal()
"http://nrvis.com/download/data/road/road-minnesota.zip".toURL().withInputStream {
  new java.util.zip.ZipInputStream(it).with {
    while (entry = it.getNextEntry()) {
      if ("road-minnesota.mtx" == entry.getName()) {
        it.eachLine {
          if (it ==~ /[0-9]+ [0-9]+/) {
            def (a, b) = it.split()*.toInteger()
            g.V(a).fold().
              coalesce(unfold(), addV().property(id, a)).
              addE("road").
                to(V(b).fold().coalesce(unfold(), addV().property(id, b))).inV().
              addE("road").to(V(a)).iterate()
          }
        }
        break
      }
      it.closeEntry()
    }
  }
}

And the query and a little benchmark:

gremlin> g.V(687).store('x').
......1>   repeat(out().where(without('x')).aggregate('x')).
......2>    until(hasId(1343)).limit(1).
......3>   path().by(id)
==>[687,689,686,677,676,675,673,626,610,606,607,608,735,732,733,730,729,734,737,738,739,742,786,816,840,829,815,825,865,895,872,874,968,983,1009,1044,1140,1142,1148,1219,1255,1329,1337,1339,1348,1343]

gremlin> clock (100) {
......1>   g.V(687).store('x').
......2>     repeat(out().where(without('x')).aggregate('x')).
......3>      until(hasId(1343)).limit(1).
......4>     path().iterate()
......5> }
==>12.5362714

12.5 ms on TinkerGraph looks pretty good to me. Expect it to run a little bit longer on JG, but surely not more than 10 minutes.

0人赞添加讨论(0) 举报

How to increase performance of shortest path using

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间