Gremlin: What's an efficient way of finding an

2019-02-19 03:51发布

So obviously, a straight forward way to find an edge between two vertices is to:

graph.traversal().V(outVertex).bothE(edgeLabel).filter(__.otherV().is(inVertex))

I feel that filter step will have to iterate through all edges making really slow for some applications with a lot of edges.

Another way could be:

traversal = graph.traversal()
                  .V(outVertex)
                  .bothE(edgeLabel)
                  .as("x")
                  .otherV()
                  .is(outVertex) // uses index?
                  .select("x");

I'm assuming the second approach could be much faster since it will be using ID index which will make it faster than the first the approach.

Which one is faster and more efficient (in terms of IO)?

I'm using Titan, so you could also make your answer Titan specific.

Edit

In terms of time, seems like the first approach is faster (edges were 20k for vertex b

gremlin> clock(100000){g.V(b).bothE().filter(otherV().is(a))}
==>0.0016451789999999999
gremlin> clock(100000){g.V(b).bothE().as("x").otherV().is(a).select("x")}
==>0.0018231140399999999

How about IO?

2条回答
The star\"
2楼-- · 2019-02-19 04:14

In case one does not know the vertex Id's, another solution might be

g.V().has('propertykey','value1').outE('thatlabel').as('e').inV().has('propertykey','value2').select('e')

This is also only unidirectional so one needs to reformulate the query for the opposite direction.

查看更多
贪生不怕死
3楼-- · 2019-02-19 04:18

I would expect the first query to be faster. However, few things:

  1. None of the queries is optimal, since both of them enable path calculations. If you need to find a connection in both directions, then use 2 queries (I will give an example below)
  2. When you use clock(), be sure to iterate() your traversals, otherwise you'll only measure how long it takes to do nothing.

These are the queries I would use to find an edge in both directions:

g.V(a).outE(edgeLabel).filter(inV().is(b))
g.V(b).outE(edgeLabel).filter(inV().is(a))

If you expect to get at most one edge:

edge = g.V(a).outE(edgeLabel).filter(inV().is(b)).tryNext().orElseGet {
       g.V(b).outE(edgeLabel).filter(inV().is(a)).tryNext()
}

This way you get rid of path calculations. How those queries perform will pretty much depend on the underlying graph database. Titan's query optimizer recognizes that query pattern and should return a result in almost no time.

Now if you want to measure the runtime, do this:

clock(100) {
  g.V(a).outE(edgeLabel).filter(inV().is(b)).iterate()
  g.V(b).outE(edgeLabel).filter(inV().is(a)).iterate()
}
查看更多
登录 后发表回答