Index does not work when using order().by() in Titan

Posted 2019-02-26 08:44

Question:

The Titan documentation says that:

Mixed indexes support ordering natively and efficiently. However, the property key used in the order().by() method must have been previously added to the mixed index for native result ordering support. This is important in cases where the order().by() key is different from the query keys. If the property key is not part of the index, then sorting requires loading all results into memory.

So I created a mixed index on the prop1 property. The index works well when a value is specified.

gremlin> g.V().has('prop1', gt(1)) /* this gremlin uses the mixed index */
==>v[6017120]
==>v[4907104]
==>v[8667232]
==>v[3854400]
...

But when I use order().by() on prop1, the query does not take advantage of the mixed index.

gremlin> g.V().order().by('prop1', incr) /* doesn't use the mixed index */
17:46:00 WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
Could not execute query since pre-sorting requires fetching more than 1000000 elements. Consider rewriting the query to exploit sort orders

Also, count() takes a very long time.

gremlin> g.V().has('prop1').count()
17:44:47 WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [()]. For better performance, use indexes

I'd appreciate knowing what I'm doing wrong. Here is my Titan setup:

  • Titan Version: 1.0.0-hadoop1
  • Storage Backend: Cassandra 2.1.1
  • Index Backend: ElasticSearch 1.7

Thank you.

Answer 1:

You must supply a value to filter on for the indices to be used. Here:

g.V().order().by('prop1', incr)

you don't provide any filter, so Titan has to iterate over all of V() and then apply the sort in memory.
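
As a rough sketch of the rewrite that the error message hints at, the ordering can be combined with a filter on the same indexed key, so that Titan has an index-backed condition to start from. This assumes g is the traversal source from the question and that prop1 is covered by the mixed index. The limit(10) is only an illustrative page size. Whether the sort is actually pushed down to the index backend depends on the setup, so it is worth checking that the full-scan warning no longer appears.

```groovy
// Assumes g is the traversal source used in the question.
// The gt(1) filter mirrors the indexed query shown in the question;
// limit(10) is an arbitrary, illustrative page size.
g.V().has('prop1', gt(1)).order().by('prop1', incr).limit(10)
```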

Here:

g.V().has('prop1').count()

you supply an indexed key but don't specify a value to filter on, so it's still iterating over all of V(). You could do:

g.V().has("prop1", textRegex(".*")).count()

This fakes Titan out a bit by supplying a match-everything predicate so that the index is used, but the query could still be slow if it returns a lot of results to iterate over.
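
Note that textRegex is a string predicate, while the question compares prop1 with gt(1), which suggests a numeric key. In that case a range predicate covering the whole value range might play the same role. A minimal sketch, assuming prop1 was defined with the Integer data type (the question does not state the type, so this is an assumption):

```groovy
// Assumption: prop1 is an Integer property key.
// gte(Integer.MIN_VALUE) matches every vertex that has a prop1 value,
// while still giving Titan a value condition on the indexed key.
g.V().has('prop1', gte(Integer.MIN_VALUE)).count()
```

As with the textRegex variant, count() still iterates over every matching result, so it may remain slow on a large graph.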