Limiting Cypher queries

Posted 2019-09-04 11:31

Question:

I am using a Neo4j database with 50,000 nodes and 2 million relationships to run Cypher MATCH queries like the one below:

start startnode = node(42660), endnode = node(30561)
match startnode-[r*1..3]->endnode
return r;

On its own, this query returns 443 rows, but I want Cypher to find only 5 matches and return those. To clarify: I don't just want Cypher to return 5 results; I also want it to STOP searching once it has found 5. I do NOT want Cypher to compute all 443 results.

Is this currently possible using the LIMIT clause? Or would LIMIT wait for all 443 results to be found and then return only the first 5?

EDIT: Will the LIMIT clause find only the first few results for a complex query like this?

start graphnode = node(1), startnode = node(42660), endnode = node(30561)
match startnode<-[:CONTAINS]-graphnode-[:CONTAINS]->endnode
with startnode, endnode
match startnode-[r1*1..1]->endnode
with r1, startnode, endnode
limit 30
match startnode-[r2*2..2]->endnode
with r1, r2, startnode, endnode
limit 30
match startnode-[r3*3..3]->endnode
with r1, r2, r3, startnode, endnode
limit 30
return r1,r2,r3;

Here is the profile for the query:

==> ColumnFilter(symKeys=["  UNNAMED216", "endnode", "r1", "startnode", "r2", "r3"],   returnItemNames=["r1", "r2", "r3"], _rows=30, _db_hits=0)
==> Slice(limit="Literal(30)", _rows=30, _db_hits=0)
==>   PatternMatch(g="(startnode)-['  UNNAMED216']-(endnode)", _rows=30, _db_hits=0)
==>     ColumnFilter(symKeys=["endnode", "  UNNAMED140", "r1", "startnode", "r2"], returnItemNames=["r1", "r2", "startnode", "endnode"], _rows=1, _db_hits=0)
==>       Slice(limit="Literal(30)", _rows=1, _db_hits=0)
==>         PatternMatch(g="(startnode)-['  UNNAMED140']-(endnode)", _rows=1, _db_hits=0)
==>           ColumnFilter(symKeys=["startnode", "endnode", "  UNNAMED68", "r1"], returnItemNames=["r1", "startnode", "endnode"], _rows=1, _db_hits=0)
==>             Slice(limit="Literal(30)", _rows=1, _db_hits=0)
==>               PatternMatch(g="(startnode)-['  UNNAMED68']-(endnode)", _rows=1, _db_hits=0)
==>                 NodeById(name="Literal(List(30561))", identifier="endnode", _rows=1, _db_hits=1)
==>                   NodeById(name="Literal(List(42660))", identifier="startnode", _rows=1, _db_hits=1)

Answer 1:

It depends on what you're doing, but in this case, if you add limit 5 after the return, Cypher can return results lazily and skip the rest of the matches. If you also sort or aggregate, it can't do that, because those operations need every match before they can produce a result. If you find that it doesn't behave this way, please report it as an issue on GitHub (along with the version you're using, etc.).
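
For example, the original query with the limit added would look something like this. It is only a sketch in the same legacy START syntax as the question, and it has not been run against your data:

```cypher
// Same start and end nodes as in the question.
start startnode = node(42660), endnode = node(30561)
match startnode-[r*1..3]->endnode
return r
limit 5; // with no ORDER BY or aggregation, matching should be able to stop after 5 rows
```

Running it with PROFILE and checking the _rows count on the pattern-matching step should show whether it actually stopped early on your version.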

Update for the new query:

start graphnode = node(1), startnode = node(42660), endnode = node(30561)
match startnode<-[:CONTAINS]-graphnode-[:CONTAINS]->endnode // do you need this, or is it always going to be true?
with startnode, endnode                                     // ditto. take it out if it doesn't need to be here.
match startnode-[r1*1..1]->endnode // this can probably be simplified to just startnode-[r1]->endnode
with r1, startnode, endnode 
limit 30 // limit to the first 30 it finds in the previous match (this should be lazy)
match startnode-[r2*2..2]->endnode // finds 2 levels deep
with r1, r2, startnode, endnode
limit 30 // limit to the first 30 it finds in the previous match (this should be lazy)
match startnode-[r3*3..3]->endnode
return r1,r2,r3 // the last with you had was extraneous, return will function the same way
limit 30; 

So, I assume you're asking because this query is slow. May I ask why you're breaking it up this way instead of just using startnode-[r*1..3]->endnode with limit 30? Do you really need the first match/with, or is that check unnecessary? Can you provide the output of PROFILE?
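
For reference, the single-pattern version might look like the sketch below. It assumes the initial CONTAINS check isn't needed. Note that it returns one relationship list per path in r rather than the r1, r2, r3 combinations of the split query, so the two are not exactly equivalent:

```cypher
// Assumption: the graphnode / CONTAINS check is always true and has been dropped.
start startnode = node(42660), endnode = node(30561)
match startnode-[r*1..3]->endnode // paths of length 1 to 3 in a single pattern
return r
limit 30;
```

If you need the paths grouped by length, the split query is still the way to go; otherwise this form gives the planner a single limit to apply.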