Search queries in neo4j: how to sort results in ne

Posted 2020-02-29 16:11

Question:

I am building a model that uses the names of Wikipedia topics to experiment with full-text indexing.

I set up an index on 'topic' (legacy) and run a full-text search for 'united states':

start n=node:topic('name:(united states)') return n

The first results are not relevant at all:

'List of United States National Historic Landmarks in United States commonwealths and territories, associated states, and foreign states'

[...]

and the actual 'united states' is buried deep down the list.

This raises a problem: to find the best match among the results (e.g. with Levenshtein distance, bigram similarity, or similar algorithms), you must first fetch all the items matching the pattern.
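To make the cost concrete, here is a minimal sketch of that client-side re-ranking, assuming the matching topic names have already been fetched into a Python list. The function names `levenshtein` and `rerank` and the sample titles are illustrative only and are not part of neo4j or Lucene.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def rerank(titles, query):
    """Sort already-fetched titles by case-insensitive distance to the query."""
    needle = query.lower()
    return sorted(titles, key=lambda title: levenshtein(title.lower(), needle))


# Hypothetical sample of rows returned by the index lookup.
titles = [
    "List of United States National Historic Landmarks",
    "United States Navy",
    "United States",
]
ranked = rerank(titles, "united states")
```

Every candidate has to be scored before the best one is known, which is why this approach needs the full result set up front.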

That would be a serious constraint, because in this case alone the query returns 21K rows and takes ~4 seconds.

Which algorithm does neo4j use to order the results of a full-text search (START)? What rationale does it use to sort the results, and how can it be changed using Cypher? The docs say to use the Java API to apply sort(). It would be very useful to have a tutorial pointing to which files to modify, and also to know which ranking rationale is used before making any tweak.

EDITED based on comments below - pagination of results is possible as: start n=node:topic('name:(united states)') return n skip 10 limit 50;

(skip before limit), but I need to ensure that the first results are meaningful before paginating.

Answer 1:

I don't know which algorithm Lucene uses to order the results. As for pagination, it should work if you swap the order of limit and skip, as follows: start n=node:topic('name:(united states)') return n skip 10 limit 50;

I would also add that if you are doing full-text search, a solution like Solr may be more appropriate.



Answer 2:

For just a Lucene index lookup with scoring, you might be better off with this:

http://neo4j.com/docs/stable/rest-api-indexes.html#rest-api-find-node-by-query
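As a rough sketch of what such a request could look like, the snippet below only builds the URL for a legacy index query. It assumes a server at http://localhost:7474, the index name `topic` from the question, and the `order=score` parameter as I recall it from the legacy REST documentation. Check the linked page for your version before relying on it. `index_query_url` is a hypothetical helper, not part of any neo4j client.

```python
from urllib.parse import quote


def index_query_url(base_url, index_name, lucene_query, order="score"):
    """Build the URL for a legacy node-index query over the REST API.

    The path layout and the `order` parameter are assumptions taken from
    the legacy neo4j REST docs; verify them against your server version.
    """
    return "{}/db/data/index/node/{}?query={}&order={}".format(
        base_url.rstrip("/"),
        quote(index_name, safe=""),
        quote(lucene_query, safe=""),  # percent-encode the Lucene syntax
        quote(order, safe=""),
    )


# Assumed local server; the query string is the one from the question.
url = index_query_url("http://localhost:7474", "topic", "name:(united states)")
```

Sending a GET request to that URL with any HTTP client should return the matching nodes. If the ordering parameter behaves as documented, each result also carries a score field, so the top hits can be checked before paginating.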