Neo4j/Cypher effective pagination with order by ov

2019-06-24 08:13发布

I have following simple relationship between (:User) nodes.

(:User)-[:FOLLOWS {timestamp}]->(:User)

If I paginate followers ordered by FOLLOWS.timestamp I'm running into performance problems when someone has millions of followers.

MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100

What is suggested approach for paginating big sets of data when ordering is required?

UPDATE

follower             timestamp
---------------------------------------
id(1000000)          1455967905
id(999999)           1455967875
id(999998)           1455967234
id(999997)           1455967123
id(999996)           1455965321
id(999995)           1455964123
id(999994)           1455963645
id(999993)           1455963512
id(999992)           1455961343
....
id(2)                1455909382
id(1)                1455908432

I want to slice this list down using timestamp which set on :FOLLOWS relationship. If I want to return batches of 4 followers I take current timestamp first and return 4 most recent, then 1455967123 and 4 most recent and so on. In order to do this the whole list should be order by timestamp which results in performance issues on millions of records.

1条回答
Juvenile、少年°
2楼-- · 2019-06-24 08:39

If you're looking for the most recent followers, i.e. where the timestamp is greater than a given time, it only has to traverse the most recent ones.

You can do it with (2) in 20ms

If you are really looking for the oldest (first) followers it makes sense to skip ahead and don't look at the timestamp of every of the million followers (which takes about 1s on my system, see (3)). If you skip ahead the time goes down to 230ms, see (1)

In general we can see that on my laptop it does 2M db-operations per core and second.

(1) Look at first / oldest followers

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f                  | follower   |
+---------------------------------+
| :FOLLOWS[0]{ts:1}  | Node[1]{}  |
...
+---------------------------------+
10 rows
243 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator        | Estimated Rows | Rows    | DB Hits | Identifiers                                                             | Other                                 |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults |              1 |      10 |       0 | f, follower                                                             | f, follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158]                  |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top            |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | Literal(10);                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |     499 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | anon[155]; anon[158]; anon[155].ts    |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |       0 | anon[142], anon[155], anon[158], f, follower, u                         | f; follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter         |              1 |     499 |       0 | anon[142], f, follower, u                                               | anon[142]                             |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |    1000 |    1000 | anon[142], f, follower, u                                               | f; follower; f.ts < {  AUTOINT2}      |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip           |              1 |    1000 |       0 | f, follower, u                                                          | {  AUTOINT1}                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All)    |              1 | 1000000 | 1000001 | f, follower, u                                                          | (u)<-[  f@12:FOLLOWS]-(  follower@24) |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek   |              1 |       1 |       1 | u                                                                       |                                       |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+

Total database accesses: 1001501

(2) Look at most recent followers

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f                           | follower       |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows  | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |    10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |    10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |    10 |   16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > {  AUTOINT1}) |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 16394 |   16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |     1 |       1 | u              |                                                               |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 32790

(3) Find oldest followers without skipping ahead

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f                     | follower    |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows   | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |     10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |     10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |     10 |  999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < {  AUTOINT1}) |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 999498 |  999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |      1 |       1 | u              |                                                               |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 1998998
查看更多
登录 后发表回答