I have following simple relationship between (:User)
nodes.
(:User)-[:FOLLOWS {timestamp}]->(:User)
If I paginate followers ordered by FOLLOWS.timestamp
I'm running into performance problems when someone has millions of followers.
MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100
What is suggested approach for paginating big sets of data when ordering is required?
UPDATE
follower timestamp
---------------------------------------
id(1000000) 1455967905
id(999999) 1455967875
id(999998) 1455967234
id(999997) 1455967123
id(999996) 1455965321
id(999995) 1455964123
id(999994) 1455963645
id(999993) 1455963512
id(999992) 1455961343
....
id(2) 1455909382
id(1) 1455908432
I want to slice this list down using timestamp which set on :FOLLOWS
relationship. If I want to return batches of 4 followers I take current timestamp first and return 4 most recent, then 1455967123 and 4 most recent and so on. In order to do this the whole list should be order by timestamp which results in performance issues on millions of records.
If you're looking for the most recent followers, i.e. where the timestamp is greater than a given time, it only has to traverse the most recent ones.
You can do it with (2) in 20ms
If you are really looking for the oldest (first) followers it makes sense to skip ahead and don't look at the timestamp of every of the million followers (which takes about 1s on my system, see (3)). If you skip ahead the time goes down to 230ms, see (1)
In general we can see that on my laptop it does 2M db-operations per core and second.
(1) Look at first / oldest followers
(2) Look at most recent followers
(3) Find oldest followers without skipping ahead