Building a pagination cursor

Published 2019-04-09 05:22

Question:

I have activities that are stored in a graph database. Multiple activities are grouped and aggregated into 1 activity in some circumstances.

A processed activity feed could look like this:

Activity 1

Activity 2

Grouped Activity
  Activity 3
  Activity 4

Activity 5

  • Activities have an updated timestamp and a unique id.

  • The activities are ordered by their updated time and in the case of a grouped activity, the most recent updated time within its child activities is used.

  • Activities can be inserted anywhere in the list (for example, if we start following someone, their past activities would be inserted into the list).

  • Activities can be removed from anywhere in the list.

  • Due to the amount of data, using the timestamp with microseconds can still result in conflicts (2 items can have the same timestamp).

  • Cursor identifiers should be unique and stable. Adding and removing feed items should not change the identifier.

I would like to introduce cursor-based paging so that clients can paginate through the feed, similar to Twitter's. There doesn't seem to be much information on how such cursors are built; I have only found this blog post about implementing them. However, that approach seems to break if the cursor's identifier points to an item that has since been removed.

Given these constraints, how can I produce an identifier that can be used as a cursor? Initially, I considered combining the timestamp with the unique id: 1371813798111111.myuniqueid. However, if the item at 1371813798111111.myuniqueid is deleted, I can get the items with the 1371813798111111 timestamp, but would not be able to determine which item with that timestamp I should start with.

Another approach I had was to assign an incrementing number to each feed result. Since the number is incrementing and in order, if the number/id is missing, I can just choose the next one. However, the problem with this is that the cursor ids will change if I start removing and adding feed items in the middle of the feed. One solution I had to this problem is to have a huge gap between each number, but it is difficult to determine how new items can be added to the space between each number in a deterministic way. In addition, as the new items are added, and the gaps are being filled up, we would end up with the same problem.

Simply put, if I have a list of items where items can be added and removed from anywhere in the list, what is the best way to generate an id for each list item such that if the item for the id is deleted, I can still determine its position in the list?

Answer 1:

You need an additional (or existing) column whose value increases sequentially for every new row added to the target table. Let's call this column seq_id.

When the client requests the first page (there is no cursor yet):

GET /api/v1/items?sort_by={sortingFieldName}&size={count}

where sortingFieldName is the name of the field by which the results are sorted.

What happens under the hood:

SELECT * FROM items
WHERE ...            -- apply search params
ORDER BY sortingFieldName, seq_id
LIMIT :count
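
The first-page query above can be tried out with a minimal sketch like the following. It uses Python's built-in sqlite3 as a stand-in database; the items schema, the updated_at sort column, the SORTABLE_FIELDS whitelist and the first_page helper are all assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# Hypothetical whitelist: column names cannot be bound as SQL parameters,
# so the sort field is validated before being interpolated into the query.
SORTABLE_FIELDS = {"updated_at"}

def first_page(conn, sort_field, count):
    """Return the first `count` rows ordered by (sort_field, seq_id)."""
    if sort_field not in SORTABLE_FIELDS:
        raise ValueError(f"unsupported sort field: {sort_field}")
    sql = (
        f"SELECT seq_id, {sort_field}, title FROM items "
        f"ORDER BY {sort_field}, seq_id LIMIT ?"
    )
    return conn.execute(sql, (count,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "seq_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "updated_at INTEGER NOT NULL, title TEXT)"
)
# Two rows share updated_at = 100; seq_id breaks the tie deterministically.
conn.executemany(
    "INSERT INTO items (updated_at, title) VALUES (?, ?)",
    [(100, "a"), (200, "b"), (100, "c"), (300, "d")],
)

page = first_page(conn, "updated_at", 3)
print(page)
```

Sorting on the pair (sort field, seq_id) rather than on the sort field alone is what gives every row a unique, stable position even when timestamps collide.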

Response:

{
    "data": [...],
    "cursor": {
        "prev_field_name": "{result[0].sortingFieldName}",
        "prev_id": "{result[0].seq_id}",
        "next_field_name": "{result[count-1].sortingFieldName}",
        "next_id": "{result[count-1].seq_id}",
        "prev_results_link": "/api/v1/items?size={count}&cursor=bw_{prevFieldName}_{prevId}",
        "next_results_link": "/api/v1/items?size={count}&cursor=fw_{nextFieldName}_{nextId}"
    }
}

The next part of the cursor is omitted from the response if fewer than count rows were retrieved.

The prev part of the cursor is omitted from the response if the request carried no cursor or there is no data to return.
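
The two omission rules can be sketched as follows. The helper names (encode_cursor, decode_cursor, build_cursor), the shortened "prev"/"next" keys and the choice to split the cursor string on its first and last underscore are assumptions for illustration; the answer only specifies the fw_/bw_ prefix followed by the sort value and the seq_id.

```python
def encode_cursor(direction, sort_value, seq_id):
    """Build a cursor string such as 'fw_1371813798111111_42'."""
    return f"{direction}_{sort_value}_{seq_id}"

def decode_cursor(cursor):
    """Split a cursor into (direction, sort_value, seq_id).

    The direction is taken from the first underscore and seq_id from the
    last one, so a sort value that itself contains underscores survives.
    The sort value comes back as a string; the caller converts it.
    """
    direction, rest = cursor.split("_", 1)
    if direction not in ("fw", "bw"):
        raise ValueError(f"unknown cursor direction: {direction}")
    sort_value, seq_id = rest.rsplit("_", 1)
    return direction, sort_value, int(seq_id)

def build_cursor(rows, count, had_cursor):
    """rows: list of (sort_value, seq_id) pairs in ascending order."""
    cursor = {}
    # No prev part without a request cursor or without data.
    if rows and had_cursor:
        cursor["prev"] = encode_cursor("bw", rows[0][0], rows[0][1])
    # No next part when fewer than `count` rows were retrieved.
    if len(rows) == count:
        cursor["next"] = encode_cursor("fw", rows[-1][0], rows[-1][1])
    return cursor

print(build_cursor([(100, 1), (100, 3)], 2, had_cursor=False))
print(decode_cursor("fw_1371813798111111_42"))
```

Note that a full page whose last row happens to be the final row of the feed still gets a next part under this rule; following it simply returns an empty page.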

When the client makes a subsequent request, it must pass the cursor. Forward cursor:

GET /api/v1/items?size={count}&cursor=fw_{nextFieldName}_{nextId}

What happens under the hood:

SELECT * FROM items
WHERE ...            -- apply search params
AND ((sortingFieldName = :cursor.nextFieldName AND seq_id > :cursor.nextId) OR
      sortingFieldName > :cursor.nextFieldName)
ORDER BY sortingFieldName, seq_id
LIMIT :count
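
A minimal sketch of the forward query, again using sqlite3 as a stand-in with an assumed items table and a hypothetical page_after helper. It also deletes the row the cursor points at before paging, to illustrate that the comparison only needs the cursor's values, not the row itself.

```python
import sqlite3

def page_after(conn, sort_value, seq_id, count):
    """Rows strictly after the (sort_value, seq_id) position, ascending."""
    sql = """
        SELECT seq_id, updated_at FROM items
        WHERE (updated_at = ? AND seq_id > ?) OR updated_at > ?
        ORDER BY updated_at, seq_id
        LIMIT ?
    """
    return conn.execute(sql, (sort_value, seq_id, sort_value, count)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (seq_id INTEGER PRIMARY KEY, updated_at INTEGER NOT NULL)"
)
# Three rows share the same timestamp, as in the question's conflict case.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 100), (4, 200), (5, 300)],
)

# The client holds the cursor fw_100_2, then that row is removed.
conn.execute("DELETE FROM items WHERE seq_id = 2")

rows = page_after(conn, sort_value=100, seq_id=2, count=2)
print(rows)
```

Because the cursor is a position in the (updated_at, seq_id) ordering rather than a reference to a row, paging resumes at the next surviving row.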

Or backward cursor:

GET /api/v1/items?size={count}&cursor=bw_{prevFieldName}_{prevId}

What happens under the hood:

SELECT * FROM items
WHERE ...            -- apply search params
AND ((sortingFieldName = :cursor.prevFieldName AND seq_id < :cursor.prevId) OR
      sortingFieldName < :cursor.prevFieldName)
ORDER BY sortingFieldName DESC, seq_id DESC
LIMIT :count
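
A minimal sketch of the backward query under the same assumptions (sqlite3 stand-in, assumed items table, hypothetical page_before helper). The final reversal is an added assumption: the query returns rows nearest the cursor first, so they are flipped here to present the page in the same ascending order as forward pages.

```python
import sqlite3

def page_before(conn, sort_value, seq_id, count):
    """Rows strictly before the (sort_value, seq_id) position, ascending."""
    sql = """
        SELECT seq_id, updated_at FROM items
        WHERE (updated_at = ? AND seq_id < ?) OR updated_at < ?
        ORDER BY updated_at DESC, seq_id DESC
        LIMIT ?
    """
    rows = conn.execute(sql, (sort_value, seq_id, sort_value, count)).fetchall()
    # The DESC ordering picks the rows closest to the cursor; reverse them
    # so the page reads in ascending order (an assumption of this sketch).
    rows.reverse()
    return rows

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (seq_id INTEGER PRIMARY KEY, updated_at INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 200), (4, 200), (5, 300)],
)

rows = page_before(conn, sort_value=200, seq_id=4, count=2)
print(rows)
```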

The response has the same shape as the previous one.