Paginating very large datasets

Published 2019-03-22 03:08

Question:

I have a dataset in MySQL where a LIMIT query is already expensive, and counting the total number of results is expensive as well. I'd therefore like to avoid running a separate query just to find the result count. I cannot use SQL_CALC_FOUND_ROWS because the LIMIT is inside a subquery:

SELECT * FROM items,
(
  SELECT
    item_id
  FROM
    (etc)
  WHERE
    some.field=<parameter>
  AND (etc)
  GROUP BY (something)
  ORDER BY (something_else) DESC
  LIMIT 15
) subset
WHERE items.id=subset.item_id

I could LEFT JOIN items and do away with the subquery, which would let me use SQL_CALC_FOUND_ROWS, but that is very, very slow. I've tried all the index optimizations, so let's assume that approach is out of the question.

This now becomes more of a design question: how do I let the user page through this data when I don't know which page is the last? I only know when they've gone too far (e.g., the query returns no results).
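One low-cost way to detect the end of the data without a second query is to request one more row than the page size: if the extra row comes back, a "next" page exists; if not, this is the last page. A sketch against the query above, where the only change is the LIMIT (the `(etc)` placeholders stand in for the original, unspecified clauses):

```sql
SELECT * FROM items,
(
  SELECT
    item_id
  FROM
    (etc)
  WHERE
    some.field=<parameter>
  AND (etc)
  GROUP BY (something)
  ORDER BY (something_else) DESC
  LIMIT 16  -- page_size + 1: the 16th row is only a "has next page" probe
) subset
WHERE items.id=subset.item_id
```

The application then displays at most 15 rows and shows the "next" link only when 16 were returned.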

Answer 1:

Here's a summary of an article by MySQL guru Baron Schwartz:

http://www.mysqlperformanceblog.com/2008/09/24/four-ways-to-optimize-paginated-displays/

Four Ways to Optimize Paginated Displays

  1. On the first query, fetch and cache all the results.

  2. Don't show all results. Not even Google lets you see the millionth result.

  3. Don't show the total count or the intermediate links to other pages. Show only the "next" link.

  4. Estimate how many results there are. Again, Google does this and nobody complains.
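For option 4, one cheap source of an estimate in MySQL is EXPLAIN: the `rows` column reports the optimizer's guess at how many rows the query will examine, without actually executing it. A rough sketch, with an illustrative table and filter (the estimate can be off by a large factor, which is acceptable when you only need "about N results"):

```sql
-- The "rows" column in the output is the optimizer's row-count estimate.
EXPLAIN SELECT item_id
FROM items
WHERE some_field = 'value';
```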



Answer 2:

To reinforce that: the notion of "how many results" is extremely transitory anyway; by the time the answer reaches the user, it may already be wrong.