Recount or cache count of mysql table

2019-09-11 06:48发布

I want to display the number of reminders in queue as a statistic on my website. That statistic is the number of rows in my table. What I have been doing in the past is having a separate table that caches the number. The cached table has just one row and one column contained the number of rows in the reminder table. Every time a new reminder is added, I have another query that increments the value by one.

I have recently started incorporating the count() function into my project. My question is; is it better performance wise to perform a count() operation on the table to display the current number of reminders every time someone loads a page (the statistic is displayed in the footer) or is it faster to have that number already stored in another table so that I am just reading from that table? At what point would it be faster to use a cached value rather than the count function?

3条回答
时光不老,我们不散
2楼-- · 2019-09-11 06:55

If you are thinking of caching row counts, you probably shouldn't and you probably don't need it. There is a built in mechanism

SHOW TABLE STATUS

Part of the output of that query includes:

The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.

The Rows value is NULL for tables in the INFORMATION_SCHEMA database.

This paragraph also answers your question about the efficiency of SELECT COUNT(*) - on MyISAM tables it's fast, it does not depend on the number of rows in the table because the internal counter is used.

How does innodb differ?

Innodb does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool. If your table does not change often, using the MySQL query cache is a good solution. To get a fast count, you have to use a counter table you create yourself and let your application update it according to the inserts and deletes it does. If an approximate row count is sufficient, SHOW TABLE STATUS can be used.

Notice that this part of the documentation does speak of caching the count. But you will notice that if there is an index that covers the table completely the count(*) query is still fast. Since you naturally have a primary key and that primary key is likely to be in a buffer at least partially the performance impact will be neglible.

Note that the story is completely different in the other popular open source database Postgresql. There count(*) slows down proportionate to the table size. Thankfully in mysql it isn't so bad.

In conclusion: Since a cached row count is only approximate, you can just use show table status instead.

查看更多
别忘想泡老子
3楼-- · 2019-09-11 06:57

As with the most optimization related question the answer is: well, it depends.

If your table uses myisam table type, then the number of rows is already cached in the table itself and count(*) without where will read that number.

If you use innodb table engine and you have lots of inserts and less selects, then maintaining the count number will be more expensive than counting the rows. If you do not have too many inserts, then using a cached number will probably be faster, since innodb is relatively slow on count(*) without where.

See mysql performance blog on count(*) for more detailed explanation on innodb.

查看更多
叛逆
4楼-- · 2019-09-11 07:12

You are asking if is it better to use the count function ? , technically speaking that is the better way, but that is always depend on how large your data, calling count every page load is not a good idea if you are handling a large amount of data, because it will affect the rendering of the page, it will make the page load too slow in that case, If you are counting a small amount of data, then that is good and better way.

查看更多
登录 后发表回答