Caching strategy: when does caching become pointless?

Posted 2019-03-15 04:54

I'm pretty new to caching strategies and implementations. I'm working on a project that will be database intensive, but also have information being updated and changed very regularly.

I've found enough info to know generally how to develop the caching function, but what I'm unsure about is the general strategy.

If I cache all query results and group them by logical things that I can clear on triggers that make sense, I'll probably have tens of thousands (at least) tiny files in my cache. Would it make more sense to cache only large query results?

I know that this is a somewhat hardware specific question, but generally speaking at what volume of files does caching become somewhat pointless? Meaning, if you're loading up the file system with all of these tiny files, does access to them eventually become slow enough that you might as well have just not cached the information to start with?

Thanks all, I'm interested in any opinions you have to offer

EDIT: Based on the responses regarding this being absolutely application specific, let me pose the question this way which should be universal:

Assuming that I have an application that depends on one table with 1,000,000 items in it...

Would it be quicker to do a query to retrieve one of those items directly from the database, or to retrieve one of those items from my cache directory with 1,000,000 files, each containing the details of one of those items?

EDIT: Apparently 100,000 wasn't enough to get a valid answer, let's make it 1,000,000. Anyone want to go for 1,000,000,000? Because I can do it...

3 Answers
一纸荒年 Trace。
#2 · 2019-03-15 05:20

Use MySQL's built-in query cache instead of trying to maintain one yourself. It will automatically invalidate cached queries for a table when that table is written to. Plus, it works in memory, so it should be very efficient...

Also, don't just cache queries. Try to cache entire segments of the application at different stages in the rendering cycle. So you can let MySQL cache the queries, then you cache each individual view (rendered), each individual block, and each page. Then, you can choose whether or not to pull from cache based upon the request.

For example, a non-logged-in user may get the full page directly from cache. But a logged-in user may not be able to (due to username, etc). So for him, you may be able to render 1/2 your views on the page from cache (since they don't depend on the user object). You still get the benefit of caching, but it'll be tiered based upon need.
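The tiered approach above can be sketched with a simple get-or-render helper. This is a minimal illustration, not the answerer's actual code: the dict stands in for a backend like Memcached, and the `render_block`/`render_page` names are hypothetical placeholders.

```python
# Minimal sketch of tiered caching: anonymous users get the whole page
# from cache, while logged-in users still reuse the cached blocks that
# don't depend on the user object. A plain dict stands in for Memcached.
cache = {}

def cached(key, render_fn):
    """Return the cached value for key, rendering and storing it on a miss."""
    if key not in cache:
        cache[key] = render_fn()
    return cache[key]

def render_block(name):
    # Placeholder for an expensive view/block render.
    return f"<div>{name}</div>"

def render_page(user):
    if user is None:
        # Full-page cache entry shared by all anonymous visitors.
        return cached("page:home:anon",
                      lambda: render_block("header") + render_block("news"))
    # Logged-in: reuse user-independent blocks, render the rest fresh.
    header = cached("block:header", lambda: render_block("header"))
    greeting = f"<div>Hello, {user}</div>"  # user-specific, never cached
    return header + greeting
```

The key idea is that the cache key encodes the tier ("page:", "block:"), so invalidation triggers can clear one tier without touching the others.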

If you're really expecting a lot of traffic, it's definitely worth looking into Memcached. Let MySQL store your queries for you, and then store all user-land cache items in memcache...

Edit: To answer your edit:

Filesystems can become slow if a single directory grows big. As long as you're "namespacing" by directory (so each directory only holds a small portion of the cache files), you should be fine from that standpoint. As for the exact threshold, it really will depend on your hardware and filesystem more than anything else. I know ext3 gets quite slow if there are a lot of files in a single directory (I have directories with literally hundreds of thousands of files, and it can take up to half a second just to stat() one of them, let alone do any kind of directory listing)...
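That "namespacing" can be done by hashing the cache key and using the first hex characters of the hash as subdirectory names. This is a generic sketch under assumed paths (the cache root and `.dat` extension are illustrative, not from the original answer); with two two-character levels you get 65,536 buckets, so even a million entries leaves only ~15 files per directory.

```python
import hashlib
import os
import tempfile

# Illustrative cache root; any writable directory works.
CACHE_ROOT = os.path.join(tempfile.gettempdir(), "cache_demo")

def cache_path(key):
    """Map a cache key to a sharded file path, e.g. .../a3/f1/a3f1<...>.dat"""
    h = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_ROOT, h[:2], h[2:4], h + ".dat")

def write_cache(key, data):
    path = cache_path(key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(data)

def read_cache(key):
    """Return the cached string, or None on a miss."""
    try:
        with open(cache_path(key)) as f:
            return f.read()
    except FileNotFoundError:
        return None
```

MD5 is used here only for its even key distribution, not for security.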

But realize that if you add another server, you're going to either have duplication of cache (which is not a good thing), or are going to have to rewrite your entire cache layer. Is there a reason not to go with Memcached right from the start?

Edit 2: To answer your latest edit:

It's still too tough to call. I have an application that has a database with around 1.5 billion rows (growing at around 500k per day). We don't use any caching on it at all because we don't have concurrency issues. And even if we did, we'd be better off throwing more MySQL servers at it rather than adding caching since any form of cache would have such a low hit rate that it wouldn't be worth the development time to add it.

And that's the reason I am so adamant about not caching for speed. There will always be an object that is not in cache. So if you hit a page with one of those objects, it still needs to be fast. As a rule of thumb, I try to cache anything that will be accessed again in the next few minutes (I keep a time-to-live of about 5 minutes in production on other applications anyway). So if items aren't getting more than a few hits in that time span, or the hit rate is very low (less than 90%), I don't bother caching that item.
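That rule of thumb can be made concrete with a TTL cache that also tracks its own hit rate, so you can measure whether an item class clears the ~90% bar before committing to caching it. A minimal sketch, assuming an in-memory store (the 5-minute TTL and 90% threshold are the answerer's heuristics, not universal constants):

```python
import time

class TTLCache:
    """In-memory cache with per-entry expiry and hit-rate accounting."""

    def __init__(self, ttl=300):  # ~5 minutes, per the rule of thumb above
        self.ttl = ttl
        self.store = {}   # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        """Return the cached value, recomputing it on a miss or expiry."""
        now = time.time()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self.store[key] = (now + self.ttl, value)
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

If `hit_rate()` stays well below 0.9 after a representative workload, the item is probably not worth caching at all.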

我想做一个坏孩纸
#3 · 2019-03-15 05:21

This is both hardware- and application-dependent. You need to perform benchmarks to determine the threshold at which OS indexing becomes greater than the data storage/retrieval duration (both at the MySQL level and cached file access level). And you also need to compare that against the acceptable (very subjective) threshold of your audience.

来,给爷笑一个
#4 · 2019-03-15 05:34

The general rule is: do not cache while it is not necessary and cache only the things that need to be cached.
