What does it mean by cold cache and warm cache con

Posted 2019-01-21 18:31

Question:

I read a paper that used the terms cold cache and warm cache. I googled these terms but didn't find anything useful (only a thread here).

What do these terms mean?

Answer 1:

TL;DR: There is an analogy with the cold engine and warm engine of a car. A cold cache doesn't hold any values and can't give you any speedup because, well, it's empty. A warm cache holds some values and can give you that speedup.

A cache is a structure that holds some values (inodes, memory pages, disk blocks, etc.) for faster lookup.

A cache works by storing some kind of short references in a fast search data structure (hash table, B+ tree) or on a faster access medium (RAM vs. HDD, SSD vs. HDD).

To be able to do this fast search, your cache needs to hold values. Let's look at an example.

Say you have a Linux system with some filesystem. To access a file in the filesystem, you need to know where the file starts on disk. This information is stored in the inode. For simplicity, we say that the inode table is stored somewhere on disk (the so-called "superblock" part).

Now imagine that you need to read the file /etc/fstab. To do this you need to read the inode table from disk (10 ms), then parse it to get the start block of the file, and then read the file itself (10 ms). Total: ~20 ms.

That is a lot of slow operations, so you add a cache in the form of a hash table in RAM. A RAM access takes about 10 ns, which is roughly 1,000,000(!) times faster than a 10 ms disk read. Each row in that hash table holds 2 values:

(inode number or filename) : (starting disk block)

But the problem is that at the start your cache is empty. Such a cache is called a cold cache. To exploit the benefits of your cache you need to fill it with some values. How does that happen? When you're looking for a file, you look in your inode cache. If you don't find the inode in the cache (a cache miss), you say 'Okay' and do the full read cycle: reading the inode table, parsing it, and reading the file itself. But after the parsing step you save the inode number and the parsed starting disk block in your cache. And that goes on and on: you try to read another file, you look in the cache, you get a cache miss (your cache is cold), you read from disk, you add a row to the cache.

So a cold cache doesn't give you any speedup because you are still reading from disk. In some cases a cold cache makes your system slower because you're doing extra work (the extra step of looking up in the table) to warm up your cache.
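The miss-then-fill cycle described above can be sketched in a few lines of Python. This is only a toy model, not how the Linux kernel does it: the `InodeCache` class, the `disk` dictionary standing in for the on-disk inode table, and the block numbers are all made up for illustration.

```python
class InodeCache:
    """Toy read-through cache: filename -> starting disk block."""

    def __init__(self, inode_table):
        self._inode_table = inode_table  # hypothetical stand-in for the on-disk inode table
        self._cache = {}                 # empty at start: a cold cache
        self.hits = 0
        self.misses = 0

    def start_block(self, filename):
        if filename in self._cache:
            self.hits += 1               # cache hit: no "disk" access needed
            return self._cache[filename]
        self.misses += 1                 # cache miss: fall back to the slow path
        block = self._inode_table[filename]
        self._cache[filename] = block    # remember the result, warming the cache
        return block


# Assumed example data: block numbers are arbitrary.
disk = {"/etc/fstab": 4096, "/etc/hosts": 8192}
cache = InodeCache(disk)

cache.start_block("/etc/fstab")  # miss: the cache is cold
cache.start_block("/etc/hosts")  # miss: still warming up
cache.start_block("/etc/fstab")  # hit: the cache is warm for this file
print(cache.hits, cache.misses)  # prints "1 2"
```

Note that every miss pays for the dictionary lookup and the slow read, which is the "extra work" a cold cache costs you.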

After some time you'll have some values in your cache. Then, the next time you try to read a file, you look it up in the cache and BAM! You have found the inode (a cache hit)! Now you have the starting disk block, so you skip reading the superblock and start reading the file itself! You have just saved 10 ms!

Such a cache is called a warm cache: a cache with some values that gives you cache hits.
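To put rough numbers on this, here is a small back-of-the-envelope calculation using the figures from this answer (10 ms per disk read, 10 ns per RAM lookup). The function names and the hit rates passed in are illustrative assumptions, not measurements.

```python
DISK_READ_MS = 10.0    # one disk read, as assumed in the example above
RAM_LOOKUP_MS = 10e-6  # 10 ns expressed in milliseconds


def read_cost_ms(cache_hit):
    """Cost of reading one file, with the cache lookup always paid first."""
    if cache_hit:
        # Hit: skip the inode table read, only read the file itself.
        return RAM_LOOKUP_MS + DISK_READ_MS
    # Miss: lookup, then inode table read, then file read.
    return RAM_LOOKUP_MS + 2 * DISK_READ_MS


def average_cost_ms(hit_rate):
    """Expected cost per read for a given fraction of cache hits."""
    return hit_rate * read_cost_ms(True) + (1 - hit_rate) * read_cost_ms(False)


for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: {average_cost_ms(rate):.5f} ms per read")
```

With a hit rate of 0 (a completely cold cache) each read costs slightly more than the uncached 20 ms, and the average drops toward 10 ms as the cache warms up.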



Answer 2:

Background:

A cache is a small, fast memory that lets the CPU avoid accessing main memory (bigger and slower) and so saves time (cache reads are ~100x faster than reads from main memory). But this only helps if the data your program needs has already been cached (read from main memory into the cache) and is still valid. Also, a cache gets populated with data over time. So a cache can:
1. be empty, or
2. contain irrelevant data, or
3. contain relevant data.


Now, to your question:

Cold cache: When the cache is empty or holds irrelevant data, so that the CPU needs to do a slower read from main memory to get the data your program requires.

Hot cache: When the cache contains relevant data, and all the reads for your program are satisfied from the cache itself.

So, hot caches are desirable, cold caches are not.
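The point that a cache full of irrelevant data is just as cold as an empty one can be illustrated with a small simulation. This is a sketch under stated assumptions: a real CPU cache is set-associative hardware, while the `LRUCache` class here is a hypothetical software model with a least-recently-used eviction policy, and the keys are made up.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fixed-capacity cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, load the key and return False."""
        if key in self._data:
            self._data.move_to_end(key)     # mark as most recently used
            return True
        if len(self._data) >= self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        self._data[key] = True
        return False


def hit_rate(cache, keys):
    hits = sum(cache.access(k) for k in keys)
    return hits / len(keys)


cache = LRUCache(capacity=4)
for other in ["x1", "x2", "x3", "x4"]:
    cache.access(other)  # cache is full, but with another program's data

working_set = ["a", "b", "c", "d"]
first_pass = hit_rate(cache, working_set)   # cold for this workload
second_pass = hit_rate(cache, working_set)  # hot: everything is cached now
print(first_pass, second_pass)              # prints "0.0 1.0"
```

The first pass misses on every access even though the cache was full, because none of its contents were relevant to this program.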



Answer 3:

Very nice response, @avd.

A cold cache is just a blank cache or one with stale data.

A hot cache, on the other hand, holds useful data that your system requires. It helps you achieve faster processing and is mostly used for near-real-time processing of requests. Some systems or processes need certain information at hand before they start serving user requests, such as a trading platform that requires market data, risk info, security info, etc. before it can process a user request. It would be time-consuming if the process had to query a DB or service for this critical info on every request, so it is a good idea to cache it, and a hot cache makes that feasible. This cache should be maintained regularly (updates, removals, etc.); otherwise, over time, your cache may fill up with unnecessary data and you might notice performance degradation.
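One common way to keep such a cache maintained is to expire entries after a time-to-live (TTL). The answer does not say which maintenance scheme it has in mind, so the `TTLCache` class below is only one possible sketch; the class name, the 5-second TTL, and the fake clock used in the demo are all assumptions.

```python
import time


class TTLCache:
    """Cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the demo can control time
        self._data = {}      # key -> (value, expiry time)

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # stale entry: remove it so the cache does not grow
            return None
        return value

    def __len__(self):
        return len(self._data)


# Demo with a fake clock instead of real waiting.
now = [0.0]
cache = TTLCache(ttl_seconds=5, clock=lambda: now[0])
cache.put("risk-info", {"limit": 1000})

fresh = cache.get("risk-info")  # still within the TTL
now[0] = 6.0                    # pretend 6 seconds have passed
stale = cache.get("risk-info")  # expired, so the entry is dropped
print(fresh, stale, len(cache))
```

A caller that gets `None` back would then re-query the DB or service and `put` the fresh value.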

One way to create a hot cache is lazy population: you populate the cache as requests come in. In that case the initial requests are slow but subsequent ones are quicker. Another approach is to load the data at process start-up (or before user requests start coming in) and maintain the cache for as long as the process lives.
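The two warm-up strategies can be compared side by side. This is a minimal sketch: `slow_lookup` is a hypothetical stand-in for the DB or service query, and the symbols "ABC" and "XYZ" are invented.

```python
def slow_lookup(symbol):
    """Hypothetical stand-in for a slow DB or service query."""
    return {"symbol": symbol, "price": 100.0}


class LazyCache:
    """Populated on demand: the first request for each key is slow."""

    def __init__(self, loader):
        self._loader = loader
        self._data = {}
        self.loads = 0  # how many times the slow loader was called

    def get(self, key):
        if key not in self._data:
            self.loads += 1
            self._data[key] = self._loader(key)
        return self._data[key]


class PreloadedCache(LazyCache):
    """Populated at start-up, before any user request arrives."""

    def __init__(self, loader, keys):
        super().__init__(loader)
        for key in keys:
            self.get(key)  # pay the loading cost up front


lazy = LazyCache(slow_lookup)
lazy.get("ABC")  # slow: triggers a load
lazy.get("ABC")  # fast: served from the cache

eager = PreloadedCache(slow_lookup, ["ABC", "XYZ"])
loads_at_startup = eager.loads
eager.get("XYZ")  # fast from the very first request
print(lazy.loads, loads_at_startup, eager.loads)  # prints "1 2 2"
```

Preloading moves the cost to start-up time and requires knowing the keys in advance; lazy population needs no such knowledge but makes the first request for each key slow.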