Should I store an array or individual items in Memcache?

Posted 2020-08-05 11:15

Question:

Right now we are storing some query results in Memcache. After investigating a bit more, I've seen that many people store each individual item in Memcache instead. The benefit of doing this is that those items can then be fetched from Memcache by any other request.

Store an array

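// Assumes $memcache is a PHP Memcached instance and $con is the app's DB connection
$key = 'page.items.20';
$results = $memcache->get($key);
// compare with false so a cached empty result set still counts as a hit
if ($results === false)
{
    $results = $con->execute('SELECT * FROM table LEFT JOIN .... LIMIT 0,20')->fetchAll();
    $memcache->set($key, $results, 3600);
}
...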

PROS:

  • Easier

CONS:

  • If I change an individual item, I have to delete every cached result set that contains it (which can be a pain)
  • The same item can be stored several times (once for each cached query that returns it)

vs

Store each item

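// Assumes $memcache is a PHP Memcached instance and $con is the app's DB connection
$key = 'page.items.20';
$results_ids = $memcache->get($key);
if ($results_ids === false)
{
    $results = $con->execute('SELECT * FROM table LEFT JOIN .... LIMIT 0,20')->fetchAll();

    $results_ids = array();
    foreach ($results as $result)
    {
        $results_ids[] = $result['id'];
        // add() only stores the item if that key doesn't exist yet
        $memcache->add('item' . $result['id'], $result, 3600);
    }

    // save the list of IDs for this page
    $memcache->set($key, $results_ids, 3600);
}
else
{
    // items live under 'item<id>' keys, so build those before the multi-get;
    // GET_PRESERVE_ORDER keeps the page order (missing keys come back as null)
    $item_keys = array_map(function ($id) { return 'item' . $id; }, $results_ids);
    $results = $memcache->getMulti($item_keys, Memcached::GET_PRESERVE_ORDER);
    // get elements which are not cached
    ...
}
...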

PROS:

  • I don't have the same item stored twice in Memcache
  • Easier to invalidate results shared by several queries (only the changed item needs to be updated)

CONS:

  • More complicated business logic.
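To make the invalidation benefit concrete, here is a minimal, runnable sketch of the store-each-item approach, assuming PHP 8. `ArrayCache` is a hypothetical in-memory stand-in that mimics a few `Memcached` methods so the example runs without a server; `getList`, `updateItem`, and the `item<id>` key prefix are illustrative names, not part of any library. It shows that updating one item rewrites a single key, while every cached list that contains that ID sees the change.

```php
<?php
// Hypothetical in-memory stand-in for a Memcached client (no TTL, no eviction),
// so the sketch runs without a server.
final class ArrayCache
{
    private array $data = [];

    public function get(string $key): mixed
    {
        // Memcached::get() also returns false on a miss
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set(string $key, mixed $value): bool
    {
        $this->data[$key] = $value;
        return true;
    }

    public function getMulti(array $keys): array
    {
        // Like Memcached::getMulti() without flags: only keys that were found are returned
        $found = [];
        foreach ($keys as $key) {
            if (array_key_exists($key, $this->data)) {
                $found[$key] = $this->data[$key];
            }
        }
        return $found;
    }
}

// Cached list of IDs plus one cache entry per item (hypothetical helper).
// $runQuery stands in for the SELECT ... LIMIT query; $fetchByIds re-reads missing rows.
function getList(ArrayCache $cache, string $listKey, callable $runQuery, callable $fetchByIds): array
{
    $ids = $cache->get($listKey);
    if ($ids === false) {
        $rows = $runQuery();
        foreach ($rows as $row) {
            $cache->set('item' . $row['id'], $row);
        }
        $cache->set($listKey, array_column($rows, 'id'));
        return $rows;
    }

    $found = $cache->getMulti(array_map(fn ($id) => 'item' . $id, $ids));
    $missing = array_values(array_filter($ids, fn ($id) => !isset($found['item' . $id])));
    if ($missing !== []) {
        // items evicted since the list was cached: reload and re-cache them
        foreach ($fetchByIds($missing) as $row) {
            $cache->set('item' . $row['id'], $row);
            $found['item' . $row['id']] = $row;
        }
    }

    // Rebuild the page in the list's original order, skipping rows that no longer exist
    $page = [];
    foreach ($ids as $id) {
        if (isset($found['item' . $id])) {
            $page[] = $found['item' . $id];
        }
    }
    return $page;
}

// Updating one item touches exactly one key; every cached list that
// references its ID picks up the new version on the next read.
function updateItem(ArrayCache $cache, array $row): void
{
    $cache->set('item' . $row['id'], $row);
}
```

With a real server you would swap `ArrayCache` (and its type hints) for a `Memcached` instance and pass TTLs. Note that `Memcached::getMulti()` only keeps the requested key order when given `Memcached::GET_PRESERVE_ORDER`, which is why the sketch rebuilds the order itself.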

What do you think? Any other PROS or CONS on each way?

Some links

  • Post explaining the second method in Memcached list
  • Thread in Memcached Group

Answer 1:

Grab the stats and try to calculate the hit ratio, or the possible improvement, of caching the full query versus fetching individual items from MC. Profiling this kind of code is also very helpful to see how your theory holds up in practice.
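A minimal sketch of the hit-ratio calculation, assuming PHP 8 and the `Memcached` extension: `Memcached::getStats()` returns per-server counters that include `get_hits` and `get_misses`, and the ratio is hits / (hits + misses). `hitRatio` and `hitRatioBetween` are hypothetical helpers. Because the counters are cumulative since each server started, diffing two snapshots is one way to estimate the effect of a single change.

```php
<?php
// $stats has the shape returned by Memcached::getStats():
// ['host:port' => ['get_hits' => int, 'get_misses' => int, ...], ...]
// Returns the combined hit ratio across all servers (0.0 when there were no gets).
function hitRatio(array $stats): float
{
    $hits = 0;
    $misses = 0;
    foreach ($stats as $serverStats) {
        $hits += $serverStats['get_hits'];
        $misses += $serverStats['get_misses'];
    }
    $gets = $hits + $misses;
    return $gets === 0 ? 0.0 : $hits / $gets;
}

// Counters are cumulative, so to measure one change (e.g. switching to per-item
// caching), compare a snapshot taken before it with one taken after it.
function hitRatioBetween(array $before, array $after): float
{
    $delta = [];
    foreach ($after as $server => $s) {
        $delta[$server] = [
            'get_hits' => $s['get_hits'] - ($before[$server]['get_hits'] ?? 0),
            'get_misses' => $s['get_misses'] - ($before[$server]['get_misses'] ?? 0),
        ];
    }
    return hitRatio($delta);
}
```

Against a live server this would be used as `hitRatioBetween($before, $memcached->getStats())`, where `$before` is an earlier `getStats()` result; `getStats()` can return `false` on failure, which these helpers do not handle.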

It depends on what the query does. If you have a set of users and want to grab the "top 10 music affinity" among some of their friends, it is worth having both caches:

  • each friend (in fact, each user of the site)
  • the top 10 query for each user (space is cheaper than CPU time)

But in general, it is worth storing in MC all individual entities that are going to be used frequently (whether in the same code execution, in subsequent requests, or by other users). Then, CPU- or resource-heavy queries and data processing should either be cached in MC or delegated to async jobs instead of running in real time (e.g. the top 10 site users don't need to be real time; the list can be updated hourly or daily). And of course, keep in mind that if you store individual entities in MC, you have to remove all referential integrity from the DB to be able to reuse them either individually or in groups.
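One way to combine these suggestions, sketched under assumptions: PHP 8, a hypothetical in-memory `ArrayCache` stand-in for a `Memcached` client, and made-up key names (`top10.users`, `user.<id>`). A background job computes the expensive top-10 list on a schedule and caches both the individual users and the list of their IDs; the request path only reads from the cache and never runs the query.

```php
<?php
// Hypothetical in-memory stand-in for a Memcached client; the TTL argument is accepted but ignored.
final class ArrayCache
{
    private array $data = [];

    public function get(string $key): mixed
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set(string $key, mixed $value, int $ttl = 0): bool
    {
        $this->data[$key] = $value;
        return true;
    }

    public function getMulti(array $keys): array
    {
        $found = [];
        foreach ($keys as $key) {
            if (array_key_exists($key, $this->data)) {
                $found[$key] = $this->data[$key];
            }
        }
        return $found;
    }
}

const TOP_USERS_KEY = 'top10.users';   // hypothetical key name
const REFRESH_TTL = 2 * 3600;          // outlives an hourly refresh, so readers rarely see a gap

// Run from cron or a queue worker (e.g. hourly), never inside a web request.
// $computeTopUsers stands in for the expensive query and returns rows with an 'id'.
function refreshTopUsers(ArrayCache $cache, callable $computeTopUsers): void
{
    $users = $computeTopUsers();
    foreach ($users as $user) {
        // per-entity entries, reusable by any other page that shows these users
        $cache->set('user.' . $user['id'], $user, REFRESH_TTL);
    }
    $cache->set(TOP_USERS_KEY, array_column($users, 'id'), REFRESH_TTL);
}

// Request path: reads the precomputed ID list and the cached users only.
function topUsers(ArrayCache $cache): array
{
    $ids = $cache->get(TOP_USERS_KEY);
    if ($ids === false) {
        return [];   // the job has not run yet; a real app might show a placeholder
    }
    $found = $cache->getMulti(array_map(fn ($id) => 'user.' . $id, $ids));
    $users = [];
    foreach ($ids as $id) {   // keep the ranking order from the ID list
        if (isset($found['user.' . $id])) {
            $users[] = $found['user.' . $id];
        }
    }
    return $users;
}
```

How the job is scheduled (cron, a queue worker, etc.) is outside the sketch; the point is that the expensive query moves off the request path while the per-user entries stay reusable elsewhere.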



Answer 2:

The question is subjective and argumentative...

This depends on your usage pattern. If you're constantly pulling individual nodes by ID, store each one separately.

Also, note that in either case, storing the list isn't all that useful beyond the first page (the top 20). If you insert, update, or delete a node in a way that makes the top 20 invalid, every later page shifts as well, so you may end up needing to flush the next 20, and so on.
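The answer does not prescribe a fix, but one common memcached technique for this cascading problem is namespacing with a generation counter: every page key embeds a counter, and bumping it orphans all cached pages at once instead of deleting them one by one. A minimal sketch, assuming PHP 8; `ArrayCache` is a hypothetical stand-in mimicking `Memcached::get()`, `add()` and `increment()`, and the key names and helper functions are illustrative. The pages themselves would be cached under `pageKey()` just as in the question's code.

```php
<?php
// Hypothetical in-memory stand-in for a Memcached client (no TTL, no eviction).
final class ArrayCache
{
    private array $data = [];

    public function get(string $key): mixed
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function add(string $key, mixed $value): bool
    {
        // Like Memcached::add(): only stores the value if the key is absent
        if (array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] = $value;
        return true;
    }

    public function increment(string $key, int $offset = 1): int|false
    {
        // Like Memcached::increment() on a missing key: fails instead of creating it
        if (!array_key_exists($key, $this->data)) {
            return false;
        }
        return $this->data[$key] = (int) $this->data[$key] + $offset;
    }
}

const GENERATION_KEY = 'page.items.gen';   // hypothetical key name

// Current generation of the paginated lists. Seeding with time() makes it unlikely
// that a counter lost to eviction restarts at a value whose page keys still exist.
function listGeneration(ArrayCache $cache): int
{
    $gen = $cache->get(GENERATION_KEY);
    if ($gen === false) {
        $cache->add(GENERATION_KEY, time());   // add(): concurrent initialisers don't overwrite each other
        $gen = $cache->get(GENERATION_KEY);
    }
    return (int) $gen;
}

// Every page key embeds the generation, e.g. "page.items.<generation>.2".
function pageKey(ArrayCache $cache, int $page): string
{
    return 'page.items.' . listGeneration($cache) . '.' . $page;
}

// Call after any insert/update/delete that can shift the ordering: bumping the
// generation orphans every cached page at once; old keys expire or get evicted.
function invalidateAllPages(ArrayCache $cache): void
{
    $cache->increment(GENERATION_KEY);   // a missing counter already means a fresh namespace
}
```

The trade-off is one extra cache read per request (the generation) and some memory held by orphaned pages until they expire, in exchange for never having to know which pages an item landed on.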

Lastly, keep in mind that it's a cache. If you're using a cache, you're making the underlying statement that it's no big deal if the data you're outputting is slightly stale.



Answer 3:

Memcached stores data in chunks of specific sizes, as explained in more detail at the link below.

http://code.google.com/p/memcached/wiki/NewUserInternals

If the items you store in memcached are large, there will be fewer of the larger-size chunks, so the least-recently-used (LRU) algorithm will push data out of them even if there is space available in chunks of other sizes, because LRU eviction works separately for each chunk size. You can decide which approach to choose based on the size distribution of your data in memcached.
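Memcached's slab allocator derives a series of chunk sizes (slab classes) by repeatedly multiplying by a growth factor (the `-f` option, 1.25 by default) and rounding up to an 8-byte boundary; each item goes into the smallest chunk that fits it. The sketch below is a simplified model of that loop, assuming PHP 8 and a 96-byte smallest chunk (common on 64-bit builds, but version-dependent). `slabChunkSizes` and `slabClassFor` are hypothetical helpers, and the 130-byte row is a made-up size that ignores key and item-header overhead. It shows why caching 20 rows as one array lands in a very different slab class from caching each row on its own.

```php
<?php
// Simplified model of how memcached derives its slab-class chunk sizes:
// start from a base size, round up to 8 bytes, multiply by the growth factor.
function slabChunkSizes(int $base, float $factor, int $maxChunk): array
{
    if ($factor <= 1.0) {
        throw new InvalidArgumentException('growth factor must be greater than 1');
    }
    $sizes = [];
    $size = $base;
    while (true) {
        if ($size % 8 !== 0) {
            $size += 8 - ($size % 8);   // chunks are aligned to 8 bytes
        }
        if ($size > $maxChunk) {
            break;
        }
        $sizes[] = $size;
        // grow by the factor; max() guarantees progress for factors very close to 1
        $size = max($size + 1, (int) ($size * $factor));
    }
    return $sizes;
}

// An item is stored in the smallest chunk that fits it; null if none does.
function slabClassFor(int $itemSize, array $sizes): ?int
{
    foreach ($sizes as $chunk) {
        if ($itemSize <= $chunk) {
            return $chunk;
        }
    }
    return null;
}

$sizes = slabChunkSizes(96, 1.25, 4096);
echo slabClassFor(130, $sizes), "\n";       // one ~130-byte row: 152-byte chunk
echo slabClassFor(20 * 130, $sizes), "\n";  // the same 20 rows as one array: 2904-byte chunk
```

On a real server, the `stats slabs` command shows the actual classes and how full each one is, which is the data this answer suggests basing the decision on.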



Tags: memcached