Batch set data from Dictionary into Redis

Posted 2020-07-24 06:39

Question:

I am using StackExchange.Redis to insert a dictionary of key/value pairs into Redis using a batch, as below:

private static StackExchange.Redis.IDatabase _database;
public void SetAll<T>(Dictionary<string, T> data, int cacheTime)
{
    lock (_database)
    {
        TimeSpan expiration = new TimeSpan(0, cacheTime, 0);
        var list = new List<Task<bool>>();
        var batch = _database.CreateBatch();               
        foreach (var item in data)
        {
            string serializedObject = JsonConvert.SerializeObject(item.Value, Formatting.Indented,
                new JsonSerializerSettings { ContractResolver = new SerializeAllContractResolver(), ReferenceLoopHandling = ReferenceLoopHandling.Ignore });

            var task = batch.StringSetAsync(item.Key, serializedObject, expiration);
            list.Add(task);
            serializedObject = null;
        }
        batch.Execute();

        Task.WhenAll(list.ToArray());
    }
}

My problem: it takes around 7 seconds to set just 350 dictionary items.

My question: Is this the right way to set bulk items into Redis or is there a quicker way to do this? Any help is appreciated. Thanks.

Answer 1:

"just" is a very relative term, and doesn't really make sense without more context, in particular: how big are these payloads?

However, to clarify a few points to help you investigate:

  • there is no need to lock an IDatabase unless that is purely for your own purposes; SE.Redis deals with thread safety internally and is intended to be used by competing threads
  • at the moment, your timing of this will include all the serialization code (JsonConvert.SerializeObject); this will add up, especially if your objects are big; to get a decent measure, I strongly suggest you measure the serialization and redis costs separately
  • the batch.Execute() method uses a pipeline API and does not wait for responses between calls, so: the time you're seeing is not the cumulative effect of latency; that leaves just local CPU (for serialization), network bandwidth, and server CPU; the client library tools can't impact any of those things
  • there is a StringSet overload that accepts a KeyValuePair<RedisKey, RedisValue>[]; you could choose to use this instead of a batch, but the only difference here is that it is the variadic MSET rather than multiple SETs; either way, you'll be blocking the connection for other callers for the duration (since the purpose of batch is to make the commands contiguous); a rough sketch of this overload follows the list
  • you don't actually need to use CreateBatch here, especially since you're locking the database (but I still suggest you don't need to do this); the purpose of CreateBatch is to make a sequence of commands contiguous, but I don't see that you need this here; you could just use _database.StringSetAsync for each command in turn, which would also have the advantage that you'd be running serialization in parallel to the previous command being sent; this lets you overlap serialization (CPU bound) and the redis ops (IO bound) with no work beyond deleting the CreateBatch call, and it also means you don't monopolize the connection from other callers
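
For reference, here's a rough sketch of that variadic overload; note that the underlying MSET command carries no per-key expiry, so the TTLs would have to be applied separately (e.g. via KeyExpireAsync). Here _redisJsonSettings refers to the shared settings instance shown in the code further down:

using System.Linq;

// serialize everything up front; the CPU cost is still paid locally
KeyValuePair<RedisKey, RedisValue>[] pairs = data
    .Select(item => new KeyValuePair<RedisKey, RedisValue>(
        item.Key,
        (RedisValue)JsonConvert.SerializeObject(
            item.Value, Formatting.Indented, _redisJsonSettings)))
    .ToArray();

_database.StringSet(pairs); // a single variadic MSET; no expiration support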

So, the first thing I would do would be to remove some code:

private static StackExchange.Redis.IDatabase _database;
static readonly JsonSerializerSettings _redisJsonSettings = new JsonSerializerSettings {
    ContractResolver = new SerializeAllContractResolver(),
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore };

public void SetAll<T>(Dictionary<string, T> data, int cacheTime)
{
    TimeSpan expiration = new TimeSpan(0, cacheTime, 0);
    var list = new List<Task<bool>>();
    foreach (var item in data)
    {
        string serializedObject = JsonConvert.SerializeObject(
            item.Value, Formatting.Indented, _redisJsonSettings);

        list.Add(_database.StringSetAsync(item.Key, serializedObject, expiration));
    }
    Task.WaitAll(list.ToArray()); // actually block until the writes complete; WhenAll without an await just discards the task
}

The second thing I would do would be to time the serialization separately from the redis work.
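
For example, a minimal sketch of that split using Stopwatch, reusing _database, _redisJsonSettings and expiration from the snippet above:

using System.Diagnostics;

var serializeTimer = Stopwatch.StartNew();
var payloads = new Dictionary<string, string>(data.Count);
foreach (var item in data)
{
    payloads[item.Key] = JsonConvert.SerializeObject(
        item.Value, Formatting.Indented, _redisJsonSettings);
}
serializeTimer.Stop();

var redisTimer = Stopwatch.StartNew();
var tasks = new List<Task<bool>>(payloads.Count);
foreach (var pair in payloads)
{
    tasks.Add(_database.StringSetAsync(pair.Key, pair.Value, expiration));
}
Task.WaitAll(tasks.ToArray());
redisTimer.Stop();

Console.WriteLine($"serialize: {serializeTimer.ElapsedMilliseconds}ms, redis: {redisTimer.ElapsedMilliseconds}ms");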

The third thing I would do would be to see if I can serialize to a MemoryStream instead, ideally one that I can re-use, to avoid the string allocation and UTF-8 encode; note that JsonConvert itself only produces strings, so this means dropping down to a JsonSerializer instance:

var serializer = JsonSerializer.Create(_redisJsonSettings);
serializer.Formatting = Formatting.Indented; // to match the original call

using (var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data

        // write straight to the stream via a StreamWriter, avoiding the
        // intermediate string that JsonConvert.SerializeObject would create
        using (var writer = new StreamWriter(ms, Encoding.UTF8, 1024, leaveOpen: true))
        {
            serializer.Serialize(writer, item.Value);
        } // disposing the writer flushes it into the MemoryStream

        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}


Answer 2:

This second answer is kinda tangential, but based on the discussion it sounds as though the main cost is serialization:

"The object in this context is big with huge infos in string props and many nested classes."

One thing you could do here is not store JSON. JSON is relatively large, and being text-based is relatively expensive to process both for serialization and deserialization. Unless you're using rejson, redis just treats your data as an opaque blob, so it doesn't care what the actual value is. As such, you can use more efficient formats.

I'm hugely biased, but we make use of protobuf-net in our redis storage. protobuf-net is optimized for:

  • small output (dense binary without redundant information)
  • fast binary processing (absurdly optimized with contextual IL emit, etc)
  • good cross-platform support (it implements Google's "protobuf" wire format, which is available on just about every platform)
  • designed to work well with existing C# code, not just brand new types generated from a .proto schema

I suggest protobuf-net rather than Google's own C# protobuf library because of the last bullet point, meaning: you can use it with the data you already have.

To illustrate why, I'll use the serializer benchmark chart from https://aloiskraus.wordpress.com/2017/04/23/the-definitive-serialization-performance-guide/:

Notice in particular that the output size of protobuf-net is half that of Json.NET (reducing the bandwidth cost), and the serialization time is less than one fifth (reducing local CPU cost).

You would need to add some attributes to your model to help protobuf-net out (as per How to convert existing POCO classes in C# to google Protobuf standard POCO). As a hypothetical sketch of what that markup might look like (your own types and member numbers will differ):
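
[ProtoContract]
public class Customer          // hypothetical type, standing in for your model
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public string Notes { get; set; }
    [ProtoMember(3)] public List<Order> Orders { get; set; }
}

[ProtoContract]
public class Order             // hypothetical nested type
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public decimal Value { get; set; }
}

With that in place, the write-side redis code would be just: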

using(var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data
        ProtoBuf.Serializer.Serialize(ms, item.Value);

        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}

As you can see, the code change to your redis code is minimal. Obviously you would need to use Deserialize<T> when reading the data back.
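
For example, a minimal read-side sketch; key and the Customer type here are placeholders for your own:

RedisValue blob = _database.StringGet(key);
if (!blob.IsNullOrEmpty)
{
    using (var ms = new MemoryStream((byte[])blob))
    {
        var customer = ProtoBuf.Serializer.Deserialize<Customer>(ms);
        // ... use customer ...
    }
}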


If your data is text-based, you might also consider running the serialization through GZipStream or DeflateStream; if your data is dominated by text, it will compress very well.
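
A rough sketch of that layering, assuming System.IO.Compression (the read side would need to wrap the stream in a GZipStream with CompressionMode.Decompress before deserializing):

using System.IO.Compression;

using (var ms = new MemoryStream())
{
    foreach (var item in data)
    {
        ms.Position = 0;
        ms.SetLength(0); // erase existing data

        // compress the protobuf payload as it is written
        using (var gzip = new GZipStream(ms, CompressionLevel.Fastest, leaveOpen: true))
        {
            ProtoBuf.Serializer.Serialize(gzip, item.Value);
        } // disposing the GZipStream flushes the final compressed block

        list.Add(_database.StringSetAsync(item.Key, ms.ToArray(), expiration));
    }
}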