Task caching when performing Tasks in parallel wit

2019-05-06 16:11发布

问题:

So I have this small code block that will perform several Tasks in parallel.

// no wrapping in Task, it is async
var activityList = await dataService.GetActivitiesAsync();

// Select a good enough tuple
var results = (from activity in activityList
               select new { 
                Activity = activity, 
                AthleteTask = dataService.GetAthleteAsync(activity.AthleteID)
               }).ToList(); // begin enumeration

// Wait for them to finish, ie relinquish control of the thread
await Task.WhenAll(results.Select(t => t.AthleteTask));

// Set the athletes
foreach(var pair in results)
{
  pair.Activity.Athlete = pair.AthleteTask.Result;
}

So I'm downloading Athlete data for each given Activity. But it could be that we are requesting the same athlete several times. How can we ensure that the GetAthleteAsync method will only go online to fetch the actual data if it's not yet in our memory cache?

Currently I tried using a ConcurrentDictionary<int, Athelete> inside the GetAthleteAsync method

private async Task<Athlete> GetAthleteAsync(int athleteID)
{
       if(cacheAthletes.Contains(athleteID))
             return cacheAthletes[atheleID];

       ** else fetch from web
}

回答1:

You can change your ConcurrentDictionary to cache the Task<Athlete> instead of just the Athlete. Remember, a Task<T> is a promise - an operation that will eventually result in a T. So, you can cache operations instead of results.

ConcurrentDictionary<int, Task<Athlete>> cacheAthletes;

Then, your logic will go like this: if the operation is already in the cache, return the cached task immediately (synchronously). If it's not, then start the download, add the download operation to the cache, and return the new download operation. Note that all the "download operation" logic is moved to another method:

private Task<Athlete> GetAthleteAsync(int athleteID)
{
  return cacheAthletes.GetOrAdd(athleteID, id => LoadAthleteAsync(id));
}

private async Task<Athlete> LoadAthleteAsync(int athleteID)
{
  // Load from web
}

This way, multiple parallel requests for the same athlete will get the same Task<Athlete>, and each athlete is only downloaded once.



回答2:

You also need to skip tasks, which unsuccessfuly completed. That's my snippet:

ObjectCache _cache = MemoryCache.Default;
static object _lockObject = new object();
public Task<T> GetAsync<T>(string cacheKey, Func<Task<T>> func, TimeSpan? cacheExpiration = null) where T : class
{
    var task = (T)_cache[cacheKey];
    if (task != null) return task;          
    lock (_lockObject)
    {
        task = (T)_cache[cacheKey];
        if (task != null) return task;
        task = func();
        Set(cacheKey, task, cacheExpiration);
        task.ContinueWith(t => {
            if (t.Status != TaskStatus.RanToCompletion)
                _cache.Remove(cacheKey);
        });
    }
    return task;
}i


回答3:

When caching values provided by Task-objects, you'd like to make sure the cache implementation ensures that:

  • No parallel or unnecessary operations to get a value will be started. In your case, this is your question about avoiding multiple GetAthleteAsync for the same id.
  • You don't want to have negative caching (i.e. caching failed results), or if you do want it, it needs to be a implementation decision and you need to handle eventually replacing failed results somehow.
  • Cache users can't get invalidated results from the cache, even if the value is invalidated during an await.

I have a blog post about caching Task-objects with example code, that ensures all points above and could be useful in your situation. Basically my solution is to store Lazy<Task<T>> objects in a MemoryCache.