While stress testing a prototype of our brand-new primary system, I ran into a concurrency issue with AppFabric Cache. When many threads call DataCache.Get() and Put() concurrently with the same cache key, and the stored object is relatively large, I receive "ErrorCode:SubStatus:There is a temporary failure. Please retry later." It is reproducible with the following code:
using System.Threading;
using Microsoft.ApplicationServer.Caching;

var dcfc = new DataCacheFactoryConfiguration
{
    Servers = new[] { new DataCacheServerEndpoint("localhost", 22233) },
    SecurityProperties = new DataCacheSecurity(DataCacheSecurityMode.None, DataCacheProtectionLevel.None),
};
var dcf = new DataCacheFactory(dcfc);
var dc = dcf.GetDefaultCache();
const string key = "a";
var value = new int[256 * 1024]; // 1 MB (256K ints * 4 bytes)

// Start 300 writer/reader thread pairs that all hit the same key.
for (int i = 0; i < 300; i++)
{
    var putT = new Thread(() => dc.Put(key, value));
    putT.Start();
    var getT = new Thread(() => dc.Get(key));
    getT.Start();
}
When Get() is called with a different key, or when access to the DataCache is synchronized, the issue does not appear. Obtaining the DataCache from the DataCacheFactory on each call (DataCache is supposed to be thread-safe) or increasing the timeouts has no effect; the error is still raised.
It seems very strange to me that MS would leave such a bug in. Has anybody faced a similar issue?
I see the same behavior, and my understanding is that this is by design. The cache supports two concurrency models:
- Optimistic Concurrency Model methods: Get, Put, ...
- Pessimistic Concurrency Model methods: GetAndLock, PutAndUnlock, Unlock
If you use optimistic concurrency model methods like Get, you have to be prepared to receive DataCacheErrorCode.RetryLater and handle it appropriately; I also use a retry approach.
You can find more information on MSDN: Concurrency Models
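The retry approach mentioned above could be sketched roughly as follows. This is a minimal illustration, not part of the AppFabric API: the `CacheRetry` class, its `Execute` method, and the `isTransient` predicate are hypothetical names, and the helper deliberately has no dependency on the caching library so that the retry logic stands on its own.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of AppFabric): retries an operation
// while the caller-supplied predicate classifies the failure as transient.
public static class CacheRetry
{
    public static T Execute<T>(Func<T> action, Func<Exception, bool> isTransient,
                               int maxAttempts, TimeSpan delay)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (isTransient == null) throw new ArgumentNullException("isTransient");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex)
            {
                // Rethrow non-transient errors, or the last error once
                // all attempts have been used up.
                if (attempt >= maxAttempts || !isTransient(ex)) throw;
                Thread.Sleep(delay);
            }
        }
    }
}
```

With AppFabric, the call might look like `CacheRetry.Execute(() => dc.Get(key), ex => ex is DataCacheException && ((DataCacheException)ex).ErrorCode == DataCacheErrorCode.RetryLater, 10, TimeSpan.FromMilliseconds(100))`. The attempt count and delay here are arbitrary and would need tuning for a real workload.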
We have seen this problem in our code as well. We solved it by wrapping the Get method so that it catches exceptions and retries the call N times before falling back to a direct request to SQL.
Here is the code we use to get data from the cache:
private static bool TryGetFromCache(string cacheKey, string region, out GetMappingValuesToCacheResult cacheResult, int counter = 0)
{
    cacheResult = new GetMappingValuesToCacheResult();
    try
    {
        if (_cache == null) return false;
        // use as instead of cast, as this will return null instead of exception caused by casting.
        cacheResult = _cache.Get(cacheKey, region) as GetMappingValuesToCacheResult;
        return cacheResult != null;
    }
    catch (DataCacheException dataCacheException)
    {
        switch (dataCacheException.ErrorCode)
        {
            case DataCacheErrorCode.KeyDoesNotExist:
            case DataCacheErrorCode.RegionDoesNotExist:
                return false;
            case DataCacheErrorCode.Timeout:
            case DataCacheErrorCode.RetryLater:
                if (counter > 9) return false; // we tried 10 times, so we will give up.
                counter++;
                Thread.Sleep(100);
                return TryGetFromCache(cacheKey, region, out cacheResult, counter);
            default:
                EventLog.WriteEntry(EventViewerSource, "TryGetFromCache: DataCacheException caught:\n" +
                    dataCacheException.Message, EventLogEntryType.Error);
                return false;
        }
    }
}
Then, when we need to get something from the cache, we call:
TryGetFromCache(key, region, out cachedMapping)
This lets us use Try-style methods that encapsulate the exceptions. If the call returns false, we know the value is missing or something is wrong with the cache, and we can access SQL directly.