The current task, iterating over massive dictionaries, is giving me a headache. I cannot pinpoint the exact source of the high CPU usage, so I hope some of the C# gurus here can give me some hints and tips.
The setup is 10 preallocated Guid-to-byte[] dictionaries, each holding one million entries. The process iterates over all of them, and each dictionary has its own thread. Simply iterating over all of them and passing each byte[] reference to an iteration delegate that never reads from it takes under 2 ms, but accessing any byte in the entries raises this to 300+ ms.
Note: the iteration delegate is constructed once, before any iterations, and after that I only pass a reference to it.
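A scaled-down sketch of this setup, for anyone who wants to reproduce it, might look like the following: a few preallocated Guid-to-byte[] dictionaries, one thread per dictionary, and a delegate called for every value. The partition and entry counts are hypothetical (much smaller than the 10 x 1,000,000 described above), and `RunPass` is an illustrative helper, not the project's actual code. It takes a `Func<byte[], int>` whose results are summed per thread, so the read of the first byte cannot be optimized away.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical, scaled-down sizes; the question uses 10 partitions x 1,000,000 entries.
const int Partitions = 4;
const int EntriesPerPartition = 100_000;
const int EntrySize = 70; // average entry size reported in the question

var random = new Random(42);
var partitions = new Dictionary<Guid, byte[]>[Partitions];
for (var p = 0; p < Partitions; p++)
{
    // Preallocated capacity, as in the question.
    partitions[p] = new Dictionary<Guid, byte[]>(EntriesPerPartition);
    for (var i = 0; i < EntriesPerPartition; i++)
    {
        var entry = new byte[EntrySize];
        random.NextBytes(entry);
        partitions[p].Add(Guid.NewGuid(), entry);
    }
}

// One pass: one thread per partition, each calling `visit` on every value and
// summing the results locally so the reads stay observable.
(double Ms, long Sum) RunPass(Func<byte[], int> visit)
{
    var sums = new long[Partitions];
    var threads = new Thread[Partitions];
    var sw = Stopwatch.StartNew();
    for (var p = 0; p < Partitions; p++)
    {
        var index = p; // copy: a `for` variable is shared by all lambdas
        threads[p] = new Thread(() =>
        {
            long local = 0;
            foreach (var bytes in partitions[index].Values)
                local += visit(bytes);
            sums[index] = local;
        });
        threads[p].Start();
    }
    foreach (var t in threads)
        t.Join();
    sw.Stop();

    long total = 0;
    foreach (var s in sums)
        total += s;
    return (sw.Elapsed.TotalMilliseconds, total);
}

var untouched = RunPass(bytes => 0);      // never dereferences the array
var touched = RunPass(bytes => bytes[0]); // reads the first byte of every entry
Console.WriteLine($"untouched: {untouched.Ms:F1} ms, first byte: {touched.Ms:F1} ms");
```

Running `RunPass` several times back to back would mirror the repeated iteration sets described below.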
If I'm not doing anything with the received byte array, it's all incredibly fast:

```csharp
var iterationDelegate = new Action<byte[]>((bytes) =>
{
    var x = 5 + 10;
});
```
But once I access the very first byte (which actually contains a pointer to the row's metadata somewhere else):

```csharp
var iterationDelegate = new Action<byte[]>((bytes) =>
{
    var b = (int)bytes[0];
});
```
The total time shoots up, and what's even weirder, it keeps growing: the first set of iterations takes 30 ms, the second 40+ ms, the third 100+ ms and the fourth can take 500+ ms. If I then stop testing, sleep the calling thread for a few seconds and start iterating again, it starts casually at 30 ms and rises the same way as before, until I give it "time to breathe" again.
When I watch it in the Visual Studio CPU call tree, 93% of the CPU is consumed by [External Code], which I cannot expand to see what it actually is.
Is there anything I can do about this? Is it the GC having a rough time?
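One way to check whether the GC is involved is to record the number of collections per generation, and the bytes allocated, around each pass. The sketch below uses a hypothetical `Measure` helper built on `GC.CollectionCount` and `GC.GetTotalAllocatedBytes` (the latter needs .NET Core 3.0 or later). It compares building a temporary 4-byte array per read, as the delegate in Edit 1 does, against reading the same bytes in place with `BitConverter.ToInt32(byte[], int)`. The sizes are assumptions, and the actual numbers will depend on the runtime and GC settings.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs `work` and reports elapsed time, the number of GCs per
// generation that happened meanwhile, and the bytes allocated process-wide.
(double Ms, int Gen0, int Gen1, int Gen2, long Allocated) Measure(Action work)
{
    int g0 = GC.CollectionCount(0), g1 = GC.CollectionCount(1), g2 = GC.CollectionCount(2);
    long before = GC.GetTotalAllocatedBytes(precise: true);
    var sw = Stopwatch.StartNew();
    work();
    sw.Stop();
    long after = GC.GetTotalAllocatedBytes(precise: true);
    return (sw.Elapsed.TotalMilliseconds,
            GC.CollectionCount(0) - g0,
            GC.CollectionCount(1) - g1,
            GC.CollectionCount(2) - g2,
            after - before);
}

const int Iterations = 1_000_000;
var entry = new byte[70]; // stand-in for one ~70-byte entry (all zeros)
long sink = 0;            // keeps the results observable

// Builds a temporary 4-byte array per iteration, like the Edit 1 delegate.
var copying = Measure(() =>
{
    for (var i = 0; i < Iterations; i++)
        sink += BitConverter.ToInt32(new byte[4] { entry[2], entry[3], entry[4], entry[5] }, 0);
});

// Reads the same four bytes in place, with no temporary array.
var inPlace = Measure(() =>
{
    for (var i = 0; i < Iterations; i++)
        sink += BitConverter.ToInt32(entry, 2);
});

Console.WriteLine($"copying:  {copying}");
Console.WriteLine($"in place: {inPlace}");
```

If the copying variant shows many more gen0 (or gen1/gen2) collections than the in-place one, the allocations are a likely contributor; if both show none, the time is going elsewhere.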
Edit 1: The code I actually want to run is:

```csharp
// `table` and `key` (the column index) are captured from the enclosing scope.
var iterationDelegate = new Action<byte[]>((data) =>
{
    //compare two bytes, ensure the row belongs to the desired table
    if (data[0] != table.TableIndex)
        return;
    //get header length
    var headerLength = (int)data[1];
    //process the header info and retrieve the desired column data position:
    var columnInfoPos = (key * 6) + 2;
    var pointers = new int[3] {
        //data position
        BitConverter.ToInt32(new byte[4] {
            data[columnInfoPos],
            data[columnInfoPos + 1],
            data[columnInfoPos + 2],
            data[columnInfoPos + 3] }),
        //data length
        BitConverter.ToUInt16(new byte[2] {
            data[columnInfoPos + 4],
            data[columnInfoPos + 5] }),
        //column info position
        columnInfoPos };
});
```
But this code is even slower: the iteration times are ~150 ms, ~300 ms, ~600 ms, then 700+ ms.
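For comparison, the same header lookup can be written without the two temporary arrays per entry by passing an offset to `BitConverter.ToInt32(byte[], int)` and `BitConverter.ToUInt16(byte[], int)`. This is only a sketch: `TableIndex` and `ColumnKey` are hypothetical stand-ins for the captured `table.TableIndex` and `key`, `TryReadColumn` is an illustrative name, and the sample entry assumes POS is an offset from the start of the entry.

```csharp
using System;

// Hypothetical stand-ins for the captured `table.TableIndex` and `key` in the question.
const byte TableIndex = 3;
const int ColumnKey = 1;

// Same logic as the Edit 1 delegate, but reads the descriptor directly from `data`
// instead of copying bytes into new arrays. Returns false for rows of other tables.
bool TryReadColumn(byte[] data, int columnKey, out int position, out ushort length, out int columnInfoPos)
{
    position = 0;
    length = 0;
    columnInfoPos = 0;
    if (data[0] != TableIndex)
        return false;

    columnInfoPos = columnKey * 6 + 2;
    // Both calls use the machine's byte order, as BitConverter did in the original code.
    position = BitConverter.ToInt32(data, columnInfoPos);
    length = BitConverter.ToUInt16(data, columnInfoPos + 4);
    return true;
}

// Sample entry: T = 3, HL = 14 (2 + 2 descriptors x 6 bytes), then 3 + 2 payload bytes.
var entry = new byte[19];
entry[0] = TableIndex;
entry[1] = 14;
BitConverter.GetBytes(14).CopyTo(entry, 2);         // column 0 position
BitConverter.GetBytes((ushort)3).CopyTo(entry, 6);  // column 0 length
BitConverter.GetBytes(17).CopyTo(entry, 8);         // column 1 position
BitConverter.GetBytes((ushort)2).CopyTo(entry, 12); // column 1 length

if (TryReadColumn(entry, ColumnKey, out var pos, out var len, out var infoPos))
    Console.WriteLine($"column {ColumnKey}: position {pos}, length {len}, descriptor at {infoPos}");
// prints "column 1: position 17, length 2, descriptor at 8"
```

The `out` parameters replace the `int[3]` from the original, which would otherwise be a third allocation per entry.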
This is the worker class that is kept alive for each store, each on its own thread:

```csharp
class PartitionWorker
{
    private ManualResetEvent waitHandle = new ManualResetEvent(true);
    private object key = new object();
    private bool stop = false;
    private List<Action> queue = new List<Action>();

    public void AddTask(Action task)
    {
        lock (key)
        {
            queue.Add(task);
        }
        waitHandle.Set();
    }

    public void Run()
    {
        while (!stop)
        {
            lock (key)
            {
                if (queue.Count > 0)
                {
                    var task = queue[0];
                    task();
                    queue.Remove(task);
                    continue;
                }
            }
            waitHandle.Reset();
            waitHandle.WaitOne();
        }
    }

    public void Stop()
    {
        stop = true;
    }
}
```
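For reference, a single-consumer worker with the same role could also be sketched with `BlockingCollection<Action>`, which takes care of the blocking and wake-up that the `ManualResetEvent` handles above, runs each task outside any lock, and lets the thread exit once `CompleteAdding()` is called. This is an alternative shape under those assumptions, not a drop-in replacement; the variable names are hypothetical and the sample tasks only record squares.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Backed by a ConcurrentQueue by default, so tasks are taken in FIFO order.
var queue = new BlockingCollection<Action>();

var worker = new Thread(() =>
{
    // Blocks while the queue is empty; the loop ends after CompleteAdding()
    // once every queued task has been taken.
    foreach (var task in queue.GetConsumingEnumerable())
        task(); // runs outside any lock, so producers are never blocked by it
});
worker.Start();

var results = new ConcurrentQueue<int>();
for (var i = 0; i < 5; i++)
{
    var n = i; // copy: a `for` variable is shared by all lambdas
    queue.Add(() => results.Enqueue(n * n));
}

queue.CompleteAdding(); // plays the role of Stop(): no more tasks, let the thread finish
worker.Join();
Console.WriteLine(string.Join(", ", results)); // 0, 1, 4, 9, 16
```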
And lastly, the code that launches the iterations; it runs from a Task for each incoming TCP request:

```csharp
for (var memoryPartition = 0; memoryPartition < partitions; memoryPartition++)
{
    var memIndex = memoryPartition;
    mem[memIndex].AddJob(() =>
    {
        try
        {
            //... to keep it short I have excluded the read lock and its try/finally
            foreach (var obj in mem[memIndex].innerCache.Values)
            {
                iterationDelegate(obj.bytes);
            }
            //release the read lock in finally..
        }
        catch
        {
        }
        finally
        {
            latch.Signal();
        }
    });
}

if (latch.Wait(50))
{
    sw.Stop();
    Console.WriteLine("Found " + result.Count + " in " + sw.Elapsed.TotalMilliseconds + "ms");
}
else
{
    // CountdownEvent.Wait(int) returns false on timeout instead of throwing.
    Console.WriteLine(">50");
}
```
Edit 2: The dictionaries are preallocated using:

```csharp
private Dictionary<Guid, byte[]> innerCache = new Dictionary<Guid, byte[]>(part_max_entries);
```

Regarding the entries, they are 70 bytes on average. The process takes around 2 GB of memory with 10,000,000 entries split among 10 dictionaries.
The structure of an entry is as follows:
T | HL | {POS | POS | POS | POS | LEN | LEN} | {data bytes}
where | separates individual bytes:
- T is a one-byte index into the table metadata dictionary
- HL is the byte length of the header portion of the entry
POS and LEN repeat for each data value in the entry:
- POS x4 = int giving the position of this data in the entry
- LEN x2 = ushort giving the length of this data in the entry
and then {data bytes} is the data payload.
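The layout above can be illustrated with a small encoder and a reader that returns a column's payload as a span without copying it. `BuildEntry` and `ReadColumn` are hypothetical helpers, and two details are assumptions the description leaves open: that HL counts the whole header (T, HL and every 6-byte descriptor), and that POS is measured from the start of the entry. Values are written in the machine's byte order, matching `BitConverter` on the read side.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical encoder for: T | HL | { POS(4) | LEN(2) } per column | payload bytes.
byte[] BuildEntry(byte tableIndex, IReadOnlyList<byte[]> columns)
{
    int headerLength = 2 + columns.Count * 6;
    if (headerLength > byte.MaxValue)
        throw new ArgumentException("Too many columns for a one-byte header length.");

    int total = headerLength;
    foreach (var c in columns)
        total += c.Length;

    var entry = new byte[total];
    entry[0] = tableIndex;
    entry[1] = (byte)headerLength;
    int dataPos = headerLength;
    for (int i = 0; i < columns.Count; i++)
    {
        int infoPos = 2 + i * 6;
        BitConverter.TryWriteBytes(entry.AsSpan(infoPos, 4), dataPos);                              // POS
        BitConverter.TryWriteBytes(entry.AsSpan(infoPos + 4, 2), checked((ushort)columns[i].Length)); // LEN
        columns[i].CopyTo(entry, dataPos);
        dataPos += columns[i].Length;
    }
    return entry;
}

// Returns a view over one column's payload, without allocating a copy.
ReadOnlySpan<byte> ReadColumn(byte[] entry, int column)
{
    int infoPos = 2 + column * 6;
    int pos = BitConverter.ToInt32(entry, infoPos);
    int len = BitConverter.ToUInt16(entry, infoPos + 4);
    return entry.AsSpan(pos, len);
}

var sample = BuildEntry(3, new[] { Encoding.UTF8.GetBytes("abc"), Encoding.UTF8.GetBytes("xy") });
Console.WriteLine(sample.Length);                                  // 19
Console.WriteLine(Encoding.UTF8.GetString(ReadColumn(sample, 1))); // xy
```

With a one-byte HL, this assumed layout caps an entry at 42 columns, since 2 + 42 x 6 = 254 still fits in a byte.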