I have a sparse matrix containing roughly 100 million non-zero elements:
// [Row][Column][Element]
public IDictionary<int, IDictionary<int, decimal>> MyMatrix { get; private set; }
Getting the sum of each row is very fast:
private void RowSum()
{
    var rowTotals = new ConcurrentDictionary<int, decimal>();
    Parallel.ForEach(MyMatrix, (row) =>
    {
        rowTotals.TryAdd(row.Key, row.Value.Sum(x => x.Value));
    });
}
Getting the sum of each column is much slower:
private void ColumnSum()
{
    var columnTotals = new ConcurrentDictionary<int, decimal>();
    Parallel.ForEach(MyMatrix, (row) =>
    {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value,
                (key, old) => old + column.Value);
        }
    });
}
To make the column calculations faster, I could create a [Column][Row][Element] matrix, but that would double the RAM requirement. Is there any approach or data structure that would allow the column calculations to be as fast as the row calculations, without doubling the RAM?
If the difference between the highest and lowest column indices is small enough to allow a simple array, you may proceed as follows: identify the highest and lowest column indices, then build a correspondence that maps each column index to a slot in two parallel arrays, one storing the column indices and the other storing the column totals.
Your code could then look something like the following sketch (here the correspondence is simply the offset column.Key - minColumn, which is my reading of the idea above; the sketch stays single-threaded because decimal has no Interlocked.Add):
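private decimal[] ColumnSumsViaArray()
{
    // First pass: find the range of column indices actually used.
    int minColumn = int.MaxValue;
    int maxColumn = int.MinValue;
    foreach (var row in MyMatrix)
    {
        foreach (var column in row.Value)
        {
            if (column.Key < minColumn) minColumn = column.Key;
            if (column.Key > maxColumn) maxColumn = column.Key;
        }
    }

    // Second pass: accumulate into a flat array indexed by the offset
    // (column.Key - minColumn). A plain array avoids per-key dictionary
    // lookups and contention entirely.
    var columnTotals = new decimal[maxColumn - minColumn + 1];
    foreach (var row in MyMatrix)
    {
        foreach (var column in row.Value)
        {
            columnTotals[column.Key - minColumn] += column.Value;
        }
    }
    return columnTotals;
}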
I think that

public IDictionary<int, IDictionary<int, decimal>> MyMatrix { get; private set; }

should be a flat, tuple-keyed dictionary: you can make the performance more symmetrical (but not faster) by starting with a

public IDictionary<Tuple<int, int>, decimal> MyMatrix { get; private set; }
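With a single flat dictionary, both sums become the same per-element pass. A minimal sketch of what that could look like (treating Item1 as the row index and Item2 as the column index is an assumption about how the tuple is keyed):

private void SymmetricSums()
{
    var rowTotals = new ConcurrentDictionary<int, decimal>();
    var columnTotals = new ConcurrentDictionary<int, decimal>();
    Parallel.ForEach(MyMatrix, (cell) =>
    {
        // cell.Key.Item1 = row index, cell.Key.Item2 = column index.
        rowTotals.AddOrUpdate(cell.Key.Item1, cell.Value,
            (key, old) => old + cell.Value);
        columnTotals.AddOrUpdate(cell.Key.Item2, cell.Value,
            (key, old) => old + cell.Value);
    });
}

Both passes now perform the same per-element AddOrUpdate against a shared dictionary, which is why this evens the two sums out rather than making the column sum any faster.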
What could be happening is that there is contention for the centralized ConcurrentDictionary. If this is the case, you could try the localInit overload of Parallel.ForEach, to give each Task batch its own local (and uncontended) Dictionary, which is then aggregated into the central dictionary at the end:
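A sketch of that pattern, with the per-batch Dictionary and the localFinally merge as described above:

private void ColumnSum()
{
    var columnTotals = new ConcurrentDictionary<int, decimal>();
    Parallel.ForEach(
        MyMatrix,
        // localInit: one private dictionary per Task batch, no contention.
        () => new Dictionary<int, decimal>(),
        // body: accumulate each row into the task-local dictionary.
        (row, state, local) =>
        {
            foreach (var column in row.Value)
            {
                decimal subtotal;
                local.TryGetValue(column.Key, out subtotal);
                local[column.Key] = subtotal + column.Value;
            }
            return local;
        },
        // localFinally: merge each local dictionary into the shared totals.
        (local) =>
        {
            foreach (var pair in local)
            {
                columnTotals.AddOrUpdate(pair.Key, pair.Value,
                    (key, old) => old + pair.Value);
            }
        });
}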
Edit: Some timings (10M populated elements in a 100000 x 100000 space) show the column sums are still an order of magnitude slower than the row sums, but this looks like a reasonable improvement. (There was also a bug in my original Dictionary usage.)