I have a sparse matrix containing roughly 100 million non-zero elements:
// [Row][Column][Element]
public IDictionary<int, IDictionary<int, decimal>> MyMatrix { get; private set; }
Getting the sum of each row is very fast:
private void RowSum()
{
    var rowTotals = new ConcurrentDictionary<int, decimal>();
    Parallel.ForEach(MyMatrix, (row) =>
    {
        rowTotals.TryAdd(row.Key, row.Value.Sum(x => x.Value));
    });
}
Getting the sum of each column is much slower:
private void ColumnSum()
{
    var columnTotals = new ConcurrentDictionary<int, decimal>();
    Parallel.ForEach(MyMatrix, (row) =>
    {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value,
                (key, old) => old + column.Value);
        }
    });
}
To make column calculations faster, I could create a [Column][Row][Element] matrix, but that would double the RAM requirement. Is there any approach or data structure that would allow the column calculations to be as fast as the row calculations, without doubling the RAM?
What could be happening is that there is contention on the centralized ConcurrentDictionary. If this is the case, you could try the localInit overload of Parallel.ForEach, giving each task batch its own local (and uncontended) Dictionary, which is then aggregated into the central dictionary at the end:
var columnTotals = new ConcurrentDictionary<int, decimal>();

Parallel.ForEach(MyMatrix,
    // localInit: each task gets its own private dictionary
    () => new Dictionary<int, decimal>(),
    // body: accumulate into the task-local dictionary, no locking needed
    (row, state, colTots) =>
    {
        foreach (var column in row.Value)
        {
            if (!colTots.ContainsKey(column.Key))
            {
                colTots[column.Key] = column.Value;
            }
            else
            {
                colTots[column.Key] += column.Value;
            }
        }
        return colTots;
    },
    // localFinally: merge each task's dictionary into the shared result
    colTots =>
    {
        foreach (var column in colTots)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value,
                (key, old) => old + column.Value);
        }
    });
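Because each task accumulates into its own private Dictionary, the only contended writes happen in the localFinally delegate, which runs once per task rather than once per matrix element.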
Edit
Some timings (10M populated elements in a 100,000 x 100,000 space):
- Your RowSum 425ms
- Your ColumnSum 7774ms
- localInit ColumnSum 3324ms
So still an order of magnitude slower than the row sums, but looks like a reasonable improvement.
(There was also a bug in my original Dictionary usage.)
I think that
Parallel.ForEach(MyMatrix, (row) =>
{
    foreach (var column in row.Value)
    {
        columnTotals.AddOrUpdate(column.Key, 0, (key, old) => old + column.Value);
    }
});

should be the following, because with an add value of 0 the first element seen for each column is dropped:
Parallel.ForEach(MyMatrix, (row) =>
{
    foreach (var column in row.Value)
    {
        columnTotals.AddOrUpdate(column.Key, column.Value, (key, old) => old + column.Value);
    }
});
I think that you can make the performance more symmetrical (but not faster) by starting with a flat, pair-keyed matrix:

public IDictionary<Tuple<int, int>, decimal> MyMatrix { get; private set; }
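As a rough sketch of what that symmetry looks like (the SumBy helper below is my illustration, not part of the original post), both sums become the same GroupBy over the same flat sequence of entries, so neither direction is privileged:

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: with a flat Tuple<row, column> key, summing by row and by
// column is the same operation with a different key selector.
static IDictionary<int, decimal> SumBy(
    IDictionary<Tuple<int, int>, decimal> matrix,
    Func<Tuple<int, int>, int> axisSelector)
{
    return matrix
        .GroupBy(entry => axisSelector(entry.Key))
        .ToDictionary(g => g.Key, g => g.Sum(entry => entry.Value));
}

// Usage:
// var rowTotals    = SumBy(MyMatrix, key => key.Item1); // sum by row
// var columnTotals = SumBy(MyMatrix, key => key.Item2); // sum by column

Note that this walks the entire matrix either way, which is why it evens the two directions out rather than making columns as cheap as rows were in the nested layout.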
If the range between the highest and lowest column keys is small enough to allocate a simple array over it, you may proceed as follows:
Identify the highest and lowest column keys, then build a correspondence array that maps each column key to an index into two parallel result arrays: one storing the column keys and one storing the column totals.
Your code will look like:
private void ColumnSum()
{
    // Pass 1: find the lowest and highest column keys.
    // A sequential scan avoids racy min/max updates across threads.
    int highestKey = int.MinValue;
    int lowestKey = int.MaxValue;
    foreach (var row in MyMatrix)
    {
        foreach (var column in row.Value)
        {
            lowestKey = Math.Min(lowestKey, column.Key);
            highestKey = Math.Max(highestKey, column.Key);
        }
    }

    // Pass 2: create the correspondence array and tag the column keys
    // that actually occur in the matrix. Every thread writes the same
    // value (0), so these concurrent writes are harmless.
    int[] correspondence = new int[highestKey - lowestKey + 1];
    for (int i = 0; i < correspondence.Length; i++) correspondence[i] = -1;
    Parallel.ForEach(MyMatrix, (row) =>
    {
        foreach (var column in row.Value)
            correspondence[column.Key - lowestKey] = 0;
    });

    // Pass 3: assign each occurring column key a compact index into the
    // result arrays.
    int columnsCount = 0;
    for (int i = 0; i < correspondence.Length; i++)
        if (correspondence[i] >= 0) correspondence[i] = columnsCount++;

    // Pass 4: allocate and fill the parallel result arrays. A lock per
    // index is required to avoid concurrent updates of columnTotals[j].
    int[] columnKeys = new int[columnsCount];
    decimal[] columnTotals = new decimal[columnsCount];
    object[] columnLocks = new object[columnsCount];
    for (int i = 0; i < columnsCount; i++) columnLocks[i] = new object();
    Parallel.ForEach(MyMatrix, (row) =>
    {
        foreach (var column in row.Value)
        {
            int j = correspondence[column.Key - lowestKey];
            lock (columnLocks[j])
            {
                columnKeys[j] = column.Key;
                columnTotals[j] += column.Value;
            }
        }
    });
}
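One design note on the final pass: decimal has no Interlocked.Add overload, so the += on columnTotals[j] has to be protected, either with the per-index locks shown above or by giving each task its own partial totals array (as in the localInit approach earlier) and merging once at the end.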