C# Improve parallel performance of Sparse Matrix R

Published 2019-07-20 22:06

Question:

I have a sparse matrix containing roughly 100 million non-zero elements:

// [Row][Column][Element]
public IDictionary<int, IDictionary<int, decimal>> MyMatrix { get; private set; }

Getting the sum of each row is very fast:

private void RowSum()
{
    var rowTotals = new ConcurrentDictionary<int, decimal>();

    Parallel.ForEach(MyMatrix, (row) =>
    {
         rowTotals.TryAdd(row.Key, row.Value.Sum(x => x.Value));
    });
}

Getting the sum of each column is much slower:

private void ColumnSum()
{
   var columnTotals = new ConcurrentDictionary<int, decimal>();

   Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, 
                 (key, old) => old + column.Value);
        }
   });
}

To make column calculations faster, I could create a [Column][Row][Element] matrix, but that would double the RAM requirement. Is there any approach or data structure that would allow for the column calculations to be as fast as the row calculations, without doubling the ram?

Answer 1:

What could be happening is that there is contention for the centralized ConcurrentDictionary. If this is the case, you could try the localInit overload of Parallel.ForEach, to give each Task batch its own local (and uncontended) Dictionary, which is then aggregated into the central dictionary at the end:

var columnTotals = new ConcurrentDictionary<int, decimal>();

Parallel.ForEach(MyMatrix,
    // Each Task gets own dictionary
    () => new Dictionary<int, decimal>(),
    (row, state, colTots) =>
    {
        foreach (var column in row.Value)
        {
            if (!colTots.ContainsKey(column.Key))
            {
                colTots[column.Key] = column.Value;
            }
            else
            {
                colTots[column.Key] += column.Value;
            }
        }
        return colTots;
    },
    colTots =>
    {
        // Aggregate the dictionaries
        foreach (var column in colTots)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, 
                (key, old) => old + column.Value);
        }
    });

Edit

Some timings (10M populated elements in a 100000 x 100000 space)

  • Your RowSum 425ms
  • Your ColumnSum 7774ms
  • localInit ColumnSum 3324ms

So still an order of magnitude slower than the row sums, but looks like a reasonable improvement.

(There was also a bug in my earlier Dictionary usage, fixed above.)



Answer 2:

I think that

 Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, 0, (key, old) => old + column.Value);
        }
   });

should be

 Parallel.ForEach(MyMatrix, (row) =>
   {
        foreach (var column in row.Value)
        {
            columnTotals.AddOrUpdate(column.Key, column.Value, (key, old) => old + column.Value);
        }
   });

I think that you can make the performance more symmetrical (but not faster) by starting with a public IDictionary<Tuple<int, int>, decimal> MyMatrix { get; private set; }
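A minimal sketch of that tuple-keyed layout, using a toy matrix for illustration (value tuples rather than Tuple for brevity). Because every entry lives in one flat dictionary, row sums and column sums become the same GroupBy operation over the same entries, which is why the performance is symmetrical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TupleKeyedMatrixDemo
{
    static void Main()
    {
        // Flat layout: one dictionary keyed by (row, column).
        var matrix = new Dictionary<(int Row, int Col), decimal>
        {
            [(0, 0)] = 1m, [(0, 2)] = 2m,
            [(1, 1)] = 3m, [(2, 2)] = 4m,
        };

        // Row and column sums are now symmetric: each is a GroupBy
        // over the same flat sequence of entries.
        var rowTotals = matrix.GroupBy(e => e.Key.Row)
                              .ToDictionary(g => g.Key, g => g.Sum(e => e.Value));
        var colTotals = matrix.GroupBy(e => e.Key.Col)
                              .ToDictionary(g => g.Key, g => g.Sum(e => e.Value));

        Console.WriteLine(rowTotals[0]); // 3
        Console.WriteLine(colTotals[2]); // 6
    }
}
```

Note the trade-off: neither sum can skip to "its" dimension any more, so both scans touch all 100 million entries.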



Answer 3:

If the range between the highest and lowest column keys is small enough to allocate a simple int array, you can proceed as follows:

Find the highest and lowest column keys, then build a correspondence array that maps each column key to an index into two parallel result arrays: one storing the column keys and one storing the column totals.
Your code will look like:

private void ColumnSum()
{
    int highestKey = int.MinValue;
    int lowestKey = int.MaxValue;
    // Identify the highest and lowest column keys
    // (sequential scan: cheap relative to the summing pass).
    foreach (var row in MyMatrix)
        foreach (var column in row.Value)
        {
            lowestKey = Math.Min(lowestKey, column.Key);
            highestKey = Math.Max(highestKey, column.Key);
        }

    // Create the correspondence array, initialised to -1 (key absent)
    int range = highestKey - lowestKey + 1;
    int[] correspondence = new int[range];
    for (int i = 0; i < range; i++) correspondence[i] = -1;

    // Tag the column keys actually found in the matrix
    foreach (var row in MyMatrix)
        foreach (var column in row.Value)
            correspondence[column.Key - lowestKey] = 0;

    // Compute the indexes into the result arrays
    int columnsCount = 0;
    for (int i = 0; i < range; i++)
        if (correspondence[i] >= 0) correspondence[i] = columnsCount++;

    // Allocate the result arrays and one lock object per slot
    int[] columnValues = new int[columnsCount];
    decimal[] columnTotals = new decimal[columnsCount];
    object[] slotLocks = new object[columnsCount];
    for (int i = 0; i < columnsCount; i++) slotLocks[i] = new object();

    Parallel.ForEach(MyMatrix, (row) =>
    {
        foreach (var column in row.Value)
        {
            int j = correspondence[column.Key - lowestKey];
            // A lock on slot j is required to avoid concurrent updates
            lock (slotLocks[j])
            {
                columnValues[j] = column.Key;
                columnTotals[j] += column.Value;
            }
        }
    });
}
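The per-slot locking in the final loop can be avoided entirely by combining this dense-array idea with the localInit overload from Answer 1: each task accumulates into its own private array and the arrays are merged once at the end. A self-contained sketch with a toy matrix (the dense 0..2 key range and the small sizes are illustrative assumptions, not part of the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class DenseColumnSumDemo
{
    static void Main()
    {
        // Toy sparse matrix: row -> (column -> value).
        var matrix = new Dictionary<int, IDictionary<int, decimal>>
        {
            [0] = new Dictionary<int, decimal> { [0] = 1m, [2] = 2m },
            [1] = new Dictionary<int, decimal> { [1] = 3m, [2] = 4m },
        };
        int lowestKey = 0, columnsCount = 3; // keys 0..2 are dense in this toy example

        var columnTotals = new decimal[columnsCount];
        object mergeLock = new object();

        Parallel.ForEach(matrix,
            // Each task accumulates into its own private array: no contention.
            () => new decimal[columnsCount],
            (row, state, local) =>
            {
                foreach (var column in row.Value)
                    local[column.Key - lowestKey] += column.Value;
                return local;
            },
            local =>
            {
                // Merge each task's array once, under a single short lock.
                lock (mergeLock)
                    for (int i = 0; i < columnsCount; i++)
                        columnTotals[i] += local[i];
            });

        Console.WriteLine(columnTotals[2]); // 6
    }
}
```

Merging is O(columnsCount) per task rather than O(non-zero elements), so the lock is taken only a handful of times regardless of matrix size.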