Garbage Collection and Parallel.ForEach Issue Afte

2019-01-11 09:32发布

问题:

I have some code to process several million data rows in my own R-like C# DataFrame class. There's a number of Parallel.ForEach calls for iterating over the data rows in parallel. This code has been running for over a year using VS2013 and .NET 4.5 without issues.

I have two dev machines (A and B) and recently upgraded machine A to VS2015. I started noticing a strange intermittent freeze in my code about half the time. Letting it run for a long time, it turns out that the code does eventually finish. It just takes 15-120 minutes instead of 1-2 minutes.

Attempts to Break All using the VS2015 debugger keep failing for some reason. So I inserted a bunch of log statements. It turns out that this freeze occurs when there is a Gen2 collection during a Parallel.ForEach loop (comparing the collection count before and after each Parallel.ForEach loop). The entire extra 13-118 minutes is spent inside whichever Parallel.ForEach loop call happens to overlap with a Gen2 collection (if any). If there are no Gen2 collections during any Parallel.ForEach loops (about 50% of the time when I run it), then everything finishes fine in 1-2 minutes.

When I run the same code in VS2013 on Machine A, I get the same freezes. When I run the code in VS2013 on Machine B (which was never upgraded), it works perfectly. It ran dozens of time overnight with no freezing.

Some things I've noticed / tried:

  • The freezes happen with or without the debugger attached on Machine A (I figured it was something with the VS2015 debugger at first)
  • The freezes happen whether I build in Debug or Release mode
  • The freezes happen if I target .NET 4.5 or .NET 4.6
  • I tried disabling RyuJIT but that did not affect the freezes

I'm not changing the default GC settings at all. According to GCSettings, all runs are happening with LatencyMode Interactive and IsServerGC as false.

I could just switch to LowLatency before every call to Parallel.ForEach, but I'd really prefer to understand what's going on.

Has anyone else seen strange freezes in Parallel.ForEach after the VS2015 upgrade? Any ideas on what a good next step would be?

UPDATE 1: Adding some sample code to the nebulous explanation above...

Here is some sample code that I hope will demonstrate this issue. This code runs in 10-12 seconds on B machine, consistently. It encounters a number of Gen2 collections, but they take almost no time at all. If I uncomment the two GC settings lines, I can force it to have no Gen2 collections. It's somewhat slower then at 30-50 seconds.

Now on my A machine, the code takes a random amount of time. Seems to be between 5 and 30 minutes. And it seems to get worse, the more Gen2 collections it encounters. If I uncomment the two GC settings lines, it takes 30-50 seconds on Machine A also (same as Machine B).

It might take some tweaking in terms of the number of rows and array size for this to show up on another machine.

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Linq;
using System.Runtime;    

public class MyDataRow
{
    public int Id { get; set; }
    public double Value { get; set; }
    public double DerivedValuesSum { get; set; }
    public double[] DerivedValues { get; set; }
}

class Program
{
    static void Example()
    {
        const int numRows = 2000000;
        const int tempArraySize = 250;

        var r = new Random();
        var dataFrame = new List<MyDataRow>(numRows);

        for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow { Id = i, Value = r.NextDouble() });

        Stopwatch stw = Stopwatch.StartNew();

        int gcs0Initial = GC.CollectionCount(0);
        int gcs1Initial = GC.CollectionCount(1);
        int gcs2Initial = GC.CollectionCount(2);

        //GCSettings.LatencyMode = GCLatencyMode.LowLatency;

        Parallel.ForEach(dataFrame, dr =>
        {
            double[] tempArray = new double[tempArraySize];
            for (int j = 0; j < tempArraySize; j++) tempArray[j] = Math.Pow(dr.Value, j);
            dr.DerivedValuesSum = tempArray.Sum();
            dr.DerivedValues = tempArray.ToArray();
        });

        int gcs0Final = GC.CollectionCount(0);
        int gcs1Final = GC.CollectionCount(1);
        int gcs2Final = GC.CollectionCount(2);

        stw.Stop();

        //GCSettings.LatencyMode = GCLatencyMode.Interactive;

        Console.Out.WriteLine("ElapsedTime = {0} Seconds ({1} Minutes)", stw.Elapsed.TotalSeconds, stw.Elapsed.TotalMinutes);

        Console.Out.WriteLine("Gcs0 = {0} = {1} - {2}", gcs0Final - gcs0Initial, gcs0Final, gcs0Initial);
        Console.Out.WriteLine("Gcs1 = {0} = {1} - {2}", gcs1Final - gcs1Initial, gcs1Final, gcs1Initial);
        Console.Out.WriteLine("Gcs2 = {0} = {1} - {2}", gcs2Final - gcs2Initial, gcs2Final, gcs2Initial);

        Console.Out.WriteLine("Press Any Key To Exit...");
        Console.In.ReadLine();
    }

    static void Main(string[] args)
    {
        Example();
    }
}

UPDATE 2: Just to move things out of the comments for future readers...

This hotfix: https://support.microsoft.com/en-us/kb/3088957 totally fixes the issue. I'm not seeing any slowness issues at all after applying.

It turned out not to have anything to do with Parallel.ForEach I believe based on this: http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx though the hotfix does mention Parallel.ForEach for some reason.

回答1:

It looks like the problem has been addressed now, see http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx



回答2:

This indeed performs excessively poorly, the background GC is not doing you favor here. First thing I noted is that Parallel.ForEach() is using too many tasks. The threadpool manager misinterprets the thread behavior as "bogged down by I/O" and starts extra threads. This makes the problem worse. Workaround for that is:

var options = new ParallelOptions();
options.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(dataFrame, options, dr => {
    // etc..
}

This gives better insight in what ails the program from the new diagnostics hub in VS2015. It doesn't take long for only a single core doing any work, easy to tell from the CPU usage. With occasional spikes, they don't last very long, coinciding with an orange GC mark. When you take a closer look at the GC mark you see it is a gen #1 collection. Taking a very long time, about 6 seconds on my machine.

A gen #1 collection of course doesn't take that long, what you see happening here is the gen #1 collection waiting for the background GC to finish its job. In other words, it is actually the background GC that's taking 6 seconds. Background GC can only be effective if the space in the gen #0 and gen #1 segments is large enough to not require a gen #2 collection while the background GC is trundling. Not the way this app works, it eats memory at a very high rate. The little spike you see is multiple tasks getting unblocked, being able to allocate arrays again. Quickly grinding to a halt when a gen #1 collection has to wait for the background GC again.

Notable is that the allocation pattern of this code is very unfriendly to the GC. It interleaves long-lived arrays (dr.DerivedValues) with short-lived arrays (tempArray). Giving the GC lots of work when it compacts the heap, every single allocated array is going to end up getting moved.

The apparent flaw in the .NET 4.6 GC is that the background collection never seems to effectively compact the heap. It looks like it does the job over and over again, as though the previous collection didn't compact at all. Whether this is by design or a bug is hard to tell, I don't have a clean 4.5 machine anymore. I'm certainly leaning towards bug. You should report this problem at connect.microsoft.com to have Microsoft take a look at it.


A workaround is very easy to come by, all you have to do is prevent the awkward inter-leaving of long- and short-lived objects. Which you do by pre-allocating them:

    for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow { 
        Id = i, Value = r.NextDouble(), 
        DerivedValues = new double[tempArraySize] });

    ...
    Parallel.ForEach(dataFrame, options, dr => {
        var array = dr.DerivedValues;
        for (int j = 0; j < array.Length; j++) array[j] = Math.Pow(dr.Value, j);
        dr.DerivedValuesSum = array.Sum();
    });

And of course by disabling background GC completely.


UPDATE: GC bug confirmed in this blog post. Fix coming soon.


UPDATE: a hotfix was released.


UPDATE: fixed in .NET 4.6.1



回答3:

We (and other users) have encountered a similar problem. We worked around it by disabling background GC in the application's app.config. Please see discussion in comments of https://connect.microsoft.com/VisualStudio/Feedback/Details/1594775.

app.config for gcConcurrent (non-concurrent workstation GC)

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.1" />
    </startup>
<runtime>
    <gcConcurrent enabled="false" />
</runtime>

You can also switch to the server GC, although this approach seems to use more memory (on an unsaturated machine?).

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.1" />
    </startup>
<runtime>
    <gcServer enabled="true" />
</runtime>
</configuration>