Will .NET Parallel Tasks exhaust all the threads in the pool?

Posted 2019-06-03 22:04

Question:

Can .NET parallel tasks exhaust all the threads in the pool and cause a deadlock, so that the app hangs and incoming requests can't be processed?

My ASP.NET app hung, so I captured a dump and analyzed it with DebugDiag. The analysis is below:

87.40% of threads blocked (229 threads)

Total Threads: 232 
Running Threads: 232 
Idle Threads: 0 
Max Threads: 400 
Min Threads: 4 

The DebugDiag report shows:

The following threads in w3wp.DMP are waiting in a WaitOne

( 20 21 22 28 30 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 215 216 217 218 219 220 221 222 223 224 225 226 227 229 230 231 232 233 234 235 236 237 238 239 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 257 258 259 260 )

I found that one thread holds the write lock and starts a parallel operation while holding it. Inside that parallel operation, a framework method calls Monitor.ObjWait, so the thread that holds the write lock is itself blocked.

Thread 177:
[[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object) 
mscorlib_ni!System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)+495 
mscorlib_ni!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler)+14a 
System.Linq.Parallel.SpoolingTask.SpoolForAll[[System.__Canon, mscorlib],[System.Int32, mscorlib]](System.Linq.Parallel.QueryTaskGroupState, System.Linq.Parallel.PartitionedStream`2, System.Threading.Tasks.TaskScheduler)+ec 
System.Linq.Parallel.MergeExecutor`1[[System.__Canon, mscorlib]].Execute[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2, Boolean, System.Linq.ParallelMergeOptions, System.Threading.Tasks.TaskScheduler, Boolean, System.Linq.Parallel.CancellationState, Int32)+27b 
System.Linq.Parallel.PartitionedStreamMerger`1[[System.__Canon, mscorlib]].Receive[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2)+86 
System.Linq.Parallel.ForAllOperator`1[[System.__Canon, mscorlib]].WrapPartitionedStream[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2, System.Linq.Parallel.IPartitionedStreamRecipient`1, Boolean, System.Linq.Parallel.QuerySettings)+21f 
[[StubHelperFrame]] 
System.Linq.Parallel.UnaryQueryOperator`2+UnaryQueryOperatorResults+ChildResultsRecipient[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Receive[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2)+130 
System_Core_ni!System.Linq.Parallel.UnaryQueryOperator`2+UnaryQueryOperatorResults[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].GivePartitionedStream(System.Linq.Parallel.IPartitionedStreamRecipient`1)+34f 
System_Core_ni!System.Linq.Parallel.QueryOperator`1[[System.__Canon, mscorlib]].GetOpenedEnumerator(System.Nullable`1, Boolean, Boolean, System.Linq.Parallel.QuerySettings)+2d4 
System_Core_ni!System.Linq.Parallel.ForAllOperator`1[[System.__Canon, mscorlib]].RunSynchronously()+319 
Info.UpdateCache(System.Collections.Generic.List`1, System.Collections.Generic.List`1, MySetting)+e2 
Info.GetInfo(System.Collections.Generic.List`1, MySetting)+4f

Many other threads are trying to acquire a read lock, but because the write lock has not been released, they are blocked. (The sample stack below is from a thread waiting in TryEnterWriteLock.)

[[HelperMethodFrame_1OBJ] (System.Threading.WaitHandle.WaitOneNative)] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean) 
mscorlib_ni!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)+14 
System_Core_ni!System.Threading.ReaderWriterLockSlim.WaitOnEvent(System.Threading.EventWaitHandle, UInt32 ByRef, Int32)+a8 
System_Core_ni!System.Threading.ReaderWriterLockSlim.TryEnterWriteLockCore(Int32)+612861 
System_Core_ni!System.Threading.ReaderWriterLockSlim.TryEnterWriteLock(Int32)+28 
Info.UpdateCache(System.Collections.Generic.List`1, System.Collections.Generic.List`1, )+5f 
Info.GetInfo(System.Collections.Generic.List`1, MySetting)+4f
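
The waiting threads above block inside ReaderWriterLockSlim.TryEnterWriteLock(Int32). As a diagnostic aid, a bounded wait would turn such a pile-up into visible failures instead of parked threads. This is a minimal sketch, assuming a timeout is acceptable to the caller; BoundedLock, TryWrite, and the timeout parameter are hypothetical names and are not part of the original code.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the original Info class.
public static class BoundedLock
{
    // Runs 'update' under the write lock, or gives up after 'timeoutMs' milliseconds.
    // Returns true if the update ran, false if the lock could not be acquired in time.
    public static bool TryWrite(ReaderWriterLockSlim rwLock, int timeoutMs, Action update)
    {
        if (!rwLock.TryEnterWriteLock(timeoutMs))
        {
            // The caller can log this, fail the request, or fall back to stale data.
            return false;
        }
        try
        {
            update();
            return true;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}
```

A false return then shows up in logs long before hundreds of threads are parked on the same lock.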

I checked the code. GetInfo is triggered by each request: the first request fetches data from a SOA service and updates the local cache, and subsequent requests just read from the local cache.

MyStaticInfo StaticInfo = Instance.GetInfo(new List<int>
        {
            1,2,3,4,5.......
        }, new MySetting
        {
            getInfo=true,
            extrainfo = true
        });

public MyStaticInfo GetInfo(List<int> IDList, MySetting setting)    
{
        .....
    MyStaticInfo requestSoaEntity = this.CreateSoaRequest(IDList, setting);
    MyStaticInfo soaData = this.GetSoaData(requestSoaEntity); //no lock in the method.
    if (soaData != null)
    {
        this.UpdateCache(soaData, IDList, setting);
    }
    ......
}


private MyStaticInfo CreateSoaRequest(List<int> IDList, MySetting setting)
{
    this.cacheLock.EnterReadLock();
    MyStaticInfo result;
    try
    {
        IDList.AsParallel<int>().ForAll(delegate(int ID)
        {
            ......
            result=....
         });
    }
    finally
    {
        this.cacheLock.ExitReadLock();
    }
    return result;
}

private void UpdateCache(MyStaticInfo responseSoa, List<int> IDList, MySetting setting)
{
    this.cacheLock.EnterWriteLock();
    try
    {
        IDList.AsParallel<int>().ForAll(delegate(int ID)
        {
          ......
        });
        if (responseSoa != null)
        {
            responseSoa.AsParallel().ForAll( soa=>
            {
                ........
            });
        }
    }
    finally
    {
        this.cacheLock.ExitWriteLock();
    }
}

I used WinDbg to look for a deadlock, but it does not seem to find one.

0:253> !syncblk
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
-----------------------------
Total           278
CCW             12
RCW             2
ComClassFactory 0
Free            209

0:253> !threads
ThreadCount:      244
UnstartedThread:  0
BackgroundThread: 244
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no


  30   11  1970 0000000004cf00b0   3809220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)
  32   12  2a18 0000000004cf07c0   3809220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)
  33   13  255c 0000000004cf0ed0   3809220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)
  34   14  12fc 0000000004cf15e0   3009220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)
  35   15  283c 0000000004cf1cf0   3009220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)
  36   16  2e94 0000000004cf2400   3809220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)
  37   17  1c6c 0000000004cf2b10   3809220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)
  38   18  2d5c 0000000004cf3220   3009220 Enabled  0000000000000000:0000000000000000 00000000025c3c90     1 MTA (Threadpool Worker)


A lot of threads are in state 3009220. Is that state OK?

0:253> !ThreadState 3009220 
    Legal to Join
    Background
    CLR Owns
    In Multi Threaded Apartment
    Thread Pool Worker Thread
    Interruptible

I looked up the source code of Task, but what I found is the 4.5 version, while my code runs under .NET 4.0. In that source, 'InternalRunSynchronously' calls 'SpinThenBlockingWait', yet 'SpinThenBlockingWait' does not appear in the stack trace from the dump. Is this method inlined at run time?

The code has been running for more than a year, but a few days ago the app hung. I believe the code in the update method is correct. I know that the parallel ForAll blocks the caller until all iterations complete. Is it possible that the thread pool was exhausted, so the parallel operation could not get the threads it needed to execute, and the wait inside the Task framework call therefore kept the lock holder blocked?
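
One way to test the hypothesis above would be to keep the PLINQ work outside the write lock, so that a stalled parallel query can no longer hold up every thread waiting on the lock. The following is a minimal sketch under stated assumptions, not the actual Info class: InfoCacheSketch, its Dictionary<int, string> cache, and the "info-" values are hypothetical stand-ins for the elided cache code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the cache in the question; all names are illustrative.
public class InfoCacheSketch
{
    private readonly ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();

    public void UpdateCache(List<int> idList)
    {
        // 1. Do the parallel work with no lock held. ToList() still blocks until
        //    the query finishes, but a stall here cannot block readers of the cache.
        List<KeyValuePair<int, string>> computed = idList
            .AsParallel()
            .Select(id => new KeyValuePair<int, string>(id, "info-" + id))
            .ToList();

        // 2. Hold the write lock only for the short, single-threaded copy.
        cacheLock.EnterWriteLock();
        try
        {
            foreach (KeyValuePair<int, string> entry in computed)
            {
                cache[entry.Key] = entry.Value;
            }
        }
        finally
        {
            cacheLock.ExitWriteLock();
        }
    }

    public bool TryGet(int id, out string value)
    {
        cacheLock.EnterReadLock();
        try
        {
            return cache.TryGetValue(id, out value);
        }
        finally
        {
            cacheLock.ExitReadLock();
        }
    }
}
```

With this shape the write lock is held only for a single-threaded copy, which needs no extra pool threads. The parallel query can still be slow when the pool is saturated, but it is slow without blocking readers.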

Update 1:

I dumped the thread pool info and found that requests are queued.

    0:024> !threadpool
    CPU utilization: 81%
    Worker Thread: Total: 232 Running: 232 Idle: 0 MaxLimit: 400 MinLimit: 4
    Work Request in Queue: 480
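
Similar counters can also be sampled from inside the application, for example to log them before a hang develops. This is a minimal sketch using System.Threading.ThreadPool; the PoolStats class and its method names are hypothetical, and the "busy" figure (maximum minus available) is the pool's own accounting, which may not match the Running column of !threadpool exactly.

```csharp
using System;
using System.Threading;

// Hypothetical helper for logging thread pool pressure from inside the app.
public static class PoolStats
{
    // Worker threads currently in use: configured maximum minus available.
    public static int BusyWorkers()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // One-line summary suitable for a log entry.
    public static string Describe()
    {
        int maxWorkers, maxIo, minWorkers, minIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetMinThreads(out minWorkers, out minIo);
        return string.Format(
            "Worker threads: busy={0} max={1} min={2}",
            BusyWorkers(), maxWorkers, minWorkers);
    }
}
```

Logging PoolStats.Describe() periodically would show whether the busy count climbs steadily toward the maximum in the run-up to a hang.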

0:164> !mlocks
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...

ClrThread  DbgThread  OsThread    LockType    Lock              LockLevel
------------------------------------------------------------------------------
.....
0x67       116        0x1e8       thinlock    000000014036a2b0  (recursion:0)
0xab       182        0x268       thinlock    00000001c0724188  (recursion:0)
0xa4       177        0x14cc      RWLockSlim  000000013ff0a358  Writer        
0xa4       177        0x14cc      thinlock    0000000140780278  (recursion:0)
.......

0:024> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
Scanning for threads waiting on SyncBlocks...
Scanning for threads waiting on ReaderWriterLock locks...
Scanning for threads waiting on ReaderWriterLocksSlim locks...
Scanning for threads waiting on CriticalSections...
No deadlocks detected.

No deadlock found.