Can a .NET Parallel task exhaust all thread pool threads and cause a deadlock, so that the app hangs and incoming requests can't be processed?
My ASP.NET app hung, so I captured a dump and analyzed it with DebugDiag. The analysis is below:
87.40% of threads blocked (229 threads)
Total Threads: 232
Running Threads: 232
Idle Threads: 0
Max Threads: 400
Min Threads: 4
The DebugDiag report shows:
The following threads in w3wp.DMP are waiting in a WaitOne
( 20 21 22 28 30 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 215 216 217 218 219 220 221 222 223 224 225 226 227 229 230 231 232 233 234 235 236 237 238 239 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 257 258 259 260 )
I found that one thread holds the write lock and starts a parallel operation while holding it. Inside the parallel task, a system method calls "Monitor.ObjWait", so the thread that holds the write lock is itself blocked.
Thread 177:
[[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
mscorlib_ni!System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)+495
mscorlib_ni!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler)+14a
System.Linq.Parallel.SpoolingTask.SpoolForAll[[System.__Canon, mscorlib],[System.Int32, mscorlib]](System.Linq.Parallel.QueryTaskGroupState, System.Linq.Parallel.PartitionedStream`2, System.Threading.Tasks.TaskScheduler)+ec
System.Linq.Parallel.MergeExecutor`1[[System.__Canon, mscorlib]].Execute[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2, Boolean, System.Linq.ParallelMergeOptions, System.Threading.Tasks.TaskScheduler, Boolean, System.Linq.Parallel.CancellationState, Int32)+27b
System.Linq.Parallel.PartitionedStreamMerger`1[[System.__Canon, mscorlib]].Receive[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2)+86
System.Linq.Parallel.ForAllOperator`1[[System.__Canon, mscorlib]].WrapPartitionedStream[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2, System.Linq.Parallel.IPartitionedStreamRecipient`1, Boolean, System.Linq.Parallel.QuerySettings)+21f
[[StubHelperFrame]]
System.Linq.Parallel.UnaryQueryOperator`2+UnaryQueryOperatorResults+ChildResultsRecipient[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Receive[[System.Int32, mscorlib]](System.Linq.Parallel.PartitionedStream`2)+130
System_Core_ni!System.Linq.Parallel.UnaryQueryOperator`2+UnaryQueryOperatorResults[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].GivePartitionedStream(System.Linq.Parallel.IPartitionedStreamRecipient`1)+34f
System_Core_ni!System.Linq.Parallel.QueryOperator`1[[System.__Canon, mscorlib]].GetOpenedEnumerator(System.Nullable`1, Boolean, Boolean, System.Linq.Parallel.QuerySettings)+2d4
System_Core_ni!System.Linq.Parallel.ForAllOperator`1[[System.__Canon, mscorlib]].RunSynchronously()+319
Info.UpdateCache(System.Collections.Generic.List`1, System.Collections.Generic.List`1, MySetting)+e2
Info.GetInfo(System.Collections.Generic.List`1, MySetting)+4f
Many other threads are waiting to acquire the same lock, but the write lock is never released, so they are all blocked. One of them, waiting in TryEnterWriteLock inside UpdateCache:
[[HelperMethodFrame_1OBJ] (System.Threading.WaitHandle.WaitOneNative)] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
mscorlib_ni!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)+14
System_Core_ni!System.Threading.ReaderWriterLockSlim.WaitOnEvent(System.Threading.EventWaitHandle, UInt32 ByRef, Int32)+a8
System_Core_ni!System.Threading.ReaderWriterLockSlim.TryEnterWriteLockCore(Int32)+612861
System_Core_ni!System.Threading.ReaderWriterLockSlim.TryEnterWriteLock(Int32)+28
Info.UpdateCache(System.Collections.Generic.List`1, System.Collections.Generic.List`1, )+5f
Info.GetInfo(System.Collections.Generic.List`1, MySetting)+4f
I checked the code. GetInfo is triggered by a request: the first request fetches data from a SOA service and updates the local cache, and later requests just read the data from the local cache.
MyStaticInfo StaticInfo = Instance.GetInfo(new List<int>
{
    1,2,3,4,5.......
}, new MySetting
{
    getInfo = true,
    extrainfo = true
});
public MyStaticInfo GetInfo(List<int> IDList, MySetting setting)
{
    .....
    MyStaticInfo requestSoaEntity = this.CreateSoaRequest(IDList, setting);
    MyStaticInfo soaData = this.GetSoaData(requestSoaEntity); // no lock in this method
    if (soaData != null)
    {
        this.UpdateCache(soaData, IDList, setting);
    }
    ......
}
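private MyStaticInfo CreateSoaRequest(List<int> IDList, MySetting setting)
{
    this.cacheLock.EnterReadLock();
    // Initialized so the variable is definitely assigned before the lambda captures it.
    MyStaticInfo result = null;
    try
    {
        // PLINQ query runs while the read lock is held.
        IDList.AsParallel<int>().ForAll(delegate(int ID)
        {
            ......
            result=....
        });
    }
    finally
    {
        this.cacheLock.ExitReadLock();
    }
    return result;
}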
private void UpdateCache(MyStaticInfo responseSoa, List<int> IDList, MySetting setting)
{
    this.cacheLock.EnterWriteLock();
    try
    {
        // Both PLINQ queries below run while the write lock is held;
        // ForAll does not return until every iteration has completed.
        IDList.AsParallel<int>().ForAll(delegate(int ID)
        {
            ......
        });
        if (responseSoa != null)
        {
            responseSoa.AsParallel().ForAll( soa=>
            {
                ........
            });
        }
    }
    finally
    {
        this.cacheLock.ExitWriteLock();
    }
}
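To confirm which callers are stuck on cacheLock, one option would be to acquire the lock with a timeout and report the failure instead of blocking a pool thread indefinitely. This is only a minimal sketch: LockProbe and TryRead are hypothetical names that are not part of the code above, and the only framework calls assumed are ReaderWriterLockSlim.TryEnterReadLock(int) and ExitReadLock().

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic helper, not part of the original code.
public static class LockProbe
{
    // Runs 'read' under the read lock if the lock is acquired within timeoutMs.
    // Returns false, without running 'read', when the wait times out.
    public static bool TryRead(ReaderWriterLockSlim rwLock, int timeoutMs, Action read)
    {
        if (!rwLock.TryEnterReadLock(timeoutMs))
        {
            // The caller can log this and fail fast instead of waiting forever.
            return false;
        }
        try
        {
            read();
            return true;
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }
}
```

A caller such as CreateSoaRequest could wrap its read section in LockProbe.TryRead(this.cacheLock, someTimeout, ...) and log when it returns false; the timeout value is an assumption to tune, not something taken from the dump.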
I used WinDbg to look for a deadlock, but it does not seem to find one.
0:253> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
-----------------------------
Total 278
CCW 12
RCW 2
ComClassFactory 0
Free 209
0:253> !threads
ThreadCount: 244
UnstartedThread: 0
BackgroundThread: 244
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
30 11 1970 0000000004cf00b0 3809220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
32 12 2a18 0000000004cf07c0 3809220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
33 13 255c 0000000004cf0ed0 3809220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
34 14 12fc 0000000004cf15e0 3009220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
35 15 283c 0000000004cf1cf0 3009220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
36 16 2e94 0000000004cf2400 3809220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
37 17 1c6c 0000000004cf2b10 3809220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
38 18 2d5c 0000000004cf3220 3009220 Enabled 0000000000000000:0000000000000000 00000000025c3c90 1 MTA (Threadpool Worker)
A lot of threads are in state 3009220. Is that state OK?
0:253> !ThreadState 3009220
Legal to Join
Background
CLR Owns
In Multi Threaded Apartment
Thread Pool Worker Thread
Interruptible
I searched for the source code of Task and found the 4.5 version, but my code runs on .NET 4.0. In that source, 'InternalRunSynchronously' calls 'SpinThenBlockingWait', yet 'SpinThenBlockingWait' does not appear in the stack trace from the dump. Was this method inlined at run time?
The code has been running for more than a year, and the app hung for the first time only a few days ago. I think the code in the update method is OK. I know that the parallel ForAll blocks until all iterations complete. Is it possible that the thread pool was exhausted, so the parallel operation could not get the threads it needs to execute, and the wait inside the Task system call is what keeps the lock-holding thread blocked?
Update 1:
I dumped the thread pool info and found that work requests are queued.
0:024> !threadpool
CPU utilization: 81%
Worker Thread: Total: 232 Running: 232 Idle: 0 MaxLimit: 400 MinLimit: 4
Work Request in Queue: 480
0:164> !mlocks
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
ClrThread DbgThread OsThread LockType Lock LockLevel
------------------------------------------------------------------------------
.....
0x67 116 0x1e8 thinlock 000000014036a2b0 (recursion:0)
0xab 182 0x268 thinlock 00000001c0724188 (recursion:0)
0xa4 177 0x14cc RWLockSlim 000000013ff0a358 Writer
0xa4 177 0x14cc thinlock 0000000140780278 (recursion:0)
.......
0:024> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
Scanning for threads waiting on SyncBlocks...
Scanning for threads waiting on ReaderWriterLock locks...
Scanning for threads waiting on ReaderWriterLocksSlim locks...
Scanning for threads waiting on CriticalSections...
No deadlocks detected.
No deadlock found.
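The worker-thread figures reported by !threadpool above could also be sampled from inside the running app, for example logged periodically before the next hang. A minimal sketch, assuming a hypothetical helper class named PoolStats (not part of the original code); it relies only on ThreadPool.GetMaxThreads, ThreadPool.GetAvailableThreads and ThreadPool.GetMinThreads, and it does not report the length of the work request queue.

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic helper, not part of the original code.
public static class PoolStats
{
    // Number of worker threads currently in use: max minus available.
    public static int BusyWorkers()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // One-line summary of worker-thread usage, suitable for a log entry.
    public static string Describe()
    {
        int maxWorkers, maxIo, minWorkers, minIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetMinThreads(out minWorkers, out minIo);
        return string.Format("workers busy={0} max={1} min={2}",
            BusyWorkers(), maxWorkers, minWorkers);
    }
}
```

If the busy count climbs toward the maximum while requests stall, that would be consistent with the pool-exhaustion theory, although it would not prove it on its own.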