I have a subroutine that processes large blocks of information. To make use of the entire CPU, it divides the work among separate threads and returns once all of them have completed. I read that creating and destroying threads incurs a lot of overhead, so I tried using the thread pool, but that actually runs slower than creating my own threads. How can I create my own threads when the program starts and then keep reusing them? I've seen some people say it can't be done, but the thread pool does it, so it must be possible, right?
Here is the part of the code that either launches new threads or uses the thread pool:
    //initialization for threads
    Thread[] AltThread = null;
    if (NumThreads > 1)
        AltThread = new Thread[pub.NumThreads - 1];
    do
    {
        if (NumThreads > 1)
        {
            //split the matrix up into NumThreads number of even-sized blocks and execute on separate threads
            int ThreadWidth = DataWidth / NumThreads;
            if (UseThreadPool) //use threadpool threads
            {
                for (int i = 0; i < NumThreads - 1; i++)
                {
                    ThreadPool.QueueUserWorkItem(ComputePartialDataOnThread,
                        new object[] { AltEngine[i], ThreadWidth * (i + 1), ThreadWidth * (i + 2) });
                }
                //get number of threads available after queue
                System.Threading.Thread.Sleep(0);
                int StartThreads, empty, EndThreads;
                ThreadPool.GetAvailableThreads(out StartThreads, out empty);
                ComputePartialData(ThisEngine, 0, ThreadWidth);
                //wait for all threads to finish
                do
                {
                    ThreadPool.GetAvailableThreads(out EndThreads, out empty);
                    System.Threading.Thread.Sleep(1);
                } while (StartThreads - EndThreads > 0);
            }
            else //create new threads each time (can we reuse these?)
            {
                for (int i = 0; i < NumThreads - 1; i++)
                {
                    AltThread[i] = new Thread(ComputePartialDataOnThread);
                    AltThread[i].Start(new object[] { AltEngine[i], ThreadWidth * (i + 1), ThreadWidth * (i + 2) });
                }
                ComputePartialData(ThisEngine, 0, ThreadWidth);
                //wait for all threads to finish
                foreach (Thread t in AltThread)
                    t.Join(1000);
                foreach (Thread t in AltThread)
                    if (t.IsAlive) t.Abort();
            }
        }
    }
ComputePartialDataOnThread simply unpacks the arguments and calls ComputePartialData. The data being processed is shared among the threads, but they never read or write the same locations. AltEngine[] holds a separate computation engine for each thread.
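To make the wrapper's shape concrete, here is a minimal sketch of the unpack-and-forward step described above. The `Engine` class, its `Process` method, the `Data` array and the `PartialDataDemo` wrapper are hypothetical stand-ins invented for this example; only the idea of casting the `object[]` back to its three parts and calling `ComputePartialData` comes from the description.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the real per-thread computation engine.
class Engine
{
    public void Process(double[] data, int start, int end)
    {
        // Placeholder work: double every element in [start, end).
        for (int i = start; i < end; i++)
            data[i] = data[i] * 2.0;
    }
}

static class PartialDataDemo
{
    // Shared data (assumed layout); each thread touches only its own slice.
    public static double[] Data = new double[8];

    public static void ComputePartialData(Engine engine, int start, int end)
    {
        engine.Process(Data, start, end);
    }

    // Takes a single object, so it fits both WaitCallback (thread pool)
    // and ParameterizedThreadStart (manually created threads).
    public static void ComputePartialDataOnThread(object state)
    {
        object[] args = (object[])state;
        ComputePartialData((Engine)args[0], (int)args[1], (int)args[2]);
    }

    // Splits Data in two: second half on a new thread, first half on the caller.
    public static void Run()
    {
        for (int i = 0; i < Data.Length; i++)
            Data[i] = i;

        Thread t = new Thread(ComputePartialDataOnThread);
        t.Start(new object[] { new Engine(), 4, 8 });
        ComputePartialData(new Engine(), 0, 4);
        t.Join(); // wait without a timeout so no slice is left unfinished
    }
}
```

Calling `PartialDataDemo.Run()` fills `Data` with 0 through 7 and then doubles every element, with each half handled by a different thread and a different engine.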
The operation runs about 10-20% slower using the thread pool.
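For illustration, the kind of reuse being asked about could look roughly like the sketch below: a thread that is created once, sleeps on an event until work is posted, and signals a second event when each pass is done. `ReusableWorker` and `ReuseDemo` are hypothetical names made up for this example, and the sketch makes no claim about whether this is faster than the thread pool for this workload.

```csharp
using System;
using System.Threading;

// Hypothetical helper: one long-lived thread that runs whatever work it is handed.
sealed class ReusableWorker : IDisposable
{
    private readonly AutoResetEvent _start = new AutoResetEvent(false);
    private readonly AutoResetEvent _done = new AutoResetEvent(false);
    private readonly Thread _thread;
    private Action _work;
    private volatile bool _quit;

    public ReusableWorker()
    {
        _thread = new Thread(Loop) { IsBackground = true };
        _thread.Start();
    }

    private void Loop()
    {
        while (true)
        {
            _start.WaitOne();   // block until work is posted or Dispose is called
            if (_quit) return;
            _work();
            _done.Set();        // tell the caller this pass has finished
        }
    }

    // Hand one unit of work to the thread; call Wait() before posting the next.
    public void Begin(Action work)
    {
        _work = work;
        _start.Set();
    }

    public void Wait()
    {
        _done.WaitOne();
    }

    public void Dispose()
    {
        _quit = true;
        _start.Set();           // wake the thread so it can exit its loop
        _thread.Join();
        _start.Dispose();
        _done.Dispose();
    }
}

static class ReuseDemo
{
    // Runs several passes; the same worker thread is reused for every pass.
    public static int[] Run(int passes)
    {
        int[] counts = new int[2];
        using (var worker = new ReusableWorker())
        {
            for (int p = 0; p < passes; p++)
            {
                worker.Begin(() => counts[1]++); // one block on the reused thread
                counts[0]++;                     // the other block on the calling thread
                worker.Wait();
            }
        }
        return counts;
    }
}
```

In the code above, one such worker per extra core would replace the `new Thread(...)` call inside the loop, with `Begin` taking the place of `Start` and `Wait` taking the place of `Join`.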