Speed up loop using multithreading in C# (Question

2020-02-08 02:23发布

问题:

Imagine I have an function which goes through one million/billion strings and checks smth in them.

f.ex:

foreach (String item in ListOfStrings)
{
    result.add(CalculateSmth(item));
}

it consumes lot's of time, because CalculateSmth is very time consuming function.

I want to ask: how to integrate multithreading in this kinda process?

f.ex: I want to fire-up 5 threads and each of them returns some results, and thats goes-on till the list has items.

Maybe anyone can show some examples or articles..

Forgot to mention I need it in .NET 2.0

回答1:

You could try the Parallel extensions (part of .NET 4.0)

These allow you to write something like:

Parallel.Foreach (ListOfStrings, (item) => 
    result.add(CalculateSmth(item));
);

Of course result.add would need to be thread safe.



回答2:

The Parallel extensions is cool, but this can also be done just by using the threadpool like this:

using System.Collections.Generic;
using System.Threading;

namespace noocyte.Threading
{
    class CalcState
    {
        public CalcState(ManualResetEvent reset, string input) {
            Reset = reset;
            Input = input;
        }
        public ManualResetEvent Reset { get; private set; }
        public string Input { get; set; }
    }

    class CalculateMT
    {
        List<string> result = new List<string>();
        List<ManualResetEvent> events = new List<ManualResetEvent>();

        private void Calc() {
            List<string> aList = new List<string>();
            aList.Add("test");

            foreach (var item in aList)
            {
                CalcState cs = new CalcState(new ManualResetEvent(false), item);
                events.Add(cs.Reset);
                ThreadPool.QueueUserWorkItem(new WaitCallback(Calculate), cs);
            }
            WaitHandle.WaitAll(events.ToArray());
        }

        private void Calculate(object s)
        {
            CalcState cs = s as CalcState;
            cs.Reset.Set();
            result.Add(cs.Input);
        }
    }
}


回答3:

Note that concurrency doesn't magically give you more resource. You need to establish what is slowing CalculateSmth down.

For example, if it's CPU-bound (and you're on a single core) then the same number of CPU ticks will go to the code, whether you execute them sequentially or in parallel. Plus you'd get some overhead from managing the threads. Same argument applies to other constraints (e.g. I/O)

You'll only get performance gains in this if CalculateSmth is leaving resource free during its execution, that could be used by another instance. That's not uncommon. For example, if the task involves IO followed by some CPU stuff, then process 1 could be doing the CPU stuff while process 2 is doing the IO. As mats points out, a chain of producer-consumer units can achieve this, if you have the infrastructure.



回答4:

You need to split up the work you want to do in parallel. Here is an example of how you can split the work in two:

List<string> work = (some list with lots of strings)

// Split the work in two
List<string> odd = new List<string>();
List<string> even = new List<string>();
for (int i = 0; i < work.Count; i++)
{
    if (i % 2 == 0)
    {
        even.Add(work[i]);
    }
    else
    {
        odd.Add(work[i]);
    }
}

// Set up to worker delegates
List<Foo> oddResult = new List<Foo>();
Action oddWork = delegate { foreach (string item in odd) oddResult.Add(CalculateSmth(item)); };

List<Foo> evenResult = new List<Foo>();
Action evenWork = delegate { foreach (string item in even) evenResult.Add(CalculateSmth(item)); };

// Run two delegates asynchronously
IAsyncResult evenHandle = evenWork.BeginInvoke(null, null);
IAsyncResult oddHandle = oddWork.BeginInvoke(null, null);

// Wait for both to finish
evenWork.EndInvoke(evenHandle);
oddWork.EndInvoke(oddHandle);

// Merge the results from the two jobs
List<Foo> allResults = new List<Foo>();
allResults.AddRange(oddResult);
allResults.AddRange(evenResult);

return allResults;


回答5:

The first question you must answer is whether you should be using threading

If your function CalculateSmth() is basically CPU-bound, i.e. heavy in CPU-usage and basically no I/O-usage, then I have a hard time seeing the point of using threads, since the threads will be competing over the same resource, in this case the CPU.

If your CalculateSmth() is using both CPU and I/O, then it might be a point in using threading.

I totally agree with the comment to my answer. I made a erroneous assumption that we were talking about a single CPU with one core, but these days we have multi-core CPUs, my bad.



回答6:

Not that I have any good articles here right now, but what you want to do is something along Producer-Consumer with a Threadpool.

The Producers loops through and creates tasks (which in this case could be to just queue up the items in a List or Stack). The Consumers are, say, five threads that reads one item off the stack, consumes it by calculating it, and then stores it else where.

This way the multithreading is limited to just those five threads, and they will all have work to do up until the stack is empty.

Things to think about:

  • Put protection on the input and output list, such as a mutex.
  • If the order is important, make sure that the output order is maintained. One example could be to store them in a SortedList or something like that.
  • Make sure that the CalculateSmth is thread safe, that it doesn't use any global state.