I finally got VS2012 and put together a simple demo to check out the potential performance boost of async and await, but to my dismay it is slower! It's possible I'm doing something wrong, but maybe you can help me out. (I also added a simple threaded solution, and that runs faster, as expected.)
My code uses a class to sum an array, split across the number of cores on your system minus one. Mine has 4 cores, so I saw about a 2x speed-up (2.5 threads) for threading, but a 2x slowdown for the same thing done with async/await.
Code (note: you will need to add a reference to System.Management to get the core detector working):
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Management;
using System.Diagnostics;

namespace AsyncSum
{
    class Program
    {
        static string Results = "";

        static void Main(string[] args)
        {
            Task t = Run();
            t.Wait();
            Console.WriteLine(Results);
            Console.ReadKey();
        }

        static async Task Run()
        {
            Random random = new Random();
            int[] huge = new int[1000000];
            for (int i = 0; i < huge.Length; i++)
            {
                huge[i] = random.Next(2);
            }
            ArraySum summer = new ArraySum(huge);
            Stopwatch sw = new Stopwatch();

            // Each variant gets one untimed warm-up call, then 100 timed calls.
            long tSum = summer.Sum();
            sw.Restart();
            for (int i = 0; i < 100; i++)
            {
                tSum = summer.Sum();
            }
            long tticks = sw.ElapsedTicks / 100;

            long aSum = await summer.SumAsync();
            sw.Restart();
            for (int i = 0; i < 100; i++)
            {
                aSum = await summer.SumAsync();
            }
            long aticks = sw.ElapsedTicks / 100;

            long dSum = summer.SumThreaded();
            sw.Restart();
            for (int i = 0; i < 100; i++)
            {
                dSum = summer.SumThreaded();
            }
            long dticks = sw.ElapsedTicks / 100;

            long pSum = summer.SumParallel();
            sw.Restart();
            for (int i = 0; i < 100; i++)
            {
                pSum = summer.SumParallel();
            }
            long pticks = sw.ElapsedTicks / 100;

            Program.Results += String.Format("Regular Sum: {0} in {1} ticks\n", tSum, tticks);
            Program.Results += String.Format("Async Sum: {0} in {1} ticks\n", aSum, aticks);
            Program.Results += String.Format("Threaded Sum: {0} in {1} ticks\n", dSum, dticks);
            Program.Results += String.Format("Parallel Sum: {0} in {1} ticks\n", pSum, pticks);
        }
    }

    class ArraySum
    {
        int[] Data;
        int ChunkSize = 1000;
        int cores = 1;

        public ArraySum(int[] data)
        {
            Data = data;
            // Count physical cores via WMI (needs System.Management), then leave one free.
            cores = 0;
            foreach (var item in new System.Management.ManagementObjectSearcher("Select * from Win32_Processor").Get())
            {
                cores += int.Parse(item["NumberOfCores"].ToString());
            }
            cores--;
            if (cores < 1) cores = 1;
            ChunkSize = Data.Length / cores + 1;
        }

        public long Sum()
        {
            long sum = 0;
            for (int i = 0; i < Data.Length; i++)
            {
                sum += Data[i];
            }
            return sum;
        }

        public async Task<long> SumAsync()
        {
            // One Task.Run per chunk; the partial sums are awaited one after another.
            Task<long>[] psums = new Task<long>[cores];
            for (int i = 0; i < psums.Length; i++)
            {
                int start = i * ChunkSize;
                int end = start + ChunkSize;
                psums[i] = Task.Run<long>(() =>
                {
                    long asum = 0;
                    for (int a = start; a < end && a < Data.Length; a++)
                    {
                        asum += Data[a];
                    }
                    return asum;
                });
            }
            long sum = 0;
            for (int i = 0; i < psums.Length; i++)
            {
                sum += await psums[i];
            }
            return sum;
        }

        public long SumThreaded()
        {
            // One dedicated Thread per chunk; each writes only its own bucket.
            long sum = 0;
            Thread[] threads = new Thread[cores];
            long[] buckets = new long[cores];
            for (int i = 0; i < cores; i++)
            {
                int start = i * ChunkSize;
                int end = start + ChunkSize;
                int bucket = i;
                threads[i] = new Thread(new ThreadStart(() =>
                {
                    long asum = 0;
                    for (int a = start; a < end && a < Data.Length; a++)
                    {
                        asum += Data[a];
                    }
                    buckets[bucket] = asum;
                }));
                threads[i].Start();
            }
            for (int i = 0; i < cores; i++)
            {
                threads[i].Join();
                sum += buckets[i];
            }
            return sum;
        }

        public long SumParallel()
        {
            // Parallel.For over chunk indices, using the same bucket scheme as SumThreaded.
            long sum = 0;
            long[] buckets = new long[cores];
            ParallelLoopResult lr = Parallel.For(0, cores, new Action<int>((i) =>
            {
                int start = i * ChunkSize;
                int end = start + ChunkSize;
                int bucket = i;
                long asum = 0;
                for (int a = start; a < end && a < Data.Length; a++)
                {
                    asum += Data[a];
                }
                buckets[bucket] = asum;
            }));
            for (int i = 0; i < cores; i++)
            {
                sum += buckets[i];
            }
            return sum;
        }
    }
}
```
Any thoughts? Am I doing async/await wrong? I'll be happy to try any suggestions.
It's important to separate "asynchrony" from "parallelization".
`await` is there to help make writing asynchronous code easier. Code that runs in parallel may (or may not) involve asynchrony, and code that is asynchronous may or may not run in parallel.

Nothing about `await` is designed to make parallel code faster. The purpose of `await` is to make writing asynchronous code easier, while minimizing the negative performance implications. Using `await` won't ever be faster than correctly written non-`await` asynchronous code (although, because writing correct code with `await` is easier, it will sometimes be faster in practice: the programmer may not be capable of writing that asynchronous code correctly without `await`, or may not be willing to put in the time to do so). If the non-async code is written well, it will perform about as well as the `await` code, if not a tad better.

C# does have support specifically for parallelization; it's just not through `await`. The Task Parallel Library (TPL) as well as Parallel LINQ (PLINQ) have several very effective means of parallelizing code that are generally more efficient than naive threaded implementations.

In your case, an effective implementation using PLINQ might be something like this:
Note that this will take care of efficiently partitioning the input sequence into chunks that will be run in parallel; it will determine the appropriate chunk size and the number of concurrent workers, and it will aggregate the results of those workers in a manner that is both properly synchronized to ensure a correct result (unlike your threaded example) and efficient (meaning that it won't completely serialize all aggregation).
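To make the "asynchronous is not the same as parallel" point concrete: marking a method `async` does not move any work to another thread. Execution stays on the caller's thread until it awaits something that has not completed yet; in the question's `SumAsync`, all of the parallelism comes from `Task.Run`. A small sketch (the class and method names are hypothetical):

```csharp
using System;
using System.Threading.Tasks;

static class AsyncIsNotParallel
{
    // The loop runs on the calling thread. The only await is on an
    // already-completed task, which continues synchronously, so the
    // returned Task is finished by the time the caller gets it.
    public static async Task<(long Sum, int ThreadId)> SumWithAsyncKeyword(int[] data)
    {
        int threadId = Environment.CurrentManagedThreadId;
        long sum = 0;
        foreach (int x in data) sum += x;
        await Task.CompletedTask;
        return (sum, threadId);
    }
}
```

Calling `SumWithAsyncKeyword` from any thread returns a completed task whose `ThreadId` matches the caller's; nothing ran concurrently.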
`async` isn't intended for heavy-duty parallel computation. You can do basic parallel work using `Task.Run` with `Task.WhenAll`, but any serious parallel work should be done using the Task Parallel Library (e.g., `Parallel`). Asynchronous code on the client side is about responsiveness, not parallel processing.

A common approach is to use `Parallel` for the parallel work, then wrap it in a `Task.Run` and `await` that to keep the UI responsive.

On a quick look, the results are expected: your async sum is using just one thread, while you asynchronously wait for it to finish, so it's slower than the multi-threaded sum.

You'd use async when you have other work to get done while the sum is running. So this isn't the right test for any speed or responsiveness improvements.
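A sketch of the two patterns described above, under the assumption that a `long` total of an `int[]` is wanted (class and method names are hypothetical): `Task.Run` plus `Task.WhenAll` for basic parallel work, and `Parallel` wrapped in `Task.Run` so the caller can `await` it. The `Parallel` version uses a range partitioner with per-worker subtotals, which is one common TPL idiom rather than the only option.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class AsyncParallelSum
{
    // Basic parallel work: one Task.Run per chunk, awaited together via Task.WhenAll.
    public static async Task<long> SumWhenAllAsync(int[] data, int workers)
    {
        int chunk = (data.Length + workers - 1) / workers; // ceiling division
        Task<long>[] parts = Enumerable.Range(0, workers).Select(w => Task.Run(() =>
        {
            long s = 0;
            int end = Math.Min(data.Length, (w + 1) * chunk);
            for (int i = w * chunk; i < end; i++) s += data[i];
            return s;
        })).ToArray();
        long[] sums = await Task.WhenAll(parts);
        return sums.Sum();
    }

    // Heavier work: Parallel does the CPU-bound part; Task.Run makes it awaitable
    // so a UI thread is not blocked while it runs.
    public static Task<long> SumParallelAsync(int[] data)
    {
        return Task.Run(() =>
        {
            long total = 0;
            Parallel.ForEach(
                Partitioner.Create(0, data.Length), // index ranges, not single elements
                () => 0L,                           // per-worker subtotal
                (range, state, local) =>
                {
                    for (int i = range.Item1; i < range.Item2; i++) local += data[i];
                    return local;
                },
                local => Interlocked.Add(ref total, local)); // merge each subtotal once
            return total;
        });
    }
}
```

To actually benefit from asynchrony, start the work, do something else, and only then await: `Task<long> pending = AsyncParallelSum.SumParallelAsync(huge);` followed by the other work and finally `long total = await pending;`.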
Your benchmark has a couple of flaws:

- The first run is included in the timing, so it pays one-time initialization costs (loading class `Task`, JIT-compilation etc.).
- You're using `DateTime.Now`, which is too inaccurate for timings in the millisecond range. You'll need to use `Stopwatch`.

With these two issues fixed, I get the following benchmark results:
Async now comes out as the fastest solution, taking less than 2ms.
This is the next problem: timing something as fast as 2ms is extremely unreliable, because your thread can get paused for longer than that if some other process is using the CPU in the background. You should average the results over several thousand benchmark runs.
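A minimal benchmarking helper reflecting those points, offered as a sketch (the `Bench.AverageMs` name and its parameters are hypothetical): untimed warm-up calls absorb JIT and class-loading costs, `Stopwatch` replaces `DateTime.Now`, and the mean is taken over many runs. It reads `Stopwatch.Elapsed` because `Stopwatch.ElapsedTicks` counts in units of `Stopwatch.Frequency`, not `TimeSpan` ticks.

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Runs `action` `warmups` times untimed, then returns the mean elapsed
    // milliseconds over `runs` timed calls.
    public static double AverageMs(Action action, int warmups, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));
        for (int i = 0; i < warmups; i++) action();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

For the question's code this might be used as `Bench.AverageMs(() => summer.SumAsync().Wait(), 10, 5000)`, which blocks on each call; that is acceptable in a console app but not on a UI thread.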
Also, what's going on with your core-count detection? On my quad-core it produces a chunk size of 333334, which allows only 3 threads to run.
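The 333334 comes straight from the constructor: 4 cores minus one gives 3 workers, and `1000000 / 3 + 1` is 333334. A sketch of an alternative, assuming every processor should get a chunk (the `Chunking` names are hypothetical). Note that `Environment.ProcessorCount` reports logical processors, which can differ from WMI's `NumberOfCores` on machines with hyper-threading.

```csharp
using System;

static class Chunking
{
    // Ceiling division: `workers` chunks of this size always cover `length` items.
    public static int ChunkSize(int length, int workers)
    {
        if (workers < 1) throw new ArgumentOutOfRangeException(nameof(workers));
        return (length + workers - 1) / workers;
    }

    // Logical processor count, no System.Management/WMI query and no "- 1".
    public static int DefaultWorkers() => Math.Max(1, Environment.ProcessorCount);
}
```

With 3 workers the chunk size is still 333334; with all 4 processors it becomes 250000.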