I am working on a program which manipulates images of different sizes. Many of these manipulations read pixel data from an input and write to a separate output (e.g. blur). This is done on a per-pixel basis.
Such image mapulations are very stressful on the CPU. I would like to use multithreading to speed things up. How would I do this? I was thinking of creating one thread per row of pixels.
I have several requirements:
- Executable size must be minimized. In other words, I can't use massive libraries. What's the most light-weight, portable threading library for C/C++?
- Executable size must be minimized. I was thinking of having a function forEachRow(fp* ) which runs a thread for each row, or even a forEachPixel(fp* ) where fp operates on a single pixel in its own thread. Which is best?
- Should I use normal functions or functors or functionoids or some lambda functions or ... something else?
- Some operations use optimizations which require information from the previous pixel processed. This makes forEachRow favorable. Would using forEachPixel be better even considering this?
- Would I need to lock my read-only and write-only arrays?
- The input is only read from, but many operations require input from more than one pixel in the array.
- The ouput is only written once per pixel.
- Speed is also important (of course), but optimize executable size takes precedence.
Thanks.
More information on this topic for the curious: C++ Parallelization Libraries: OpenMP vs. Thread Building Blocks
It is very possible, that bottleneck is not CPU but memory bandwidth, so multi-threading WON'T help a lot. Try to minimize memory access and work on limited memory blocks, so that more data can be cached. I had a similar problem a while ago and I decided to optimize my code to use SSE instructions. Speed increase was almost 4x per single thread!
Can I ask which platform you're writing this for? I'm guessing that because executable size is an issue you're not targetting on a desktop machine. In which case does the platform have multiple cores or hyperthreaded? If not then adding threads to your application could have the opposite effect and slow it down...
Your compiler doesn't support OpenMP. Another option is to use a library approach, both Intel's Threading Building Blocks and Microsoft Concurrency Runtime are available (VS 2010).
There is also a set of interfaces called the Parallel Pattern Library which are supported by both libraries and in these have a templated parallel_for library call. so instead of:
you would write:
where ToGrayScale is a functor or pointer to function. (Note if your compiler supports lambda expressions which it likely doesn't you can inline the functor as a lambda expression).
-Rick
I would recommend
boost::thread
andboost::gil
(generic image libray). Because there are quite much templates involved, I'm not sure whether the code-size will still be acceptable for you. But it's part of boost, so it is probably worth a look.To optimize simple image transformations, you are far better off using SIMD vector math than trying to multi-thread your program.
You also could use libraries like IPP or the Cassandra Vision C++ API that are mostly much more optimized than you own code.