I've been working with OpenCV and Apple's Accelerate framework, and I find Accelerate's performance slow and Apple's documentation limited. Take this function, for example:
    void equalizeHistogram(const cv::Mat &planar8Image, cv::Mat &equalizedImage)
    {
        cv::Size size = planar8Image.size();
        vImage_Buffer planarImageBuffer = {
            .width = static_cast<vImagePixelCount>(size.width),
            .height = static_cast<vImagePixelCount>(size.height),
            .rowBytes = planar8Image.step,
            .data = planar8Image.data
        };
        vImage_Buffer equalizedImageBuffer = {
            .width = static_cast<vImagePixelCount>(size.width),
            .height = static_cast<vImagePixelCount>(size.height),
            .rowBytes = equalizedImage.step,
            .data = equalizedImage.data
        };
        TIME_START(VIMAGE_EQUALIZE_HISTOGRAM);
        vImage_Error error = vImageEqualization_Planar8(&planarImageBuffer, &equalizedImageBuffer, kvImageNoFlags);
        TIME_END(VIMAGE_EQUALIZE_HISTOGRAM);
        if (error != kvImageNoError) {
            NSLog(@"%s, vImage error %zd", __PRETTY_FUNCTION__, error);
        }
    }
This call takes roughly 20ms, which makes it unusable in my application. Maybe histogram equalization is inherently slow, but I've also tested BGRA->Grayscale conversion and found that OpenCV does it in ~5ms while vImage takes ~20ms.
While testing other functions, I found a project that made a simple slider app with a blur function (gist), which I cleaned up to test. It also takes ~20ms.
Is there some trick to getting these functions to be faster?
Don't keep re-allocating vImage_Buffer if you can avoid it.
One thing that is critical to vImage performance in Accelerate is reusing vImage_Buffers. I can't say how many times I read hints to this effect in Apple's limited documentation, but I was definitely not listening.
In the aforementioned blur code example, I reworked the test app to set up the vImage_Buffer input and output buffers once per image rather than once for each call to boxBlur. I dropped <10ms per call, which made a noticeable difference in response time.
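To make the "set up once per image, reuse on every call" idea concrete, here is a minimal sketch in plain C++. It does not use vImage at all: ReusableBlur and its simple horizontal box blur are hypothetical stand-ins I made up for illustration, and the only point is that the output buffer is allocated in the constructor and then reused by every blur call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a vImage-style pipeline (not Apple's API):
// buffers are sized once per image and reused across every blur call.
class ReusableBlur {
public:
    ReusableBlur(const std::vector<std::uint8_t>& pixels, std::size_t width, std::size_t height)
        : width_(width), height_(height), source_(pixels), output_(pixels.size()) {
        assert(pixels.size() == width * height);
    }

    // Horizontal box blur on an 8-bit planar image. The window is truncated
    // at the row edges. The result is written into the preallocated output
    // buffer, so no allocation happens per call.
    const std::vector<std::uint8_t>& blur(std::size_t radius) {
        for (std::size_t y = 0; y < height_; ++y) {
            const std::uint8_t* row = source_.data() + y * width_;
            for (std::size_t x = 0; x < width_; ++x) {
                const std::size_t lo = x >= radius ? x - radius : 0;
                const std::size_t hi = std::min(width_ - 1, x + radius);
                unsigned sum = 0;
                for (std::size_t i = lo; i <= hi; ++i) sum += row[i];
                output_[y * width_ + x] = static_cast<std::uint8_t>(sum / (hi - lo + 1));
            }
        }
        return output_;
    }

private:
    std::size_t width_;
    std::size_t height_;
    std::vector<std::uint8_t> source_;  // copied once per image
    std::vector<std::uint8_t> output_;  // allocated once, reused per call
};
```

In a slider app you would build one ReusableBlur when the image changes and call blur(radius) from the slider callback, instead of rebuilding the buffers inside the callback.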
This says that Accelerate needs time to warm up before you start seeing performance improvements. The first call to this method took 34ms.
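If the first call is that much slower, it is worth keeping it out of your measurements. Below is a rough sketch of a timing helper that runs the work a few times untimed before recording anything. The name timeAfterWarmup and its parameters are my own invention, and it uses std::chrono rather than the TIME_START/TIME_END macros from the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `warmups` times without timing it, then returns the duration
// of each of `samples` timed runs in milliseconds.
template <typename Work>
std::vector<double> timeAfterWarmup(Work&& work, std::size_t warmups, std::size_t samples) {
    assert(samples > 0);
    for (std::size_t i = 0; i < warmups; ++i) work();  // discarded warm-up calls

    std::vector<double> ms;
    ms.reserve(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
    return ms;
}
```

You would pass the vImage call you care about as the lambda and compare the returned samples with the time of a cold first call.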
To use vImage with OpenCV, pass a reference to your OpenCV matrix to a method like this one:
The call to this method, from your OpenCV code block, looks like this:
It's that simple, and since these are all pointer references, there's no "deep copying" of any kind. It's as fast and efficient as it can possibly be, all questions of context and other related performance considerations aside (I can help you with those, too).
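As an illustration of the "no deep copying" point, here is a small sketch using stand-in types only. Matrix8 is a hypothetical substitute for a single-channel 8-bit cv::Mat, and BufferView is a hypothetical struct that mirrors the four fields of vImage_Buffer (data, height, width, rowBytes). Neither is the real type, so treat this as a picture of the idea rather than the original answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a single-channel 8-bit cv::Mat.
struct Matrix8 {
    std::size_t rows, cols, step;  // step = bytes per row, may exceed cols
    std::vector<std::uint8_t> storage;
    Matrix8(std::size_t r, std::size_t c, std::size_t s)
        : rows(r), cols(c), step(s), storage(r * s) { assert(s >= c); }
    std::uint8_t* data() { return storage.data(); }
};

// Hypothetical stand-in mirroring the fields of vImage_Buffer.
struct BufferView {
    void* data;
    std::size_t height, width, rowBytes;
};

// Wraps the matrix storage without copying any pixels.
BufferView viewOf(Matrix8& m) {
    return BufferView{m.data(), m.rows, m.cols, m.step};
}

// Example operation that works through the view: invert every pixel in
// place, honouring rowBytes so that row padding is never touched.
void invert(const BufferView& v) {
    for (std::size_t y = 0; y < v.height; ++y) {
        std::uint8_t* row = static_cast<std::uint8_t*>(v.data) + y * v.rowBytes;
        for (std::size_t x = 0; x < v.width; ++x)
            row[x] = static_cast<std::uint8_t>(255 - row[x]);
    }
}
```

Because the view only holds a pointer, anything written through it shows up in the matrix immediately. With the real types, the row stride comes from the matrix's step, as in the question's code.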
SIDENOTE: Did you know that you have to change the channel permutation when mixing OpenCV with vImage? If not, prior to calling any vImage functions on an OpenCV matrix, call:
Perform the same call, map and all, to return the image to the channel order proper for an OpenCV matrix.
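For anyone unsure what a permute map does, here is a plain C++ sketch of the operation. I am assuming the OpenCV matrix holds BGRA pixels and the vImage function expects ARGB, which gives the reversing map {3, 2, 1, 0}; the names permuteChannels and kReverseMap are mine. The convention is that destination channel i is taken from source channel map[i], which is how I understand vImagePermuteChannels_ARGB8888 to read its map.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reorders the four channels of each interleaved 8-bit pixel so that
// destination channel i is taken from source channel map[i].
std::vector<std::uint8_t> permuteChannels(const std::vector<std::uint8_t>& pixels,
                                          const std::array<std::uint8_t, 4>& map) {
    assert(pixels.size() % 4 == 0);
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t p = 0; p < pixels.size(); p += 4)
        for (std::size_t c = 0; c < 4; ++c)
            out[p + c] = pixels[p + map[c]];
    return out;
}

// Assumed map for BGRA -> ARGB: reverse the channel order. Reversing twice
// is the identity, so the same map also converts ARGB back to BGRA.
const std::array<std::uint8_t, 4> kReverseMap = {{3, 2, 1, 0}};
```

That self-inverse property is why the same call, with the same map, takes the image back to OpenCV's channel order.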
To get 30 frames per second with the equalizeHistogram function, you must deinterleave the image (convert from ARGBxxxx to PlanarX) and equalize ONLY the R(ed), G(reen) and B(lue) channels; if you also equalize A(lpha), the frame rate will drop to at least 24.
Here is the code that does exactly what you want, as fast as you want:
Notice that I allocate the alpha channel, even though I perform nothing on it; that's simply because converting back and forth between ARGB8888 and Planar8 requires alpha-channel buffer allocation and reference. Same performance and quality enhancements, regardless.
Also note that I perform contrast stretching after converting the Planar8 buffers into a single ARGB8888 buffer; that's because it's faster than applying the function channel-by-channel, as I did with the histogram equalization function, and gets the same results as doing it individually (the contrast stretching function does not cause the same alpha-channel distortion as histogram equalization).
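The deinterleave-then-equalize-RGB-only approach described above can be sketched in plain C++ as follows. This is not vImage code: the function names are hypothetical, and equalizePlane uses the textbook CDF-based formula, which may not match what vImageEqualization_Planar8 computes internally. In a real pipeline the split and merge would presumably be vImageConvert_ARGB8888toPlanar8 and vImageConvert_Planar8toARGB8888.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Plane = std::vector<std::uint8_t>;

// Splits interleaved 4-channel pixels into four separate planes.
std::array<Plane, 4> deinterleave(const std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0);
    std::array<Plane, 4> planes;
    for (auto& plane : planes) plane.reserve(pixels.size() / 4);
    for (std::size_t i = 0; i < pixels.size(); ++i) planes[i % 4].push_back(pixels[i]);
    return planes;
}

// Textbook CDF-based histogram equalization of one 8-bit plane, in place.
void equalizePlane(Plane& plane) {
    if (plane.empty()) return;
    std::array<std::size_t, 256> cdf{};
    for (std::uint8_t v : plane) ++cdf[v];                         // histogram
    for (std::size_t i = 1; i < 256; ++i) cdf[i] += cdf[i - 1];    // running sum
    std::size_t cdfMin = 0;
    for (std::size_t c : cdf) {
        if (c != 0) { cdfMin = c; break; }
    }
    const std::size_t span = plane.size() - cdfMin;
    if (span == 0) return;  // single-valued plane: nothing to stretch
    for (std::uint8_t& v : plane)
        v = static_cast<std::uint8_t>(((cdf[v] - cdfMin) * 255 + span / 2) / span);
}

// Equalizes the colour planes (1..3) of ARGB pixels, leaves alpha (plane 0)
// untouched, and interleaves the result again.
std::vector<std::uint8_t> equalizeColorOnly(const std::vector<std::uint8_t>& argb) {
    auto planes = deinterleave(argb);
    for (std::size_t c = 1; c < 4; ++c) equalizePlane(planes[c]);
    std::vector<std::uint8_t> out(argb.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = planes[i % 4][i / 4];
    return out;
}
```

Note that the alpha plane is still split out and merged back even though nothing is done to it, which matches the point above about needing the alpha buffer for the conversion in both directions.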