I am trying to implement a histogram in NEON. Is it possible to vectorise it?
Histogramming is almost impossible to vectorize, unfortunately.
You can probably optimise the scalar code somewhat, however. A common trick is to use two histograms and then combine them at the end. This allows you to overlap loads/increments/stores and thereby bury some of the serial dependencies and associated latencies. Pseudo code:
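The pseudo code did not survive in this copy of the thread, so here is a minimal C sketch of the two-histogram trick described above. It assumes 8-bit samples and 256 bins; the function name `histogram_u8_two_tables` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a 256-bin histogram of 8-bit samples using two partial tables.
   (Hypothetical helper; name and signature are assumptions.) */
void histogram_u8_two_tables(const uint8_t *src, size_t n, uint32_t hist[256])
{
    uint32_t h0[256], h1[256];
    size_t i;

    assert(src != NULL || n == 0);
    memset(h0, 0, sizeof h0);
    memset(h1, 0, sizeof h1);

    /* Even-indexed samples go to h0, odd-indexed samples to h1, so two
       consecutive equal samples never hit the same counter back to back. */
    for (i = 0; i + 1 < n; i += 2) {
        h0[src[i]]++;
        h1[src[i + 1]]++;
    }
    if (i < n)                      /* one sample left over when n is odd */
        h0[src[i]]++;

    /* Combine the partial tables at the end. */
    for (i = 0; i < 256; i++)
        hist[i] = h0[i] + h1[i];
}
```

Whether this actually beats a single table depends on the core and the data, so it is worth measuring on the target device.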
ermig1979 has a Simd project which shows how he has done histograms using a similar approach to what @Paul-R has mentioned but also with SSE2 and AVX2 variants:
Project: https://github.com/ermig1979/Simd
Base file: https://github.com/ermig1979/Simd/blob/master/src/Simd/SimdBaseHistogram.cpp
An AVX2 implementation can be seen here: https://github.com/ermig1979/Simd/blob/master/src/Simd/SimdAvx2Histogram.cpp
A scalar solution can be seen below to illustrate the basic principle of creating multiple histograms that are summed at the end:
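The code from this answer is missing here, so the following is only a sketch of the principle it names, not a copy of the Simd project's source. It assumes an 8-bit grayscale image stored row by row with a `stride` in bytes, and uses four partial histograms; `histogram_image_u8` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Histogram of an 8-bit image using four partial tables that are summed
   at the end. (Illustrative only; name and signature are assumptions.) */
void histogram_image_u8(const uint8_t *src, size_t stride,
                        size_t width, size_t height, uint32_t hist[256])
{
    uint32_t part[4][256];
    size_t aligned_width = width & ~(size_t)3;  /* largest multiple of 4 */

    memset(part, 0, sizeof part);

    for (size_t row = 0; row < height; row++) {
        const uint8_t *p = src + row * stride;
        size_t col = 0;

        /* Four neighbouring pixels update four different tables. */
        for (; col < aligned_width; col += 4) {
            part[0][p[col + 0]]++;
            part[1][p[col + 1]]++;
            part[2][p[col + 2]]++;
            part[3][p[col + 3]]++;
        }
        /* Remaining pixels of the row (fewer than four). */
        for (; col < width; col++)
            part[0][p[col]]++;
    }

    /* Sum the partial tables into the final histogram. */
    for (size_t i = 0; i < 256; i++)
        hist[i] = part[0][i] + part[1][i] + part[2][i] + part[3][i];
}
```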
Some image processing algorithms that work on histograms (e.g. equalization, histogram matching) can be made to work with known percentiles instead. For an approximation, one can effectively parallelize the search by starting from initial ranges (0, 25, 50, 75, 100%), which consumes 4 accumulators.
Each item in the input stream must be compared in parallel to all slots, incrementing the frequency of each slot it falls into. After a certain number of rounds (e.g. blocks of 255 rounds, which guarantees no overflow in uint8_t counters, with each block then accumulated into uint16_t), the min/max limits of each slot are recalculated by linear interpolation. It is of course also possible to re-run the sequence based on an estimate of how much the new data has changed the percentile estimates.
The result would depend on evaluation order, which could be mitigated by random sampling and multiple passes.
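As a rough illustration of the counting step only, here is a scalar C sketch that compares each sample against four thresholds and keeps narrow 8-bit counters that are flushed every 255 samples. Everything in it is an assumption made for the example: the name `count_below_thresholds`, the "sample < threshold" rule, and the use of 32-bit totals instead of uint16_t so that long inputs cannot overflow. The interpolation step that moves the limits is not shown. On NEON the inner loop over slots would map onto lanes of a vector compare.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 4

/* For each threshold, count how many samples are strictly below it.
   (Hypothetical helper; name, signature and comparison rule are assumptions.) */
void count_below_thresholds(const uint8_t *src, size_t n,
                            const uint8_t threshold[NUM_SLOTS],
                            uint32_t total[NUM_SLOTS])
{
    uint8_t narrow[NUM_SLOTS] = {0};   /* stand-in for 8-bit vector lanes */
    size_t pending = 0;                /* samples counted since last flush */

    for (int k = 0; k < NUM_SLOTS; k++)
        total[k] = 0;

    for (size_t i = 0; i < n; i++) {
        /* Compare one sample against every slot. */
        for (int k = 0; k < NUM_SLOTS; k++)
            narrow[k] = (uint8_t)(narrow[k] + (src[i] < threshold[k]));

        /* After 255 samples a narrow counter may hold 255, so flush
           before it can wrap around. */
        if (++pending == 255) {
            for (int k = 0; k < NUM_SLOTS; k++) {
                total[k] += narrow[k];
                narrow[k] = 0;
            }
            pending = 0;
        }
    }

    /* Flush whatever is left in the narrow counters. */
    for (int k = 0; k < NUM_SLOTS; k++)
        total[k] += narrow[k];
}
```

If the counts for thresholds placed at the guessed quartiles come out far from 25/50/75% of `n`, the thresholds can be adjusted and the pass repeated.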
@Paul-R, there is a paper that discusses how to vectorize histogram functions:
SIMD Vectorization of Histogram Functions