I have a Python image processing function that tries to get the dominant color of an image. I make use of a function I found here: https://github.com/tarikd/python-kmeans-dominant-colors/blob/master/utils.py
It works, but unfortunately I don't quite understand what it does, and I learned that np.histogram is rather slow and that I should use cv2.calcHist instead, since it's 40x faster according to this: https://docs.opencv.org/trunk/d1/db7/tutorial_py_histogram_begins.html
I'd like to understand how I have to update the code to use cv2.calcHist, or better, which values I have to input.
My function:
import numpy as np

def centroid_histogram(clt):
    # grab the number of different clusters and create a histogram
    # based on the number of pixels assigned to each cluster
    num_labels = np.arange(0, len(np.unique(clt.labels_)) + 1)
    (hist, _) = np.histogram(clt.labels_, bins=num_labels)
    # normalize the histogram, such that it sums to one
    hist = hist.astype("float")
    hist /= hist.sum()
    # return the histogram
    return hist
The pprint of clt is this, not sure if this helps:
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=1, n_init=10, n_jobs=1, precompute_distances='auto',
       random_state=None, tol=0.0001, verbose=0)
My code can be found here: https://github.com/primus852/python-movie-barcode
I am a complete beginner, so any help is highly appreciated.
As per request:
Sample Image
Most dominant color: rgb(22,28,37)
Computation time for the histogram: 0.021515369415283203s
@Divakar has given a great answer. But if you want to port your own code to OpenCV, then:
Result for your image:
Time it took: 0.10798478126525879 secs
The equivalent code for cv2.calcHist() is to replace the np.histogram call with a call to cv2.calcHist.
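The original snippets are not shown here, so the following is only a minimal sketch of how such a replacement might look. The function name label_histogram and its n_clusters parameter are hypothetical, and the labels are assumed to be integers in [0, n_clusters). It uses one bin per cluster and falls back to np.bincount when OpenCV is not installed:

```python
import numpy as np

try:
    import cv2  # OpenCV is optional in this sketch
except ImportError:
    cv2 = None


def label_histogram(labels, n_clusters):
    """Normalized histogram of integer cluster labels in [0, n_clusters)."""
    labels = np.asarray(labels)
    if cv2 is not None:
        # calcHist does not take int32 input, so convert to float32 first.
        data = labels.astype(np.float32).reshape(-1, 1)
        # One bin per cluster over the range [0, n_clusters).
        hist = cv2.calcHist([data], [0], None, [n_clusters], [0, n_clusters])
        hist = hist.ravel().astype("float")
    else:
        # Fallback without OpenCV: count each label directly.
        hist = np.bincount(labels, minlength=n_clusters).astype("float")
    # Normalize so that the histogram sums to one.
    return hist / hist.sum()
```

With a fitted KMeans object this would be called as label_histogram(clt.labels_, clt.n_clusters).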
Note that cv2.calcHist only accepts uint8, uint16 and float32 as element types.
Update
It seems that OpenCV's and NumPy's binning differ from each other: the histograms differ if the number of bins doesn't match the value range:
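To illustrate the NumPy side only, here is a small sketch with made-up labels. When given a bin count, np.histogram spreads the bins over [min, max] and closes the last bin on the right, so the maximum value is still counted. The OpenCV documentation describes the upper bound of the calcHist range as exclusive, which would explain a mismatch, but this snippet does not check that:

```python
import numpy as np

# Made-up labels for three clusters (0, 1, 2).
labels = np.array([0, 1, 2, 2])

# Passing a bin count: NumPy spreads 3 bins evenly over [min, max] = [0, 2].
# The last bin is closed on the right, so the value 2 is counted in it.
counts, edges = np.histogram(labels, bins=3)

# Passing explicit integer edges 0..3 gives one unit-wide bin per label,
# which is what the question's centroid_histogram does.
counts_by_edges, _ = np.histogram(labels, bins=np.arange(0, 4))

print(counts, edges)
print(counts_by_edges)
```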
Two approaches using np.unique and np.bincount to get the most dominant color could be suggested. Also, the linked page mentions bincount as a faster alternative, so that could be the way to go.
Approach #1
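A minimal sketch of what an np.unique based approach might look like, assuming the image is an H x W x C NumPy array. The function name dominant_color_unique is hypothetical:

```python
import numpy as np


def dominant_color_unique(img):
    """Return the most frequent color of an H x W x C image array."""
    # Flatten the image into a list of pixels, one row per pixel.
    pixels = img.reshape(-1, img.shape[-1])
    # Find the distinct colors and how often each one occurs.
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    # The color with the highest count is the dominant one.
    return colors[counts.argmax()]
```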
Approach #2
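A minimal sketch of what an np.bincount based approach might look like, assuming an H x W x 3 image of 8-bit values. The function name dominant_color_bincount and the col_range parameter are hypothetical. Each color triple is encoded as a single integer so that bincount can count it:

```python
import numpy as np


def dominant_color_bincount(img, col_range=(256, 256, 256)):
    """Return the most frequent color of an H x W x 3 image as a tuple."""
    # Flatten the image into a list of pixels, one row per pixel.
    pixels = img.reshape(-1, img.shape[-1])
    # Encode every (c0, c1, c2) triple as one linear index into col_range.
    channels = tuple(pixels.T.astype(np.intp))
    flat = np.ravel_multi_index(channels, col_range)
    # Count the indices and decode the most frequent one back to a triple.
    return np.unravel_index(np.bincount(flat).argmax(), col_range)
```

For images with a smaller value range, passing a tighter col_range keeps the bincount array small.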
Verification and timings on a 1000 x 1000 color image in a dense range [0,9) for reproducible results -
Further boost
Further boost upon leveraging multi-core with the numexpr module for large data -
Timings -