I am working on motion detection with non-static camera using opencv. I am using a pretty basic background subtraction and thresholding approach to get a broad sense of all that's moving in a sample video. After thresholding, I enlist all separable "patches" of white pixels, store them as independent components and color them randomly with red, green or blue. The image below shows this for a football video where all such components are visible.
I create rectangles over these detected components and I get this image:
So I can see the challenge here. I want to cluster all the "similar" and close-by components into a single entity so that the rectangles in the output image show a player moving as a whole (and not his independent limbs). I tried doing K-means clustering but since ideally I would not know the number of moving entities, I could not make any progress.
Please guide me on how I can do this. Thanks
I am not entirely sure if you are really looking for clustering (in the Data Mining sense).
Clustering is used to group similar objects according to a distance function. In your case the distance function would only use the spatial qualities. Besides, in k-means clustering you have to specify a k, that you probably don't know beforehand.
It seems to me you just want to merge all rectangles whose borders are closer together than some predetermined threshold. So as a first idea try to merge all rectangles that are touching or that are closer together than half a players height.
You probably want to include a size check to minimize the risk of merging two players into one.
Edit: If you really want to use a clustering algorithm use one that estimates the number of clusters for you.
I agree with Sebastian Schmitz: you probably shouldn't be looking for clustering.
Don't expect an uninformed method such as k-means to work magic for you. In particular one that is as crude a heuristic as k-means, and which lives in an idealized mathematical world, not in messy, real data.
You have a good understanding of what you want. Try to put this intuition into code. In your case, you seem to be looking for connected components.
Consider downsampling your image to a lower resolution, then rerunning the same process! Or running it on the lower resolution right away (to reduce compression artifacts, and improve performance). Or adding filters, such as blurring.
I'd expect best and fastest results by looking at connected components in the downsampled/filtered image.
this problem can be almost perfectly solved by dbscan clustering algorithm. Below, I provide the implementation and result image. Gray blob means outlier or noise according to dbscan. I simply used boxes as input data. Initially, box centers were used for distance function. However for boxes, it is insufficient to correctly characterize distance. So, the current distance function use the minimum distance of all 8 corners of two boxes.
I guess you can improve your original attempt by using morphological transformations. Take a look at http://docs.opencv.org/master/d9/d61/tutorial_py_morphological_ops.html#gsc.tab=0. Probably you can deal with a closed set for each entity after that, specially with separate players as you got in your original image.