I would like to decide if an image is present in a list stored in a DB (e.g. pictures of IDs, passport, Stu. card, etc). I thought about using a KNN algorithm, that will plot the K closest images.
Options for distance metric:
- sum of Euclidean distance between each relative pixels (img1[pixel_i], img2[pixel_i])
- sum of Euclidean distance betwen each pixel to each other, multiplied by some factor decreasing with distance (pixel to pixel)
- same as above, but with manhattan...
Do you know/think of a better way to deal with the image similarity subject?
I think that using raw graylevel values in computing distances is a very bad idea. This is not invariant to illumination, to translation and to rotation (although I don't think that rotation is a big issue in face images).
Try to use some robust and invariant descriptor extracted from each image (e.g. SIFT on keypoints) and then compute distances between those features. K-NN could work. Alternatively, look for image retrieval literature for more advanced approaches.
Hope this helps!
If you have a large number of images in your database, it will get rather unwieldy calculating the similarity between a given image and every single image in your database every time. Instead, I would consider something like a Perceptual Hash (pHash) where you could pre-compute a parameter ONCE for each image in your database and store it, and then , when you want to compare an image you calculate just its single pHash and compare that with all the stored ones in your database.