Image fingerprint to compare similarity of many im

I need to create fingerprints of many images (about 100.000 existing, 1000 new per day, RGB, JPEG, max size 800x800) to compare every image to every other image very fast. I can't use binary compare methods because also images which are nearly similar should be recognized.

Best would be an existing library, but also some hints to existing algorithms would help me a lot.

标签： image image-processing fingerprint

11条回答

乱世女痞

2楼-- · 2019-01-03 07:49

There is a much less ad-hoc approach than the scaled down image variants that have been proposed here that retains their general flavor, but which gives a much more rigorous mathematical basis for what is going on.

Take a Haar wavelet of the image. Basically the Haar wavelet is the succession of differences from the lower resolution images to each higher resolution image, but weighted by how deep you are in the 'tree' of mipmaps. The calculation is straightforward. Then once you have the Haar wavelet appropriately weighted, throw away all but the k largest coefficients (in terms of absolute value), normalize the vector and save it.

If you take the dot product of two of those normalized vectors it gives you a measure of similarity with 1 being nearly identical. I posted more information over here.

0人赞添加讨论(0) 举报

Rolldiameter

3楼-- · 2019-01-03 07:49

So you want to do "fingerprint matching" that's pretty different than "image matching". Fingerprints' analysis has been deeply studied during the past 20 years, and several interesting algorithms have been developed to ensure the right detection rate (with respect to FAR and FRR measures - False Acceptance Rate and False Rejection Rate).

I suggest you to better look to LFA (Local Feature Analysis) class of detection techniques, mostly built on minutiae inspection. Minutiae are specific characteristics of any fingerprint, and have been classified in several classes. Mapping a raster image to a minutiae map is what actually most of Public Authorities do to file criminals or terrorists.

See here for further references

0人赞添加讨论(0) 举报

smile是对你的礼貌

4楼-- · 2019-01-03 07:50

A long time ago I worked on a system that had some similar characteristics, and this is an approximation of the algorithm we followed:

Divide the picture into zones. In our case we were dealing with 4:3 resolution video, so we used 12 zones. Doing this takes the resolution of the source images out of the picture.
For each zone, calculate an overall color - the average of all pixels in the zone
For the entire image, calculate an overall color - the average of all zones

So for each image, you're storing n + 1 integer values, where n is the number of zones you're tracking.

For comparisons, you also need to look at each color channel individually.

For the overall image, compare the color channels for the overall colors to see if they are within a certain threshold - say, 10%
If the images are within the threshold, next compare each zone. If all zones also are within the threshold, the images are a strong enough match that you can at least flag them for further comparison.

This lets you quickly discard images that are not matches; you can also use more zones and/or apply the algorithm recursively to get stronger match confidence.

0人赞添加讨论(0) 举报

太酷不给撩

5楼-- · 2019-01-03 07:51

One way you can do this is to resize the image and drop the resolution significantly (to 200x200 maybe?), storing a smaller (pixel-averaged) version for doing the comparison. Then define a tolerance threshold and compare each pixel. If the RGB of all pixels are within the tolerance, you've got a match.

Your initial run through is O(n^2) but if you catalog all matches, each new image is just an O(n) algorithm to compare (you only have to compare it to each previously inserted image). It will eventually break down however as the list of images to compare becomes larger, but I think you're safe for a while.

After 400 days of running, you'll have 500,000 images, which means (discounting the time to resize the image down) 200(H)*200(W)*500,000(images)*3(RGB) = 60,000,000,000 comparisons. If every image is an exact match, you're going to be falling behind, but that's probably not going to be the case, right? Remember, you can discount an image as a match as soon as a single comparison falls outside your threshold.

0人赞添加讨论(0) 举报

啃猪蹄的小仙女

6楼-- · 2019-01-03 07:52

Normal hashing or CRC calculation algorithms do not work well with image data. The dimensional nature of the information must be taken into account.

If you need extremely robust fingerprinting, such that affine transformations (scaling, rotation, translation, flipping) are accounted for, you can use a Radon transformation on the image source to produce a normative mapping of the image data - store this with each image and then compare just the fingerprints. This is a complex algorithm and not for the faint of heart.

a few simple solutions are possible:

Create a luminosity histogram for the image as a fingerprint
Create scaled down versions of each image as a fingerprint
Combine technique (1) and (2) into a hybrid approach for improved comparison quality

A luminosity histogram (especially one that is separated into RGB components) is a reasonable fingerprint for an image - and can be implemented quite efficiently. Subtracting one histogram from another will produce a new historgram which you can process to decide how similar two images are. Histograms, because the only evaluate the distribution and occurrence of luminosity/color information handle affine transformations quite well. If you quantize each color component's luminosity information down to an 8-bit value, 768 bytes of storage are sufficient for the fingerprint of an image of almost any reasonable size. Luminosity histograms produce false negatives when the color information in an image is manipulated. If you apply transformations like contrast/brightness, posterize, color shifting, luminosity information changes. False positives are also possible with certain types of images ... such as landscapes and images where a single color dominates others.

Using scaled images is another way to reduce the information density of the image to a level that is easier to compare. Reductions below 10% of the original image size generally lose too much of the information to be of use - so an 800x800 pixel image can be scaled down to 80x80 and still provide enough information to perform decent fingerprinting. Unlike histogram data, you have to perform anisotropic scaling of the image data when the source resolutions have varying aspect ratios. In other words, reducing a 300x800 image into an 80x80 thumbnail causes deformation of the image, such that when compared with a 300x500 image (that's very similar) will cause false negatives. Thumbnail fingerprints also often produce false negatives when affine transformations are involved. If you flip or rotate an image, its thumbnail will be quite different from the original and may result in a false positive.

Combining both techniques is a reasonable way to hedge your bets and reduce the occurence of both false positives and false negatives.

0人赞添加讨论(0) 举报

做自己的国王

7楼-- · 2019-01-03 07:53

For iPhone image comparison and image similarity development check out: http://sites.google.com/site/imagecomparison/

To see it in action, check out eyeBuy Visual Search on the iTunes AppStore.

0人赞添加讨论(0) 举报

1 2 下一页

Image fingerprint to compare similarity of many im

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间