
How to detect similar images in PHP?

Posted 2020-06-06 02:30

Question:

I have many files of the same picture in various resolutions, suitable for different devices such as mobile, PC, PSP, etc. Now I am trying to display only unique pictures on the page, but I don't know how. I could have avoided this if I had maintained a database in the first place, but I didn't. I need your help detecting the largest version of each unique picture.

Answer 1:

Well, even though there are quite a few algorithms for this, I believe it would still be faster to do it manually. Download all the images and feed them into something like Windows Live Photo Gallery or any other software that can match similar images. This will take you a few hours, but implementing an image-matching algorithm could take far longer. After that, you could spend the extra time amending your current system to store everything in a DB. Fix the cause of the problem, not its symptoms.



Answer 2:

Install gd2 and libpuzzle on your server.

Libpuzzle is astonishingly good and easy to play with. Check this snippet:

<?php
# Compute signatures for two images
$cvec1 = puzzle_fill_cvec_from_file('img1.jpg');
$cvec2 = puzzle_fill_cvec_from_file('img2.jpg');

# Compute the distance between both signatures
$d = puzzle_vector_normalized_distance($cvec1, $cvec2);

# Are pictures similar?
if ($d < PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD) {
  echo "Pictures are looking similar\n";
} else {
  echo "Pictures are different, distance=$d\n";
}

# Compress the signatures for database storage
$compress_cvec1 = puzzle_compress_cvec($cvec1);
$compress_cvec2 = puzzle_compress_cvec($cvec2);
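
To get from pairwise distances to the largest copy of each unique picture, one option is a greedy grouping pass. The following is only a sketch under assumptions: `largestUniqueImages()` and its callable parameters are hypothetical names, not part of libpuzzle, and `images/*.jpg` is a placeholder path. The only real calls are the `puzzle_*` functions and the constant already used in the snippet, plus PHP's `getimagesize()`.

```php
<?php
// Hypothetical helper (not part of libpuzzle): greedily groups similar images
// and returns the path with the largest pixel area from each group.
// $signatureOf: fn(string $path) => signature
// $distance:    fn($sigA, $sigB) => float
// $areaOf:      fn(string $path) => int (width * height)
function largestUniqueImages(
    array $files,
    callable $signatureOf,
    callable $distance,
    callable $areaOf,
    float $threshold
): array {
    $groups = []; // each entry: ['sig' => ..., 'best' => path, 'area' => int]
    foreach ($files as $file) {
        $sig = $signatureOf($file);
        $area = $areaOf($file);
        $matched = false;
        foreach ($groups as &$group) {
            // Compare against the signature of the first image seen in the group.
            if ($distance($group['sig'], $sig) < $threshold) {
                if ($area > $group['area']) {
                    $group['best'] = $file;
                    $group['area'] = $area;
                }
                $matched = true;
                break;
            }
        }
        unset($group); // drop the reference left over from the inner loop
        if (!$matched) {
            $groups[] = ['sig' => $sig, 'best' => $file, 'area' => $area];
        }
    }
    return array_column($groups, 'best');
}

// Wiring it to libpuzzle; this part runs only when the extension is loaded.
if (function_exists('puzzle_fill_cvec_from_file')) {
    $unique = largestUniqueImages(
        glob('images/*.jpg') ?: [], // placeholder directory
        'puzzle_fill_cvec_from_file',
        'puzzle_vector_normalized_distance',
        function (string $path): int {
            [$width, $height] = getimagesize($path);
            return $width * $height;
        },
        PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD
    );
    print_r($unique);
}
```

Each file is compared only against one representative per group, so the cost grows with the number of unique pictures rather than with the square of the number of files. The result can depend on the order in which files are visited, which is a limitation of this greedy approach.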


Answer 3:

Firstly, your problem has hardly anything to do with PHP, so I have removed that tag and added more relevant tags.


Done smartly, this will not require N×N comparisons. You can use lots of heuristics, but first I would like to ask you:

  1. Are all the copies of one image exact resizes of each other, or has some cropping been done? Matching cropped images to the original could be more difficult and time-consuming.

  2. Are all images generated (resized) using the same tool?

  3. What about the parameters you used to resize? For example, are all pictures for displaying on PSP in the same resolution?

  4. What is your estimate of how many unique images you have (i.e., how many copies of each picture there might be, on average)?

  5. Do you have any kind of categorization already done? For example, are all mobile images in a separate folder (or of a different resolution than the PC images)? This alone could reduce the number of comparisons a lot, even if you use brute force otherwise.

A very high-level hint on why you don't need N×N comparisons: you can devise many different approximate hashes (for example, the distribution of high/low-frequency JPEG coefficients) and group "potentially" similar images together. This can reduce the number of comparisons required by a factor of 10-100 or even more, depending on the quality of the heuristic used and the data set. The hashing can even be done on parts of images. 30000 is not a very large number if you use the right techniques.
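
As a rough illustration of the grouping idea, here is a sketch that swaps in a much simpler approximate hash than JPEG coefficient statistics: a coarse "average hash" computed with GD. The function names `coarseHash()` and `bucketByHash()` and the 4×4 grid size are assumptions made for this example. It requires the GD extension.

```php
<?php
// Coarse average hash: shrink the image to a tiny grid, convert to gray,
// and emit one bit per cell telling whether it is brighter than the mean.
function coarseHash($image, int $size = 4): string
{
    $small = imagecreatetruecolor($size, $size);
    imagecopyresampled(
        $small, $image, 0, 0, 0, 0,
        $size, $size, imagesx($image), imagesy($image)
    );

    $gray = [];
    for ($y = 0; $y < $size; $y++) {
        for ($x = 0; $x < $size; $x++) {
            $rgb = imagecolorat($small, $x, $y);
            $gray[] = ((($rgb >> 16) & 0xFF) + (($rgb >> 8) & 0xFF) + ($rgb & 0xFF)) / 3;
        }
    }

    $mean = array_sum($gray) / count($gray);
    $bits = '';
    foreach ($gray as $value) {
        $bits .= $value > $mean ? '1' : '0';
    }
    return $bits;
}

// Groups file paths by hash, so that only files sharing a bucket
// need a detailed pairwise comparison afterwards.
function bucketByHash(array $paths): array
{
    $buckets = [];
    foreach ($paths as $path) {
        $data = @file_get_contents($path);
        if ($data === false) {
            continue; // unreadable file
        }
        $image = @imagecreatefromstring($data);
        if ($image === false) {
            continue; // not a supported image format
        }
        $buckets[coarseHash($image)][] = $path;
    }
    return $buckets;
}
```

Exact-match buckets can split near-duplicates whose hashes differ by a single bit (for example after heavy recompression), so treating hashes within a small Hamming distance as the same bucket would likely be more robust. Cropped copies will generally not land in the same bucket at all.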



Answer 4:

You should check which of the two images is the smaller, take its size, and then compare only the pixels within a rectangle of that size.
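
One way to read this suggestion is sketched below with GD. It assumes that the larger image is resampled down to the smaller one's dimensions before comparing, since comparing only the top-left corner of the larger file would match a crop against a whole picture. The function name `meanPixelDifference()` and the normalisation to the range 0.0-1.0 are choices made for this example, not something the answer specifies.

```php
<?php
// Returns the mean absolute per-channel difference between two GD images,
// normalised so that 0.0 means identical and 1.0 means maximally different.
function meanPixelDifference($a, $b): float
{
    // Common rectangle: the smaller width and height of the two images.
    $w = min(imagesx($a), imagesx($b));
    $h = min(imagesy($a), imagesy($b));

    // Resample both images onto canvases of that size.
    $scaled = [];
    foreach ([$a, $b] as $source) {
        $canvas = imagecreatetruecolor($w, $h);
        imagecopyresampled(
            $canvas, $source, 0, 0, 0, 0,
            $w, $h, imagesx($source), imagesy($source)
        );
        $scaled[] = $canvas;
    }

    $total = 0;
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            $p = imagecolorat($scaled[0], $x, $y);
            $q = imagecolorat($scaled[1], $x, $y);
            // Red, green and blue sit at bit offsets 16, 8 and 0.
            foreach ([16, 8, 0] as $shift) {
                $total += abs((($p >> $shift) & 0xFF) - (($q >> $shift) & 0xFF));
            }
        }
    }
    return $total / ($w * $h * 3 * 255);
}
```

A small result suggests the two files are resized copies of one picture; the cut-off value has to be tuned on real data. Images with different aspect ratios get stretched by this approach, so they may score as different even when one is a cropped copy of the other.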