Determine if an image file is a photo or a graphic

Posted 2019-03-28 13:19

Question:

I'm embarking on what I believe may be somewhat of an experiment...

The goal is to come up with (or discover, as it may already exist) a method to determine whether a given image file, regardless of format, is a photo or a graphic.

"Photo" meaning something like scenery, people, etc. vs. "graphic" meaning an icon, illustration, chart, UI screenshot, etc.

I came up with a nice PHP / ImageMagick script in the past week which pulls statistics from image files and nicely applies fixes to white balance, tone, vibrance, sharpness, shadows/highlights.

Now I'd like to take it a step further: Automatically detect photo content, then apply the aforementioned processing.

One method that has worked somewhat consistently so far is to check whether the image has EXIF data, but this only works on JPEGs and isn't foolproof.

Are there any known methods via ImageMagick, GD or otherwise for detecting a "photo" vs a "graphic"?

I do have the capability of installing/running applications besides ImageMagick & GD on our web server if need be.

Thanks!

Answer 1:

Photos tend to have a LOT of different individual colors in them (thousands, tens of thousands, even hundreds of thousands). Graphics, by contrast, tend to use a limited number of unique colors (dozens up to a few hundred).

So an ImageMagick command may help to triage a large number of files:

 identify -format '%k\n'        file
 identify -format '%f :  %k\n'  file1 file2 file3 file4

The special %k percent escape of ImageMagick's identify command counts and returns the number of unique colors in the identified file. Here are a few examples from my own local files:

 identify -format '%k' logo.png
    257

 identify -format '%k' testimage.png 
  20913

Running it against a set of 15 photos of 4032x3024 pixels each in a local directory yielded this result (counting the colors took more than 2 seconds per photo):

time identify -format '%f :  %k\n' *.JPG
  P4061782.JPG :  285127
  P4061783.JPG :  304247
  P4061784.JPG :  230241
  P4061785.JPG :  277545
  P4061786.JPG :  300632
  P4061787.JPG :  325916
  P4061788.JPG :  301766
  P4061789.JPG :  300821
  P4061790.JPG :  265080
  P4061791.JPG :  348247
  P4101941.JPG :  323714
  P4101942.JPG :  359688
  P4101943.JPG :  338563
  P4101944.JPG :  308578
  P4101945.JPG :  291853

   real  0m34.257s
   user  0m33.301s
   sys   0m0.678s
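
The color counts above could be turned into a rough yes/no decision with a small shell helper. This is only a sketch: the cutoff of 1000 colors, the THRESHOLD variable and the function names classify_count and classify_file are my own assumptions for illustration, not something ImageMagick provides, and the cutoff would need tuning against your own files.

```sh
#!/bin/sh
# Assumed cutoff: graphics were described as having "dozens up to a few hundred"
# unique colors and photos thousands or more, so 1000 is a guess in between.
THRESHOLD=${THRESHOLD:-1000}

# classify_count COUNT
# Prints "photo" if COUNT exceeds the cutoff, otherwise "graphic".
# Pure arithmetic; does not call ImageMagick.
classify_count() {
    if [ "$1" -gt "$THRESHOLD" ]; then
        echo photo
    else
        echo graphic
    fi
}

# classify_file FILE
# Asks identify for the unique-color count (%k) and prints a verdict.
# The [0] suffix restricts multi-frame files (e.g. animated GIFs) to the first frame.
classify_file() {
    count=$(identify -format '%k' "$1[0]") || return 1
    printf '%s: %s (%s colors)\n' "$1" "$(classify_count "$count")" "$count"
}
```

With the numbers shown earlier, `classify_count 257` prints graphic and `classify_count 20913` prints photo. To triage a directory, something like `for f in *.JPG; do classify_file "$f"; done` should work, at the same per-file cost as the identify run above.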

Warning: sophisticated gradients produced with vector drawing applications, such as Inkscape, may also produce lots of unique colors, so a high count alone does not prove that a file is a photo.