making IPTC data searchable

2019-02-21 01:14发布

问题:

I have a question about IPTC metadata. Is it possible to search images that aren't in a database by their IPTC metadata (keywords) and show them and how would I go about doing this? I just need a basic idea.

I know there is the iptcparse() function for PHP.

I have already written a function to grab the image name, location, and extension for all images within a galleries folder and all subdirectories by .jpg extension.

I need to figure out how to extract the metadata without storing it in a database and how to search through it, grab the relevant images that match the search tag (their IPTC keywords should match) and how to display them. I know at the point that I have the final results (post search) i can echo an imagetag with src="$filelocation"> if i have the final results in an array.

Basically, I am not sure if I need to store all my images into a mysql database and also extract the keywords and store them in the database as well before I can actually search and display the results. Also, if you could guide me to any gallery that already is able to do this, that could help as well.

Thanks for any help regarding this issue.

回答1:

It is not clear what in particular is giving you problems, but perhaps this will give you some ideas:

<?php
# Images we're searching
$images = array('/path/to/image.jpg', 'another-image.jpg');

# IPTC keywords to values (from exiv2, see below)
$query = array('Byline' => 'Some Author');

# Perform the search
$result = select_jpgs_by_iptc_fields($images, $query);

# Display the results
foreach ($result as $path) {
    echo '<img src="', htmlspecialchars($path), '">';
}

function select_jpgs_by_iptc_fields($jpgs, $query) {
    $matches = array();
    foreach ($jpgs as $path) {
        $iptc = get_jpg_iptc_metadata($path);
        foreach ($query as $name => $values) {
            if (!is_array($values))
                $values = array($values);
            if (count(array_intersect($iptc[$name], $values)) != count($values))
                continue 2;
        }
        $matches[] = $path;
    }
    return $matches;
}

function get_jpg_iptc_metadata($path) {
    $size = getimagesize($path, $info);
    if(isset($info['APP13']))
    {
        return human_readable_iptc(iptcparse($info['APP13']));
    }
    else {
        return null;
    }
}

function human_readable_iptc($iptc) {
# From the exiv2 sources
static $iptc_codes_to_names =
array(    
// IPTC.Envelope-->
"1#000" => 'ModelVersion',
"1#005" => 'Destination',
"1#020" => 'FileFormat',
"1#022" => 'FileVersion',
"1#030" => 'ServiceId',
"1#040" => 'EnvelopeNumber',
"1#050" => 'ProductId',
"1#060" => 'EnvelopePriority',
"1#070" => 'DateSent',
"1#080" => 'TimeSent',
"1#090" => 'CharacterSet',
"1#100" => 'UNO',
"1#120" => 'ARMId',
"1#122" => 'ARMVersion',
// <-- IPTC.Envelope
// IPTC.Application2 -->
"2#000" => 'RecordVersion',
"2#003" => 'ObjectType',
"2#004" => 'ObjectAttribute',
"2#005" => 'ObjectName',
"2#007" => 'EditStatus',
"2#008" => 'EditorialUpdate',
"2#010" => 'Urgency',
"2#012" => 'Subject',
"2#015" => 'Category',
"2#020" => 'SuppCategory',
"2#022" => 'FixtureId',
"2#025" => 'Keywords',
"2#026" => 'LocationCode',
"2#027" => 'LocationName',
"2#030" => 'ReleaseDate',
"2#035" => 'ReleaseTime',
"2#037" => 'ExpirationDate',
"2#038" => 'ExpirationTime',
"2#040" => 'SpecialInstructions',
"2#042" => 'ActionAdvised',
"2#045" => 'ReferenceService',
"2#047" => 'ReferenceDate',
"2#050" => 'ReferenceNumber',
"2#055" => 'DateCreated',
"2#060" => 'TimeCreated',
"2#062" => 'DigitizationDate',
"2#063" => 'DigitizationTime',
"2#065" => 'Program',
"2#070" => 'ProgramVersion',
"2#075" => 'ObjectCycle',
"2#080" => 'Byline',
"2#085" => 'BylineTitle',
"2#090" => 'City',
"2#092" => 'SubLocation',
"2#095" => 'ProvinceState',
"2#100" => 'CountryCode',
"2#101" => 'CountryName',
"2#103" => 'TransmissionReference',
"2#105" => 'Headline',
"2#110" => 'Credit',
"2#115" => 'Source',
"2#116" => 'Copyright',
"2#118" => 'Contact',
"2#120" => 'Caption',
"2#122" => 'Writer',
"2#125" => 'RasterizedCaption',
"2#130" => 'ImageType',
"2#131" => 'ImageOrientation',
"2#135" => 'Language',
"2#150" => 'AudioType',
"2#151" => 'AudioRate',
"2#152" => 'AudioResolution',
"2#153" => 'AudioDuration',
"2#154" => 'AudioOutcue',
"2#200" => 'PreviewFormat',
"2#201" => 'PreviewVersion',
"2#202" => 'Preview',
// <--IPTC.Application2
      );
   $human_readable = array();
   foreach ($iptc as $code => $field_value) {
       $human_readable[$iptc_codes_to_names[$code]] = $field_value;
   }
   return $human_readable;
}


回答2:

If you don't have extracted those IPTC data from your images, each time someone will search, you'll have to :

  • loop on every images
  • for each image, extract the IPTC data
  • see if the IPTC data for the current image matches

If you have more than a couple image, this will be really bad for performances, I'd say.


So, in my opinion, it would be far better to :

  • add a couple of fields in your database
  • extract the relevant IPTC data when the image is uploaded / stored
  • store the IPTC data in those DB fields
  • search in those DB fields
    • Or use some search engine like Lucene or Sphinx -- but that is another problem.

It'll mean a bit more work for you right now : you have more code to write...

... But it also means your website will have better chances to survive when there are several images and many users doing searches.