When I read the IPTC data from an image with PHP, accented characters are not displayed properly.
For example: é, ø and ü
With the Content-Type header set to UTF-8, I get a question mark in a black diamond (�) instead of the character. If no content type is set, I get a dash character instead: —
This is the code I use to read the IPTC block:
$file = '/path/to/image.jpg';
getimagesize($file, $info);
$iptc = iptcparse($info['APP13']);
I have also tried uploading the exact same image to a WordPress installation on the same server, and it properly strips the accented character and replaces it with its basic Latin equivalent. I don't mind if that is the end result; I would just like to read the characters properly.
Any ideas on how to get the complete and correct data from the image?
Answering a bit late, but since I had the same problem displaying special characters such as č, š and ž (which appear in the Slovenian alphabet), I may as well answer for future reference.
The solution to this problem is actually not related to PHP but to the encoding of the IPTC data. By default, most software that can write IPTC data does not store it as UTF-8: it uses an older single-byte encoding and does not declare which one. At first I used Adobe Bridge, which displays all special characters correctly while you tag your images, but once you parse that data in PHP you will not see the special characters. (I would have to check this part again, but the main catch is that two different encodings are involved: the one used to store the IPTC data in the image and the one used by the program that displays it, or something along those lines.)
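To see which of those situations you are in, you can look at the raw bytes PHP gets back. The sketch below uses a hypothetical helper, `iptc_charset_report()` (not part of PHP): it checks record 1:90 (Coded Character Set) for the UTF-8 escape sequence ESC % G and otherwise reports the first value that is not valid UTF-8, in hex. An é stored as the single byte `e9` points to a Latin-1/Windows-1252 style encoding, while `c3a9` would be UTF-8.

```php
<?php
// Hypothetical diagnostic helper, not a PHP built-in.
// $iptc is the array returned by iptcparse(): tag => list of string values.
function iptc_charset_report(array $iptc): string
{
    // Record 1:90 (Coded Character Set) holds "\x1B%G" (ESC % G) for UTF-8.
    if (($iptc['1#090'][0] ?? null) === "\x1B%G") {
        return 'declared UTF-8';
    }
    foreach ($iptc as $tag => $values) {
        foreach ($values as $value) {
            // preg_match() with the /u modifier returns false on invalid UTF-8.
            if (preg_match('//u', $value) !== 1) {
                return "undeclared; $tag is not valid UTF-8 (hex: " . bin2hex($value) . ')';
            }
        }
    }
    return 'undeclared; all values are valid UTF-8 (possibly plain ASCII)';
}

// Usage with the question's code:
// getimagesize('/path/to/image.jpg', $info);
// if (isset($info['APP13'])) {
//     echo iptc_charset_report(iptcparse($info['APP13']));
// }
```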
To solve the problem I used ExifTool, an excellent program that lets you manage almost any metadata in your images.
Then I used it to convert all IPTC data to UTF-8, and from then on I just had to retag the images that had corrupt characters (which Adobe Bridge displays correctly but evidently does not save in the correct encoding).
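The command to do this for all images in a folder is below (replace /path/to/folder with your directory). -tagsfromfile @ copies each file's own IPTC tags back into it, and -codedcharacterset=utf8 makes ExifTool write them as UTF-8 and mark the IPTC data as UTF-8. By default ExifTool keeps each untouched original with an _original suffix.
exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 /path/to/folder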
You may also want to download ExifTool GUI if you are not comfortable working from the command line.
I haven't found any other program that does this task faster.
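If re-saving the files is not an option, a read-side workaround is to convert the values in PHP. This is a minimal, heuristic sketch and not the ExifTool approach described above: `iptc_to_utf8()` is a hypothetical helper, it requires the mbstring extension, and it assumes that IPTC text without a UTF-8 declaration in record 1:90 is Windows-1252. Change `$fallback` if your files use another code page (Windows-1252 has no č, for example).

```php
<?php
// Hypothetical helper (not a PHP built-in); requires the mbstring extension.
// Assumption: values without a UTF-8 declaration in 1:90 are Windows-1252.
function iptc_to_utf8(array $iptc, string $fallback = 'Windows-1252'): array
{
    // Record 1:90 (Coded Character Set) is "\x1B%G" (ESC % G) for UTF-8.
    $declaredUtf8 = ($iptc['1#090'][0] ?? null) === "\x1B%G";
    $out = [];
    foreach ($iptc as $tag => $values) {
        foreach ($values as $i => $value) {
            // Leave declared-UTF-8 and already-valid UTF-8 (incl. ASCII) values alone.
            if ($declaredUtf8 || mb_check_encoding($value, 'UTF-8')) {
                $out[$tag][$i] = $value;
            } else {
                $out[$tag][$i] = mb_convert_encoding($value, 'UTF-8', $fallback);
            }
        }
    }
    return $out;
}

// Usage with the question's code; the page must also be sent as UTF-8.
// header('Content-Type: text/html; charset=utf-8');
// getimagesize('/path/to/image.jpg', $info);
// if (isset($info['APP13'])) {
//     $parsed = iptcparse($info['APP13']);
//     $iptc = $parsed === false ? [] : iptc_to_utf8($parsed);
// }
```

A value that is already valid UTF-8 is never converted, so this cannot double-encode UTF-8 data; the trade-off is that a legacy value which happens to form valid UTF-8 byte sequences is left untouched.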
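To set the character set to UTF-8 when writing IPTC data, add record 1:90 (Coded Character Set) with the escape sequence ESC % G to the $iptc array:
$iptc = array(
    '1#090' => "\x1B%G", // ESC % G declares UTF-8
    // your record 2 tags follow here
);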
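Then change the loop in the iptcembed() example from the PHP manual, which hardcodes record 2 in its iptc_make_tag() call, so that the record number is read from each key (iptc_make_tag() is the helper defined in that manual example, not a PHP built-in, and $path is the JPEG being written):
// Convert the IPTC tags into binary code
$data = '';
foreach ($iptc as $tag => $string)
{
    // Keys look like '1#090': record number, '#', dataset number
    $rec = substr($tag, 0, 1);
    $tag = substr($tag, 2);
    $data .= iptc_make_tag($rec, $tag, $string);
}
// Embed the IPTC data; iptcembed() returns the new JPEG data (false on failure)
$content = iptcembed($data, $path);
// Write the result back to the file
if ($content !== false) {
    file_put_contents($path, $content);
}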
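For completeness, here is a self-contained sketch of the write side under a few assumptions. `iptc_make_tag()` follows the helper from the iptcembed() example in the PHP manual (with typed parameters added); `iptc_build()` is a hypothetical wrapper that always puts 1:90 = ESC % G first, sorts the keys so record 1 precedes record 2, accepts either a string or a list of strings per tag (keywords, 2#025, are repeatable), and expects every value to already be UTF-8.

```php
<?php
// Helper following the iptcembed() example in the PHP manual:
// one IIM dataset = 0x1C, record number, dataset number, length, value.
function iptc_make_tag(int $rec, int $data, string $value): string
{
    $length = strlen($value);
    $retval = chr(0x1C) . chr($rec) . chr($data);
    if ($length < 0x8000) {
        // Standard dataset: 2-byte big-endian length.
        $retval .= chr($length >> 8) . chr($length & 0xFF);
    } else {
        // Extended dataset: 0x8004 followed by a 4-byte length.
        $retval .= chr(0x80) . chr(0x04)
            . chr(($length >> 24) & 0xFF) . chr(($length >> 16) & 0xFF)
            . chr(($length >> 8) & 0xFF) . chr($length & 0xFF);
    }
    return $retval . $value;
}

// Hypothetical wrapper: builds an IIM block that declares UTF-8.
// $tags maps 'record#dataset' keys (as iptcparse() returns them) to UTF-8 strings.
function iptc_build(array $tags): string
{
    // Prepend 1:90 = ESC % G (UTF-8), overriding any 1:90 already in $tags.
    $tags = ['1#090' => "\x1B%G"] + $tags;
    ksort($tags, SORT_STRING); // record 1 datasets before record 2
    $data = '';
    foreach ($tags as $tag => $values) {
        [$rec, $set] = explode('#', $tag);
        foreach ((array) $values as $value) {
            $data .= iptc_make_tag((int) $rec, (int) $set, $value);
        }
    }
    return $data;
}

// Usage (the path is a placeholder):
// $path = '/path/to/image.jpg';
// $content = iptcembed(iptc_build(['2#120' => 'Café']), $path);
// if ($content !== false) {
//     file_put_contents($path, $content);
// }
```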