I have an array of countries with one having a Latin character "Å":
$country["af"] = "Afghanistan";
$country["ax"] = "Åland Islands";
$country["al"] = "Albania";
While looping through this array and performing a comparison of the first character of the country name, I cannot match the Latin character.
foreach($country as $cc => $name)
{
    if($name[0] == "Å")
    {
        echo "matched";
    }
    else
    {
        echo $name[0];
    }
}
The result I got is: A�A
Why does the Latin character Å become �, and how do I perform a proper comparison and output the Latin character Å?
Add Note: The http header and the html document have already been specified as UTF-8 format.
Add Note2: If I just echo $name instead of $name[0], I am able to get the Å in Åland Islands. Using substr($name, 0, 1) has the same effect as $name[0], which gives me �.
Change your script to this. In UTF-8, Å is stored as two bytes, and the normal string functions (and $name[0]) work on single bytes, so they cut the character in half. You have to use the multibyte (mb_*) functions.
foreach($country as $cc => $name)
{
    if(mb_substr($name,0,1,"UTF-8") == "Å")
    {
        echo "matched";
    }
    else
    {
        echo mb_substr($name,0,1,"UTF-8");
    }
}
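To see why $name[0] produces �, here is a minimal sketch. It assumes the mbstring extension is loaded, and it writes Å as explicit byte escapes so the result does not depend on how the file itself is saved:

```php
<?php
// "\xC3\x85" is the UTF-8 byte sequence for "Å" (U+00C5), spelled out
// so the example does not depend on the encoding of this source file.
$name = "\xC3\x85land Islands";

// Byte-oriented access returns only the first of the two bytes.
echo bin2hex($name[0]), "\n";              // c3
echo strlen($name), "\n";                  // 14 (bytes)
echo mb_strlen($name, "UTF-8"), "\n";      // 13 (characters)

// Character-oriented access returns the whole two-byte character.
$first = mb_substr($name, 0, 1, "UTF-8");
echo bin2hex($first), "\n";                // c385
```

A lone 0xC3 byte is not valid UTF-8 on its own, which is why the browser shows the replacement character � for it.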
The problem is that programs have different ways of representing different characters. This is referred to as character encoding. Your browser, server, and PHP code are currently confused about which encoding you are using because you are mixing UTF-8 characters with ANSI code.
You can learn more about encoding here:
http://vlaurie.com/computers2/Articles/characters.htm
There are three things that I do whenever I build a UTF-8 PHP site. These three things should resolve your problem:
Add a PHP UTF-8 Header
Add this to the top of your code:
<?php
header('Content-Type: text/html; charset=utf-8');
...
I believe that this instructs other servers and your browser to parse this document using UTF-8, instead of ANSI. You can read more about this here:
Set HTTP header to UTF-8 using PHP
Add HTML UTF-8 Meta Tags
Add this code to the top of the HTML that you return:
<!doctype html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
...
This also instructs your browser to read the characters in UTF-8 (instead of ANSI). You can read more about this here:
Set HTTP header to UTF-8 using PHP
Save the PHP File as UTF-8 without BOM
By default, your files usually save in ANSI encoding. If you want to work with international characters, then you need to save them in UTF-8 encoding. This will let you work with the Å character properly.
If you are using Notepad++ as your text editor, then you can set the encoding of your document under the Encoding menu. Set it to Encode in UTF-8 without BOM.
Gotcha
UTF-8 without BOM is not the same thing as UTF-8. UTF-8 files are often prepended with 3 bytes of data that indicate that the file is a UTF-8 file. This is referred to as the Byte Order Mark (BOM). You can read more about the BOM here: http://www.arclab.com/products/amlc/utf-8-php-cannot-modify-header-information.html
Most programs can tell that the file is UTF-8 anyway, so the BOM is redundant. If you save the file with the BOM, PHP sends those three bytes as output before your header() call runs, and you'll probably get an error message like this:
Warning: Cannot modify header information – headers already sent
If you see this error message, then you probably have a BOM problem.
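If you want to check for a BOM from PHP itself, a minimal sketch could look like the following. The helper names hasUtf8Bom and stripUtf8Bom are made up for this example and are not built-in functions:

```php
<?php
// Hypothetical helper: true if the given bytes start with the
// UTF-8 byte order mark (EF BB BF).
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: returns the bytes with a leading BOM removed, if any.
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Check this script's own source file.
$source = file_get_contents(__FILE__);
echo hasUtf8Bom($source) ? "BOM found\n" : "no BOM\n";
```

Pointing file_get_contents() at any other PHP file instead of __FILE__ checks that file in the same way.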
The question mark appears because your viewer (browser) is trying to display a character that is not supported in the current character set. Why this happens when accessing the first character with $name[0], I'm not sure.
Based on the post here:
PHP: Convert specific-Bosnian characters to non-bosnian (utf8 standard chars)
I tried the following:
$result = iconv("UTF-8", "ASCII//TRANSLIT", $test);
$result now contains Aland Islands, the special characters are converted to their normal version.
$result[0] should now contain A.
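A slightly fuller sketch of the same idea, with $test filled in as an assumption (the country name from the question). Note that what //TRANSLIT produces depends on the iconv implementation and the current locale, so Å may come out as A, as ?, or the call may fail and return false:

```php
<?php
// Assumption: $test holds the UTF-8 country name from the question,
// written here as explicit bytes ("\xC3\x85" is "Å").
$test = "\xC3\x85land Islands";

// @ suppresses the notice iconv() raises when it cannot convert a character.
$result = @iconv("UTF-8", "ASCII//TRANSLIT", $test);

if ($result === false) {
    echo "iconv could not transliterate the string\n";
} else {
    // Safe to index by byte now: every character is a single ASCII byte.
    echo $result[0], "\n";
}
```

This changes the data (Å is no longer Å), so it only helps if an ASCII approximation is acceptable; for comparing or printing the real character, the mb_* functions are the better fit.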
Please set the character encoding for both the file (the stored code) and the output.