I've been searching for ages for a PHP function that converts a UTF-8 character to its equivalent number. I'm not entirely sure what the number is called (I heard it's called an ordinate?), but here's an example: http://jrgraphix.net/r/Unicode/3040-309F
Basically I'm trying to read a UTF-8 .txt file in PHP and then save every line in an array, so I can mess around with it.
If anyone can help me with this, it would be highly appreciated, as I'm not that familiar with UTF-8 yet.
Edit: This is what I've got so far:
echo "var TextCharacters = new Array();\n";
$LineArray = array();
$file_handle = fopen("lesson1.txt", "r");
while (!feof($file_handle))
{
    $line_of_text = fgets($file_handle);
    array_push($LineArray, $line_of_text);
}
fclose($file_handle);
foreach ($LineArray as $s)
{
    for ($i = 0; $i < mb_strlen($s, "utf-8"); $i++)
    {
        $char = mb_substr($s, $i, 1, "utf-8");
        echo "alert(go(" . bin2hex(iconv('UTF-8', 'UCS-2', $char)) . "));";
    }
}
There is nothing magic about UTF-8 in PHP. When you read the file, you get raw byte values; PHP does not parse them into characters for you. Iterate over the data you've read and use ord() to get the decimal value of each byte.
If you want to work character by character rather than byte by byte, you can use either mb_substr or iconv_substr to extract each character, and then use ord() to print the value of each byte that makes up that character.
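The approach described above can be sketched roughly as follows. This is only an illustration, not the listing from the original update: the helper name utf8_bytes_per_char is made up for this example, and it assumes the mbstring extension is loaded and that the input really is valid UTF-8.

```php
<?php
// Hypothetical helper (not a built-in): split a UTF-8 string into characters
// and collect the decimal value of every byte that makes up each character.
function utf8_bytes_per_char(string $text): array
{
    $result = [];
    $length = mb_strlen($text, 'UTF-8');          // length in characters
    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($text, $i, 1, 'UTF-8'); // one whole character
        $bytes = [];
        // strlen() counts bytes, so this loop walks the raw bytes of $char.
        for ($j = 0; $j < strlen($char); $j++) {
            $bytes[] = ord($char[$j]);
        }
        $result[] = [$char, $bytes];
    }
    return $result;
}

// "\u{00C6}" is Æ and "\u{3054}" is ご, written as escapes (PHP 7+).
foreach (utf8_bytes_per_char("fo\u{00C6}\u{3054}") as [$char, $bytes]) {
    echo $char, ': ', implode(' ', $bytes), "\n";
}
```

ASCII characters come out as a single byte each, while Æ and ご come out as two and three bytes respectively. Note that these are byte values, not Unicode code points.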
Update: To expand with a complete solution:
utf8.test:
fooÆØÅござ
Output:
Hope that helps.
What you're looking for is the Unicode code point, i.e. the numeric identifier by which the character is known in the Unicode character table. The "cheapest" way to get it is through the UCS-2 character encoding, whose 2-byte values map 1:1 onto the Unicode code points.
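A minimal sketch of that idea, under a few assumptions: the helper name ucs2_hex is hypothetical, mbstring is available, and the input is a single character from the BMP. It names the big-endian variant (UCS-2BE) explicitly, because a plain 'UCS-2' label may leave the byte order up to the platform.

```php
<?php
// Hypothetical helper (not a built-in): hex code point of one BMP character,
// obtained by converting it to big-endian UCS-2 and hex-dumping the 2 bytes.
function ucs2_hex(string $char): string
{
    // Big-endian so the two bytes read in the same order as the code point.
    $ucs2 = mb_convert_encoding($char, 'UCS-2BE', 'UTF-8');
    return bin2hex($ucs2);
}

$hiraganaA = "\u{3042}";                  // あ, HIRAGANA LETTER A
echo ucs2_hex($hiraganaA), "\n";          // 3042
echo hexdec(ucs2_hex($hiraganaA)), "\n";  // 12354, the same value in decimal
echo ucs2_hex('A'), "\n";                 // 0041
```

The hex string matches the values shown in code charts such as the one linked in the question; hexdec() turns it into a plain integer if that is more convenient.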
Caveats: the returned code is always 4 hexadecimal digits long (which you may or may not like), and UCS-2 does not support characters outside the BMP (Basic Multilingual Plane), i.e. above code point U+FFFF.
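If characters outside the BMP matter, one possible workaround is to go through UTF-32 instead, which stores every code point as a 4-byte value. The following is a sketch only: utf8_code_point is a made-up name, mbstring is assumed to be loaded, and the function expects exactly one valid UTF-8 character.

```php
<?php
// Hypothetical helper (not a built-in): integer code point of a single
// UTF-8 character, including characters above U+FFFF.
function utf8_code_point(string $char): int
{
    // UTF-32BE holds the code point as one 4-byte big-endian integer.
    $utf32 = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
    // 'N' reads an unsigned 32-bit big-endian integer; unpack() is 1-indexed.
    return unpack('N', $utf32)[1];
}

printf("U+%04X\n", utf8_code_point("\u{3042}"));   // U+3042
printf("U+%04X\n", utf8_code_point("\u{1F600}"));  // U+1F600, outside the BMP
```

On PHP 7.2 and later, the built-in mb_ord($char, 'UTF-8') should give the same integer without the manual conversion.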