UTF-8 to equivalent number in PHP

I've been searching my !!! off trying to find a PHP function to convert a UTF-8 character to its equivalent number. I'm not entirely sure what the number is called (I heard it's called an ordinate?), but here's an example: http://jrgraphix.net/r/Unicode/3040-309F

Basically I'm trying to read a UTF-8 .txt file in PHP and then save every line in an array, so I can mess around with it.

If anyone can assist me with this it would be highly appreciated, as I am not that familiar with UTF8 yet.

Edit: This is what I've got so far:

echo "var TextCharacters = new Array();\n";

$LineArray = array();
$file_handle = fopen("lesson1.txt", "r");


// fgets() returns false at end of file, so test its result instead of feof()
while (($line_of_text = fgets($file_handle)) !== false)
{
  array_push($LineArray, $line_of_text);
}

fclose($file_handle);

foreach($LineArray as $s)
{
    for($i = 0; $i < mb_strlen($s,"utf-8"); $i++)
    {
        $char = mb_substr($s, $i, 1, "utf-8");
        // UCS-2BE fixes the byte order; the 0x prefix makes the value a JavaScript hex literal
        echo "alert(go(0x" . bin2hex(iconv('UTF-8', 'UCS-2BE', $char)) . "));";
    }
}

Tags: php encoding utf-8

2 answers

虎瘦雄心在

#2 · 2019-07-28 11:09

There is nothing magic about UTF-8 in PHP. When you read the file, you get raw bytes, not parsed characters. Iterate over the data you've read and use ord() to get the decimal value of each byte.

If you want to work character by character instead, you can use either mb_substr or iconv_substr to extract each character, then call ord() on each of its bytes to print the values that make up the character.

Update: To expand with a complete solution:

utf8.test: fooÆØÅござ

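// Read the whole file as a raw byte string
$utf8 = file_get_contents("utf8.test");

// Count characters (not bytes) once, outside the loop
$length = mb_strlen($utf8, "utf-8");

for ($i = 0; $i < $length; $i++)
{
    // Extract one UTF-8 character; it may span several bytes
    $char = mb_substr($utf8, $i, 1, "utf-8");

    print($char);
    print("\n");

    // Print every byte of the character as two hex digits
    for ($j = 0; $j < strlen($char); $j++)
    {
        printf("%02x", ord($char[$j]));
    }

    print("\n\n");
}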

Output:

f
66

o
6f

o
6f

Æ
c386

Ø
c398

Å
c385

ご
e38194

ざ
e38196
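
Since the question asks about working line by line, here is a rough sketch of the same idea applied to an array of lines. The helper name utf8_lines_to_hex is made up for illustration, the input is assumed to be valid UTF-8, and the sample lines stand in for what file() would return for a real file.

```php
<?php
// Hypothetical helper (not part of the answer above): maps each line of
// UTF-8 text to a list of [character, hex bytes] pairs.
function utf8_lines_to_hex(array $lines): array
{
    $result = [];
    foreach ($lines as $line) {
        // The /u modifier makes PCRE treat the subject as UTF-8, so the
        // empty pattern splits between characters rather than between bytes.
        $chars = preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            // preg_split() fails on invalid UTF-8; skip such lines here.
            continue;
        }
        $row = [];
        foreach ($chars as $char) {
            // bin2hex() on a one-character string gives all of its bytes.
            $row[] = [$char, bin2hex($char)];
        }
        $result[] = $row;
    }
    return $result;
}

// With a real file, the lines could come from something like:
//   $lines = file('lesson1.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$lines = ["foo", "ござ"];

foreach (utf8_lines_to_hex($lines) as $row) {
    foreach ($row as [$char, $hex]) {
        echo $char, ' ', $hex, "\n";
    }
}
```

Splitting with preg_split avoids calling mb_substr once per character, which can matter on long lines, but either approach should give the same bytes.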

Hope that helps.


成全新的幸福

#3 · 2019-07-28 11:19

What you're looking for is the Unicode code point, i.e. the numeric identifier by which the character is known in the Unicode character table. The "cheapest" way to get it is through the UCS-2 character encoding, whose two-byte values map 1:1 onto Unicode code points:

// UCS-2BE pins the byte order; plain 'UCS-2' may come back byte-swapped on some platforms
echo bin2hex(iconv('UTF-8', 'UCS-2BE', 'あ'));
// 3042

Caveats: the returned code is always 4 hexadecimal digits long (which you may or may not like), and UCS-2 cannot represent characters outside the BMP, i.e. above code point FFFF.
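
If characters outside the BMP matter, one possible workaround is to go through UTF-32 instead, which stores every code point as a single 4-byte integer. The following is only a sketch: the function name utf8_code_point is made up for illustration, and it assumes the iconv extension is available and that the source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): returns the Unicode
// code point of the first character of a UTF-8 string as an integer.
function utf8_code_point(string $char): int
{
    // UTF-32BE encodes each code point as one 4-byte big-endian integer.
    $utf32 = iconv('UTF-8', 'UTF-32BE', $char);
    if ($utf32 === false || strlen($utf32) < 4) {
        throw new InvalidArgumentException('Input is empty or not valid UTF-8');
    }
    // 'N' reads an unsigned 32-bit big-endian integer from the first 4 bytes;
    // unpack() returns an array indexed from 1.
    return unpack('N', $utf32)[1];
}

printf("U+%04X\n", utf8_code_point('あ'));  // U+3042
printf("U+%04X\n", utf8_code_point('😀')); // U+1F600, outside the BMP
```

On PHP 7.2 or later with the mbstring extension, mb_ord($char, 'UTF-8') should return the same integer without the manual conversion.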

