PHP Unicode character questions

Posted 2019-08-26 20:08

Question:

Here's a page I found. It describes a character I need to work with in some of my other projects.

http://www.fileformat.info/info/unicode/char/2446/index.htm

There is a box titled "Encodings" on that page, and I am wondering about some of its rows.

I obviously need a course on this sort of thing, but what is the difference between "HTML Entity (decimal)" and "HTML Entity (hex)"?

What confuses me is that when I put those references on a web page, they display fine, even though I haven't specified UTF-8 encoding anywhere in the PHP page.

<?php
$string1 = '&#x2446;';
$string2 = '&#9286;';

echo $string1;
echo '<br>';
echo $string2;
?>

Does the browser know how to display both automatically? To make it weirder, I can only see those characters on my Mac, in Firefox. My Windows box doesn't show them; I've tested it there in both Chrome and Firefox. Do I need to tell the browsers how to display them correctly, or is it something I have to change in the operating system?

Answer 1:

You can use a numeric "HTML entity" with any page encoding, and in practice every browser will display it correctly as long as a font containing the character is installed. That is what these references were created for: displaying characters that the current encoding cannot represent. In your case, it looks like your Windows box is missing a font that contains this character, so you need to install one.

Either way, this has almost nothing to do with PHP.



Answer 2:

They're both valid numeric HTML entities, and the browser does indeed know how to decode them. The difference is that the first is written as a hexadecimal number, while the second is decimal.

0x2446 = 9286

Note that 0x means hexadecimal.
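
As a quick check of that equality, here is a minimal sketch using PHP's built-in `hexdec()` and `html_entity_decode()`. The variable names are only illustrative; the point is that both references decode to the same character.

```php
<?php
// Both references name the same code point; only the number base differs.
$hexRef = '&#x2446;';
$decRef = '&#9286;';

// hexdec() converts a hexadecimal string to an integer.
$codePoint = hexdec('2446');

// html_entity_decode() resolves numeric references into real characters.
$fromHex = html_entity_decode($hexRef, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$fromDec = html_entity_decode($decRef, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($codePoint === 9286);   // bool(true)
var_dump($fromHex === $fromDec); // bool(true)
echo bin2hex($fromHex), "\n";    // e29186 (the UTF-8 bytes of U+2446)
```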

Also note that it is good practice to always have your server explicitly specify an encoding. The W3C explains how to do so. UTF-8 is a good choice.
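
One common way to do this from PHP is sketched below, assuming the page is generated by a PHP script and nothing has been output before the `header()` call. The `<meta charset>` tag is an extra in-document hint, and the page title is made up for the example.

```php
<?php
// header() must run before any output is sent to the browser.
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Encoding test</title>
</head>
<body>
<?php echo '&#x2446;'; ?>
</body>
</html>
```

Declaring the encoding does not install any fonts, so a machine that lacks a glyph for the character will still show a placeholder box.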

If you use any Unicode encoding, you can always put the character right on your page, so you don't have to use entities.
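
For example, assuming PHP 7.0 or later and a page served as UTF-8, the character can be written with a `\u{...}` escape inside a double-quoted string (or pasted literally into a UTF-8 source file). This is only a sketch of the idea:

```php
<?php
// "\u{2446}" produces the UTF-8 encoding of code point U+2446 (PHP 7.0+).
$char = "\u{2446}";

echo bin2hex($char), "\n"; // e29186
echo strlen($char), "\n";  // 3 (strlen counts bytes, not characters)

// htmlspecialchars() leaves the character untouched when told the input is UTF-8.
$safe = htmlspecialchars($char, ENT_QUOTES, 'UTF-8');
var_dump($safe === $char); // bool(true)
```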



Answer 3:

To be exact, neither is an entity reference. &amp; is an entity reference that refers to the entity named amp that is defined as:

<!ENTITY amp     CDATA "&#38;"   -- ampersand, U+0026 ISOnum -->

Here you can see that the entity’s value is just another reference: &#38;.

&#x2446; and &#9286; are “just” character references (numeric character references, to be exact). They refer to a character by specifying its code position in the Universal Character Set, i.e. the Unicode character set.
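
The distinction can be illustrated in PHP. This is a minimal sketch that assumes the mbstring extension is available (for `mb_ord()`, PHP 7.2+); the variable names are hypothetical.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML5;

// A named entity reference and a numeric character reference for the same character.
$named   = html_entity_decode('&amp;', $flags, 'UTF-8');
$numeric = html_entity_decode('&#38;', $flags, 'UTF-8');
var_dump($named === '&' && $numeric === '&'); // bool(true)

// A numeric character reference carries the code position directly.
$char = html_entity_decode('&#x2446;', $flags, 'UTF-8');
$codePoint = mb_ord($char, 'UTF-8');
printf("U+%04X\n", $codePoint); // U+2446

// Going the other way: build a hexadecimal reference from the code position.
$ref = '&#x' . dechex($codePoint) . ';';
echo $ref, "\n"; // &#x2446;
```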



Tags: php utf-8