Why is ASCII representation of this character retu

Posted 2019-02-28 05:40

Question:

So I am playing with this tool:

http://www.unit-conversion.info/texttools/ascii/

When I try this character:

'

I see the value 039 which can be verified from: http://www.asciitable.com

But I am curious about:

’

This character in the same tool will return: 226 128 153

But as far as I know ASCII is 8 bits (or even 7...)

What do the values 226 128 153 mean here?

Answer 1:

It seems that this is the UTF-16 representation. The website is probably converting the characters to their code values with "’".charCodeAt(0); in JavaScript.



Answer 2:

The character you have is U+2019 RIGHT SINGLE QUOTATION MARK, which is also the typographically correct way of representing the apostrophe in most positions.

What the site does is represent the characters in UTF-8. As you can see in the page I linked, this character is encoded as three bytes: 0xE2 0x80 0x99 in hexadecimal, or 226 128 153 in decimal.

Why does that page use UTF-8 instead of ASCII? Simple. First, ASCII is a subset of UTF-8: every ASCII character is encoded as the same single byte in UTF-8. Second, UTF-8 can encode every Unicode code point. So there's rarely a reason to use ASCII if UTF-8 can be used instead.
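
To make the byte values concrete, here is a minimal JavaScript sketch. It assumes a runtime that provides a global TextEncoder (modern browsers and recent Node.js versions do). The helper utf8BytesManual is a hypothetical name introduced only for illustration, and it handles code points up to U+FFFF only; it is not how the linked tool is known to work.

```javascript
// Hypothetical helper: encode one code point (U+0000..U+FFFF) to UTF-8 by hand,
// to show where the three bytes come from.
function utf8BytesManual(codePoint) {
  if (codePoint < 0x80) {
    return [codePoint]; // 1 byte: 0xxxxxxx (identical to ASCII)
  }
  if (codePoint < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
  return [
    0xE0 | (codePoint >> 12),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F),
  ];
}

const apostrophe = "'";      // U+0027, plain ASCII
const curlyQuote = "\u2019"; // U+2019 RIGHT SINGLE QUOTATION MARK

// Built-in encoder: TextEncoder always produces UTF-8.
const asciiBytes = Array.from(new TextEncoder().encode(apostrophe));
const curlyBytes = Array.from(new TextEncoder().encode(curlyQuote));

// Manual encoding of the same code point, for comparison.
const manualBytes = utf8BytesManual(curlyQuote.codePointAt(0));

console.log(asciiBytes);  // [ 39 ]
console.log(curlyBytes);  // [ 226, 128, 153 ]
console.log(manualBytes); // [ 226, 128, 153 ]
```

The ASCII apostrophe stays a single byte, while U+2019 falls in the three-byte range, which is why the tool shows three numbers for one character.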



Answer 3:

The first character is ASCII, code 39. The second is a Unicode character, code point 8217 (U+2019).

See the Unicode character table, specifically the entry for this character.

For more information, read the Unicode article.

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
    <div id='res'></div>
    <script>
      // Once the DOM is ready, write the character's code (8217) into #res.
      $(document).ready(function(){
        $('#res').html("’".charCodeAt(0));
      });
    </script>



Answer 4:

I had this same issue: while trying to convert a string to uppercase, I ran into this character, and it 'broke' a bunch of methods of converting a string with special characters to uppercase.

I used this solution:

    $text = preg_replace("/[`‛′’‘]/u", "'", $text);

(NOT MINE - taken from here: https://stackoverflow.com/a/24925209/6136613)

This converts it to a regular ASCII apostrophe (U+0027), after which you can use the normal PHP string functions on it.
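
For comparison, a rough JavaScript analogue of the PHP line above might look like the following sketch. The function name normalizeApostrophes and the sample string are made up for illustration; the character class lists the same five characters as the PHP pattern, written as escapes.

```javascript
// Hypothetical helper: replace common apostrophe look-alikes with the ASCII apostrophe.
// U+0060 grave accent, U+201B and U+2018/U+2019 curly single quotes, U+2032 prime.
function normalizeApostrophes(text) {
  return text.replace(/[\u0060\u201B\u2032\u2019\u2018]/g, "'");
}

const input = "it\u2019s a \u2018test\u2019";
const normalized = normalizeApostrophes(input);

console.log(normalized);               // it's a 'test'
console.log(normalized.toUpperCase()); // IT'S A 'TEST'
```

After normalization the string contains only single-byte ASCII apostrophes, so byte-oriented string functions no longer see the three-byte sequence 226 128 153.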