Single quotes showing as diamond shaped question marks

Posted 2019-04-07 11:58

Question:

I am working with a web page in which I switched the character set from iso-8859-1 to utf-8. The top of the page reads like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>[title of site]</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

I am only using ASCII characters in the page, and since UTF-8 is a superset of ASCII, this should be fine. However, single quotes in the text are showing up as question marks surrounded by black diamonds. I have verified these are ASCII single quotes (not straight quotes).

I've read many solutions online that involve PHP, magic quotes, database configuration, etc. However, this is a static HTML page that isn't generated by any program.

Also, many people who have this problem are told to switch to UTF-8 to fix it. That is exactly how I introduced the problem.

Please look at http://mch.blackcatwebinc.com/src/events.html to see this problem.

Answer 1:

The only quote characters in ASCII are the single quote ' (0x27, decimal 39) and the double quote " (0x22, decimal 34). What you have is text in CP1252, the standard 8-bit Western European encoding on Windows, which places the curly single quotes at bytes 145 (0x91) and 146 (0x92). Those bytes are not valid UTF-8: UTF-8 encodes every character above 127 (0x7F) as a multi-byte sequence, and the opening and closing single quotes are U+2018 and U+2019 (bytes E2 80 98 and E2 80 99). A browser decoding the page as UTF-8 therefore replaces each stray byte with U+FFFD REPLACEMENT CHARACTER, which it draws as a black diamond with a question mark. If what you want is UTF-8, you need to convert the file to UTF-8.
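A minimal Python sketch of the byte-level mismatch described in this answer, assuming the characters in the page are U+2018/U+2019 (the sample string is made up). It shows the one-byte CP1252 form of each quote, the three-byte UTF-8 form, and what a UTF-8 decoder does with the CP1252 bytes.

```python
# U+2018 LEFT and U+2019 RIGHT SINGLE QUOTATION MARK around sample text
text = "\u2018quoted\u2019"

# CP1252 (windows-1252) stores each curly quote as one byte: 0x91, 0x92
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes)        # b'\x91quoted\x92'

# UTF-8 needs three bytes per quote: E2 80 98 and E2 80 99
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)          # b'\xe2\x80\x98quoted\xe2\x80\x99'

# Decoding the CP1252 bytes as UTF-8 fails on 0x91/0x92 (they are not valid
# lead bytes), so errors="replace" substitutes U+FFFD for each one
misread = cp1252_bytes.decode("utf-8", errors="replace")
print(ascii(misread))      # '\ufffdquoted\ufffd'
```

The U+FFFD characters in the last line are exactly the diamond question marks seen in the page.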



Answer 2:

According to the W3C (http://www.w3.org/International/questions/qa-html-encoding-declarations#metacontenttype), the meta charset declaration "should appear as close as possible to the top of the head element".

So I would try placing the meta tag above the title.

Also, as mentioned in the first answer by @user1505373, UTF is always capitalized and there is no space after the = in any of the examples I saw.



Answer 3:

Your source file is not saved in UTF-8 but in Windows-1252 (CP1252, often loosely called Latin-1), and those quotes are not simple ASCII quotes but U+2019 RIGHT SINGLE QUOTATION MARK, stored as the CP1252 byte 0x92. Save the source file as UTF-8 and it will work.
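If you would rather fix the file than the declaration, the re-save can be sketched in Python as below. This assumes the file really is CP1252 (running the conversion a second time would garble the already-UTF-8 bytes); the helper name resave_as_utf8 and the demo content are hypothetical, and an editor's "save as UTF-8" option does the same job.

```python
import tempfile
from pathlib import Path

def resave_as_utf8(path: Path) -> None:
    """Hypothetical helper: re-encode a CP1252 file as UTF-8, in place."""
    # Strict decode: raises UnicodeDecodeError on bytes CP1252 leaves undefined
    text = path.read_bytes().decode("cp1252")
    path.write_bytes(text.encode("utf-8"))

# Demo on a temporary file standing in for the real page
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "events.html"
    page.write_bytes(b"Don\x92t miss it")   # 0x92 = CP1252 right single quote
    resave_as_utf8(page)
    result = page.read_bytes()

print(result)   # b'Don\xe2\x80\x99t miss it'
```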



Answer 4:

The simplest fix is to change UTF-8 to windows-1252 in the meta tag. This works because the server announces no encoding in the Content-Type header, so browsers and other clients use the one specified in the meta tag.

The name windows-1252 is the preferred MIME name for the 8-bit Windows Latin-1 encoding, also known as cp1252 and by some other names (and often misleadingly called “ANSI”).
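As a quick illustration that these are just different labels for one encoding, Python's codec registry resolves them to the same codec (Python's canonical name happens to be cp1252). This only shows Python's aliasing, not how browsers resolve labels.

```python
import codecs

# Different labels for the same 8-bit Windows Latin-1 codec
for label in ("windows-1252", "Windows-1252", "cp1252"):
    print(label, "->", codecs.lookup(label).name)   # each prints "-> cp1252"

# Under that codec, byte 0x92 is the right single quotation mark
print(ascii(b"\x92".decode("windows-1252")))        # '\u2019'
```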

As @deceze explains, the actual encoding of the data is windows-1252, not UTF-8. You can alternatively change the actual encoding to UTF-8 by saving the file with a suitable command in your authoring software. But what really matters is that the declared encoding matches the real one.
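Whether the declared encoding matches the real one can be checked mechanically. Below is a rough Python sketch; the regex and the helper names are hypothetical, a simplified stand-in for real HTML encoding detection. A clean decode is only weak evidence for single-byte encodings such as windows-1252, which accept almost any byte; UTF-8 is strict, which is what makes the mismatch in this question detectable.

```python
import re
from typing import Optional

# Simplified pattern for the charset in a <meta> tag (illustration only)
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)

def declared_charset(data: bytes) -> Optional[str]:
    """Return the first charset label found in the bytes, if any."""
    match = CHARSET_RE.search(data)
    return match.group(1).decode("ascii") if match else None

def declaration_matches(data: bytes) -> bool:
    """True if the bytes decode without errors under the declared charset."""
    label = declared_charset(data)
    if label is None:
        return False
    try:
        data.decode(label)
    except (LookupError, UnicodeDecodeError):
        return False
    return True

page = (b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        b"Don\x92t miss it")
print(declared_charset(page), declaration_matches(page))    # utf-8 False

fixed = page.replace(b"charset=utf-8", b"charset=windows-1252")
print(declared_charset(fixed), declaration_matches(fixed))  # windows-1252 True
```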

Yet another possibility is to use “escapes” for the apostrophe, such as &rsquo;. They work independently of encoding, but they make the source code less legible.
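Writing those escapes by hand is tedious; a small Python sketch can generate both forms (the sample sentence is made up). str.encode with errors="xmlcharrefreplace" emits decimal references such as &#8217;, and html.entities.codepoint2name supplies the named form (&rsquo;) where HTML defines one. Either output is pure ASCII, so it displays the same under any declared charset.

```python
from html.entities import codepoint2name

text = "Don\u2019t miss the \u2018Spring Fling\u2019"   # sample text

# Numeric references: every non-ASCII character becomes &#NNNN;
numeric = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(numeric)  # Don&#8217;t miss the &#8216;Spring Fling&#8217;

def to_named(s: str) -> str:
    """Named references where HTML defines one, numeric otherwise."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                       # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)

print(to_named(text))  # Don&rsquo;t miss the &lsquo;Spring Fling&rsquo;
```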



Answer 5:

The only difference I see between your tag and the one on the site I'm working on is the space after the semicolon and that utf is lowercase on yours. Try capitalizing UTF.



Answer 6:

Every character, ASCII or not, can be written as an HTML numeric character reference (&#NNN;), which does not depend on the page's declared encoding. Characters outside ASCII, such as typographic quotes, are the ones that bring us to your rendering issue.

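What you presumably have there is a closing single quote (U+2019). To get it printed correctly, use its numeric character reference, &#8217;. The legacy code &#146; also displays correctly in browsers, which remap it through Windows-1252, but it is not valid HTML. If it turns out to be an opening single quote (U+2018), use &#8216; (legacy &#145;) instead.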

Note that these two characters are not ASCII. HTML does define entity names for them, &lsquo; and &rsquo;, so either the named or the numeric form works.
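To check which references resolve to which character, Python's html.unescape follows the HTML5 parsing rules, including the Windows-1252 remapping of numeric references in the 128 to 159 range. This sketch (an illustration, not a validator) shows that the named, the standard numeric, and the legacy numeric forms all produce the same quote.

```python
import html

# Closing quote: all four forms resolve to U+2019
for ref in ("&rsquo;", "&#8217;", "&#x2019;", "&#146;"):
    # &#146; is invalid HTML but is remapped via Windows-1252, as browsers do
    print(ref, ascii(html.unescape(ref)))   # each prints '\u2019'

# Opening quote: all three forms resolve to U+2018
for ref in ("&lsquo;", "&#8216;", "&#145;"):
    print(ref, ascii(html.unescape(ref)))   # each prints '\u2018'
```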