How do I get rid of unrecognized characters in UTF-8?

Posted 2019-07-11 05:41

Question:

I have a MySQL database that's set to UTF-8. I have set my PHP header to header("Content-Type: text/html; charset=utf-8"); and in my HTML I have <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

When I return anything that has round quotes or apostrophes, they show up as unrecognized characters (black diamond with a ? inside).

If I run utf8_encode() on the string I'm echoing out, it looks fine in Chrome, but shows a different weird character in Firefox. Is there something else I can do site-wide to make this work better?

(I've accessed the DB with Sequel Pro and phpMyAdmin.)

Answer 1:

Full UTF-8 settings:

1) .htaccess

AddDefaultCharset utf-8
php_value default_charset utf-8

2) After mysqli_connect() in PHP, call this:

mysqli_query($this->link, 'SET character_set_client="utf8", character_set_connection="utf8", character_set_results="utf8"');

3) Your DB should be created with the utf8 character set and a utf8 collation; all text columns in your tables should use utf8 as well.

4) Your PHP files should also be saved as UTF-8.
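
As an alternative to the raw SET query in step 2, mysqli has a built-in call for this, mysqli_set_charset(), which also tells the client library which charset is in use. A minimal sketch, assuming an already-open connection; use_utf8() is a hypothetical helper name, and 'utf8' simply matches the settings above (MySQL's 'utf8mb4' would be the choice if 4-byte characters are needed):

```php
<?php
// Hypothetical helper: switch an open mysqli connection to UTF-8.
function use_utf8(mysqli $link, string $charset = 'utf8'): void
{
    // mysqli_set_charset() returns false on failure
    // (or throws mysqli_sql_exception when exception reporting is enabled).
    if (!mysqli_set_charset($link, $charset)) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($link));
    }
}
```

Calling use_utf8($this->link); right after mysqli_connect() would take the place of the SET query.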



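Answer 2:

Make sure the connection between PHP and MySQL is using UTF-8. Otherwise, the data will be converted to whatever character set the connection uses.

See mysql_client_encoding and mysql_set_charset. (These belong to the old mysql extension, which was removed in PHP 7; the mysqli equivalents are mysqli_character_set_name and mysqli_set_charset.)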
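
To see what the connection is actually using, one option is to ask the server. A sketch with the mysqli extension, assuming an open connection; connection_charsets() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the character_set_* variables for this
// connection as an array like ['character_set_client' => 'utf8', ...].
function connection_charsets(mysqli $link): array
{
    $charsets = [];
    // Underscores are escaped so LIKE treats them literally, not as wildcards.
    $result = mysqli_query($link, "SHOW VARIABLES LIKE 'character\\_set\\_%'");
    if ($result === false) {
        return $charsets;
    }
    while ($row = mysqli_fetch_assoc($result)) {
        $charsets[$row['Variable_name']] = $row['Value'];
    }
    mysqli_free_result($result);
    return $charsets;
}
```

If character_set_client, character_set_connection, and character_set_results are not all utf8 (or utf8mb4), the connection is converting the data somewhere along the way.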



Answer 3:

Have you tried using htmlentities? I know that this doesn't affect the character encoding, but it might get rid of the black diamond with the question mark. It often does for me.

$output = htmlentities($db_output);
echo $output;
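
One caveat worth checking: htmlentities() only produces entities when the input is valid in the encoding it is told to use. A small sketch, assuming the page is UTF-8 and passing the flags and charset explicitly (the variable names are just for illustration):

```php
<?php
$valid   = "It\u{2019}s"; // right single quote stored as UTF-8 (bytes E2 80 99)
$invalid = "It\x92s";     // the same quote as a single Windows-1252 byte

// Valid UTF-8: the curly quote becomes a named entity.
$fromValid = htmlentities($valid, ENT_QUOTES, 'UTF-8');

// Invalid UTF-8: without ENT_SUBSTITUTE or ENT_IGNORE the result is an empty string.
$fromInvalid = htmlentities($invalid, ENT_QUOTES, 'UTF-8');

echo $fromValid, "\n";
var_dump($fromInvalid === '');
```

So if the database output is not valid UTF-8 to begin with, this call can hide the symptom or blank the string rather than fix the underlying encoding.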


Answer 4:

How exactly are you getting these "round quotes and apostrophes"? If their ultimate source is a Word or Outlook document, they will be encoded in Windows-1252. If you copy and paste directly from a Word document into a UTF-8 Web page, the UTF-8 version of the clipboard should be used, and these characters come over as multibyte UTF-8 characters. If these characters went through other files or non-UTF-8 Web pages first, it's possible that they remained in Word "Smart Quote" single-byte encoding, which is invalid in UTF-8 (hence the ?-in-black-diamond glyph, the replacement character U+FFFD). Note that Web pages claiming to be Latin-1 (ISO-8859-1) are frequently rendered as Windows-1252, as 1) the control codes 0x80-0x9F that Smart Quotes overlay are very rarely used, and 2) it's so common for Smart Quotes to be mixed in with text.

For a UTF-8 page that gives quotes and apostrophes as "invalid characters", tell the browser to use Windows-1252 encoding instead for the page (View > Character Encoding or something similar). If these characters show up correctly now, untranslated Smart Quotes were the problem. Unfortunately, once they're in the database, only manual editing will fix them.
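
If the stray bytes can be caught before they reach the database, the check-and-convert step might look like this. It's a sketch that assumes the mbstring extension is available and that any non-UTF-8 input really is Windows-1252, which is a guess the code can't verify; to_utf8() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: pass valid UTF-8 through unchanged and treat
// anything else as Windows-1252 (an assumption, not a detection).
function to_utf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
}

// "It\x92s" holds a Windows-1252 right single quote (byte 0x92).
echo bin2hex(to_utf8("It\x92s")), "\n"; // prints 4974e2809973
```

The byte 0x92 comes out as E2 80 99, the UTF-8 encoding of U+2019, while input that is already valid UTF-8 is left alone.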