I want to store the title in UTF-8, but the pages come in many different charsets, such as GBK, ISO, Unicode...
Could you give me some help?
Thanks.
Identify or detect the character encoding and convert the data to UTF-8 if necessary.
For HTML (i.e. text/html) there are three ways to specify the character encoding:
- An HTTP "charset" parameter in a "Content-Type" field.
- A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
- The charset attribute set on an element that designates an external resource.
If none of these is present, you can fall back to content sniffing or to a default character encoding (e.g. ISO 8859-1).
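The lookup order above could be sketched roughly like this. The function name detect_charset and its parameters are made up for illustration, the regular expressions are deliberately loose rather than a full HTML parser, and the "sniffing" step is only a UTF-8 validity check via mb_check_encoding (it assumes the mbstring extension is loaded):

```php
<?php
// Hypothetical helper, not part of PHP: guess the charset of a fetched page.
// $contentType is the raw value of the HTTP Content-Type field (or null),
// $html is the undecoded response body.
function detect_charset(?string $contentType, string $html): string
{
    $pattern = '/charset\s*=\s*["\']?([\w.:-]+)/i';

    // 1. HTTP "charset" parameter, e.g. "text/html; charset=GBK".
    if ($contentType !== null && preg_match($pattern, $contentType, $m)) {
        return strtoupper($m[1]);
    }

    // 2. META declaration inside the document.
    //    Loose match: any <meta ...> tag that contains charset=...
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }

    // 3. Very crude content sniffing: valid UTF-8 is assumed to be UTF-8.
    if (mb_check_encoding($html, 'UTF-8')) {
        return 'UTF-8';
    }

    // 4. Nothing declared and not valid UTF-8: fall back to a default.
    return 'ISO-8859-1';
}

// Example: the header wins over anything in the body.
echo detect_charset('text/html; charset=GBK', '<html></html>'), "\n";
```

The charset attribute on elements that link to external resources is not handled here, since it describes the linked resource rather than the page being read.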
If the identified/detected character encoding is not UTF-8, you can then convert the data to UTF-8 with iconv or mb_convert_encoding.
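As a minimal sketch of the conversion step, assuming the source charset has already been identified: the wrapper name to_utf8 is hypothetical, and it relies on the iconv extension knowing the charset name you pass in.

```php
<?php
// Hypothetical wrapper, not part of PHP: convert $data from $fromCharset to UTF-8.
function to_utf8(string $data, string $fromCharset): string
{
    // Already UTF-8: nothing to convert.
    if (strcasecmp($fromCharset, 'UTF-8') === 0) {
        return $data;
    }

    // iconv() returns false on failure (unknown charset or invalid input);
    // @ suppresses the accompanying notice/warning so we can raise our own error.
    $converted = @iconv($fromCharset, 'UTF-8', $data);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $fromCharset to UTF-8");
    }
    return $converted;
}

// "caf\xE9" is "café" encoded in ISO-8859-1.
$title = to_utf8("caf\xE9", 'ISO-8859-1');
echo bin2hex($title), "\n"; // prints 636166c3a9
```

mb_convert_encoding($data, 'UTF-8', $fromCharset) could be used in place of the iconv call; note that its argument order is target encoding first, then source.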