The W3C "HTML5 differences from HTML4" working draft states:
For the HTML syntax, authors are required to declare the character encoding.
What does "required" mean?
Obviously, a browser will still render HTML5 without the charset meta tag. If no encoding is specified, which encoding will a browser use?
Basically, I want to know if it is actually necessary to include <meta charset="">, or if 99% of the time browsers will use the correct encoding anyway.
Here is the link: http://www.w3.org/TR/html5-diff/#character-encoding
It is not necessary to include <meta charset="blah">. As the specification says, the character set may also be specified by the server using the HTTP Content-Type header or by including a Unicode BOM at the beginning of the downloaded file.
Most web servers today will send back a character set in the Content-Type header for HTML text data if none is specified. If the web server doesn't send back a character set with the Content-Type header, the file does not include a BOM, and the page does not include a <meta charset="blah"> declaration, the browser falls back to a default encoding that is usually based on the language settings of the host computer. If this does not match the actual character encoding of the file, then some characters will be displayed improperly.
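To illustrate the mismatch case, here is a minimal Python sketch. It assumes the file is saved as UTF-8 and that the browser's fallback is windows-1252 (a common default for Western European locales); the sample string is made up for the example.

```python
# Hypothetical sample text; the file on disk is assumed to be UTF-8.
text = "caf\u00e9"  # "café"
raw_bytes = text.encode("utf-8")

# Assumed browser fallback: windows-1252 (Python codec name "cp1252").
misread = raw_bytes.decode("cp1252")

# What the author intended: decode with the encoding the file was saved in.
correct = raw_bytes.decode("utf-8")

print(misread)  # the accented letter shows up as two unrelated characters
print(correct)
```

Plain ASCII characters survive the wrong guess, which is why an English-only page can look fine with no declaration at all.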
Will browsers use the proper encoding 99% of the time? If your page is UTF-8, probably. If not, probably not.
The W3C provides a document outlining the precedence rules for the three methods; it says the order is HTTP header, then BOM, then in-document specification (meta tag).
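The fallback chain can be sketched in Python roughly as follows. This is an illustration of the order stated above, not how any particular browser is implemented: the function name `pick_encoding`, the regular expressions, and the windows-1252 default are all assumptions made for the example. Note also that the HTML specification's own encoding sniffing algorithm gives a BOM priority over the HTTP header, so treat the order here as following the text above rather than as authoritative.

```python
import codecs
import re

# Byte order marks this sketch recognizes (an assumption; not exhaustive).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches charset=... in a Content-Type header value.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([\w.:-]+)"?', re.IGNORECASE)

# Matches both <meta charset="..."> and the http-equiv form.
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE
)


def pick_encoding(content_type, body, default="windows-1252"):
    """Return (encoding, source) using the order described above."""
    # 1. charset parameter of the HTTP Content-Type header
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "http-header"
    # 2. byte order mark at the start of the file
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, "bom"
    # 3. meta declaration; only the first 1024 bytes are scanned here
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    # 4. nothing declared: fall back to the (assumed) locale default
    return default, "default"
```

For example, `pick_encoding("text/html", b'<meta charset="utf-8">')` returns `("utf-8", "meta")`, because the header carries no charset and the body has no BOM.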
According to the Google PageSpeed browser extension, declaring a charset in a meta element "disables IE8's lookahead feature" which apparently forces it to download everything in serial.
My understanding was that <meta charset="utf-8"> was required for valid HTML5, but that is why I started browsing here.
That draft of the spec seems pretty clear to me, and since I add the HTTP header via .htaccess, I am going to start leaving it out... even though I'm tempted not to, just to make IE8 users suffer a bit more.
Thanks.
@Jules Mazur do you have any references about those points? Most of what I do is SEO, and accessibility is important to me, so if that is the case I am more than receptive to leaving the meta declaration.
It’s important to specify the character set of the document as early as possible (either through the Content-Type header or the META tag). Otherwise the browser is left to determine the encoding before parsing the document, and this may negatively impact the page load time.
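One way to check "as early as possible" for the meta tag is sketched below in Python. The 1024-byte limit comes from the HTML specification, which requires the element containing the encoding declaration to fall entirely within the first 1024 bytes of the document; the function name `declares_charset_early` and the regular expression are assumptions made for this example.

```python
import re

# Finds the start of a meta element that carries a charset declaration.
_META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=", re.IGNORECASE)


def declares_charset_early(html_bytes, limit=1024):
    """True if a meta charset element closes within the first `limit` bytes."""
    match = _META_CHARSET.search(html_bytes)
    if match is None:
        return False  # no in-document declaration at all
    # Position of the ">" that closes the meta element.
    end = html_bytes.find(b">", match.end())
    return end != -1 and end < limit


early = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>t</title>'
late = b"<!DOCTYPE html><html><head><!--" + b"x" * 2000 + b'--><meta charset="utf-8">'

print(declares_charset_early(early))
print(declares_charset_early(late))
```

Putting the declaration first inside the head element, before any long comments, scripts, or styles, keeps it inside that window.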
Since 1999, when most of these W3C specs came out, the standards bodies have pushed vendors (makers of servers, browsers, and document applications) to follow encoding rules and use meta tags to help determine intent. But due to greed, poor browser design, and other factors, very few have followed the specs consistently over the years. As a result, we have a fractured system. Some vendors like Mozilla have followed the standards for meta tags since 2001, while others like Microsoft and Google have not.
For that reason, all web developers should use contingency design in how their web pages are constructed, and use meta tags and other standard markup despite inconsistent support. In other words, use both meta tag types (<meta charset="UTF-8"> and <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />), though in reality that markup won't decide how your pages are encoded or interpreted by modern user agents. The main driver for which encoding the browser uses will be how the page was encoded by the software that saved it, as someone above mentioned. Increasingly that is UTF-8, which is just one Unicode encoding that is currently popular. The user's browser will then likely skip over the meta tags and inspect the page to guess the author's intended encoding.
In 2000 this whole meta tag debate was even worse. Use of HTML4 with embedded Unicode characters often meant pages were neither encoded correctly nor rendered correctly, despite server HTTP headers, character entities, and meta tags, simply because the browsers of the time did not follow the standards and didn't look at meta tags, page encoding, or encoded character entities. That is why, to battle all the complex combinations of support and systems left by failed standards adoption, it's best to use every available mechanism to increase the likelihood of your web pages being rendered correctly.
We learned a valuable lesson back then: web standards would never be consistently followed by companies. When standards are not adopted consistently by private industry, it's best to use all forms and versions of tagging, all the time, in every way possible, to maximize the chance that your pages are viewed correctly across the many devices that implement those standards differently, even if today they don't matter (as browsers now parse pages and determine the encoding themselves).
That should be the strategy for all web page design until we somehow enforce universal adoption of web standards, which is increasingly unlikely now that mobile user agents and HTML5 have forced us to abandon, yet again, many of the XML standards that would have enforced better markup design.