I am learning HTML now, and one aspect related to the encoding confuses me.
Imagine I open my text editor, write some HTML code and save it using charset A (e.g. UTF-8, "ANSI" or something else). Then, the way I see it, the following happens:
1. all characters I have typed are mapped to certain numbers, the mapping being specified by charset A;
2. the numbers are saved in the computer memory in their binary representation, as sequences of 0s and 1s.
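The two steps above can be observed directly in Python. This is a minimal sketch. It assumes "ANSI" means Windows-1252 (Python's codec name `cp1252`), which is what many Windows editors label "ANSI", although the exact code page depends on the system locale. The same character ends up as different bytes depending on the charset:

```python
# Encode the same text under two different charsets and compare the bytes.
# Assumption: "ANSI" is taken to be Windows-1252 ("cp1252" in Python).
text = "café"

utf8_bytes = text.encode("utf-8")   # charset A = UTF-8
ansi_bytes = text.encode("cp1252")  # charset A = Windows-1252 ("ANSI")

print(utf8_bytes)  # b'caf\xc3\xa9'  ("é" is stored as two bytes)
print(ansi_bytes)  # b'caf\xe9'      ("é" is stored as one byte)
```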
In the HTML document I have just saved there is a line <meta charset="B">, where B is a charset different from A.
What happens when I then open the HTML document in a browser? Will it use charset B to map the sequences of 0s and 1s that make up my document to the wrong characters (not the ones I meant when I wrote the document), and thus display some rubbish?
As you can see, with this question I am trying to understand the real meaning of <meta charset="B"> in an HTML document.
Yes, exactly, you have understood correctly. This is precisely how mojibake is born: something tries to interpret a binary sequence using the wrong character set, which leads either to unintended or wrong characters being displayed, or to the document failing to decode entirely, in which case the concrete behaviour depends on the application doing the decoding.
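Both failure modes can be reproduced in a few lines of Python. This sketch again assumes Windows-1252 (`cp1252`) and UTF-8 as the two charsets: bytes written as UTF-8 but read as Windows-1252 decode "successfully" into the wrong characters, while Windows-1252 bytes read as UTF-8 are not valid UTF-8 at all.

```python
# Bytes written with charset A (UTF-8)...
data = "café".encode("utf-8")

# ...read back with charset B (Windows-1252): every byte decodes, but wrongly.
wrong = data.decode("cp1252")
print(wrong)  # cafÃ©

# The reverse: Windows-1252 bytes read as UTF-8. The byte 0xE9 starts a
# multi-byte UTF-8 sequence that never completes, so strict decoding fails.
ansi = "café".encode("cp1252")
try:
    ansi.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# A lenient decoder (as browsers are) substitutes U+FFFD REPLACEMENT CHARACTER.
lenient = ansi.decode("utf-8", errors="replace")
print(lenient)  # caf\ufffd, usually rendered as a question mark in a diamond
```

Which of these you see in practice depends on the decoder: browsers are lenient and show replacement characters, whereas many programming-language APIs raise an error by default.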
The <meta charset> element (and, primarily, the Content-Type HTTP header) is advisory: it tells the client (browser) which character set to use when interpreting the document. Without it, the client has no reliable way of knowing, because the bytes themselves do not record which charset produced them. If the server or the document declares the wrong charset, the result will be broken to some degree or another.
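To illustrate why the declaration matters, here is a heavily simplified sketch of how a browser might choose a charset for an HTML response. The function `pick_charset` and its regular expressions are hypothetical, written only for illustration. Real browsers, following the HTML standard, also check for a byte order mark before anything else, may apply sniffing heuristics, and use a locale-dependent default (Windows-1252 is assumed here). The key point is that nothing checks whether the declared charset matches the one the file was actually saved with.

```python
import re
from typing import Optional

# Hypothetical, simplified charset selection (BOM handling and sniffing omitted).
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([\w.:-]+)"?', re.IGNORECASE)
_META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def pick_charset(content_type: Optional[str], body: bytes,
                 default: str = "windows-1252") -> str:
    # 1. A charset in the Content-Type HTTP header takes priority.
    if content_type:
        m = _HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # 2. Otherwise look for a <meta ... charset=...> declaration near the
    #    start of the document (the HTML prescan examines the first 1024 bytes).
    m = _META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. Otherwise fall back to a default; it is simply trusted, never verified.
    return default


sample = b'<!DOCTYPE html><html><head><meta charset="utf-8"></head><body></body></html>'
print(pick_charset("text/html; charset=ISO-8859-1", sample))  # iso-8859-1 (header wins)
print(pick_charset("text/html", sample))                      # utf-8 (from <meta>)
print(pick_charset(None, b"<p>no declaration</p>"))           # windows-1252 (default)
```

Whatever this returns is then used to decode the bytes, so a wrong declaration leads straight to the mojibake described above.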