What is UTF-8 encoding, and why are text files saved in this format bigger than files saved in other formats?
For example, I typed 'A' in Notepad and saved it in UTF-8 format.
After that, the file size was 4 bytes. Why?
That's only because of the BOM (byte order mark). UTF-8 only uses more than one byte for characters whose numeric value is greater than 127 (non-ASCII).
Not all text editors do this. Notepad is notorious for it (the useless UTF-8 BOM).
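To reproduce the 1-byte vs 4-byte difference from the question without Notepad, here is a minimal Python sketch. It assumes Python's `utf-8-sig` codec is a fair stand-in for Notepad's "UTF-8" save option (both write EF BB BF before the text); `saved_size` is a hypothetical helper name used only for illustration.

```python
import os
import tempfile


def saved_size(text: str, encoding: str) -> int:
    """Write `text` to a temporary file using `encoding` and return the file size in bytes."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" avoids any platform newline translation affecting the size
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        return os.path.getsize(path)
    finally:
        os.remove(path)


# Plain UTF-8: 'A' is one byte, exactly as in ASCII.
print(saved_size("A", "utf-8"))      # 1
# "utf-8-sig" writes the 3-byte BOM first (assumed to mimic Notepad's behaviour).
print(saved_size("A", "utf-8-sig"))  # 4
```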
It's almost certainly because whatever you're using to save the file also includes the byte order mark, which in UTF-8 is 0xEF 0xBB 0xBF.
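Those three bytes are simply the character U+FEFF encoded as UTF-8, and they survive into the decoded text unless the reader knows to strip them. A short Python sketch of that consequence, using only the standard `codecs` constants and the built-in `utf-8` / `utf-8-sig` codecs:

```python
import codecs

# 'A' saved "with BOM": three BOM bytes followed by the single byte 0x41.
data = "A".encode("utf-8-sig")

# The three leading bytes are the UTF-8 encoding of U+FEFF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf" == "\ufeff".encode("utf-8")

# A plain UTF-8 decoder keeps the BOM as an invisible leading character...
print(repr(data.decode("utf-8")))      # '\ufeffA'
# ...while the "utf-8-sig" decoder strips it when present.
print(repr(data.decode("utf-8-sig")))  # 'A'
```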
As for what UTF-8 is - it's a Unicode encoding which uses progressively more bytes for higher Unicode values; importantly, ASCII characters are stored as single bytes (the same bytes as they would be in ASCII). So any ASCII file is also a UTF-8 file with the same text. This web page has more, as does Wikipedia.
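The "progressively more bytes for higher values" rule can be made concrete with a hand-written encoder. This is an illustrative sketch only (`utf8_encode` is a hypothetical name, not a library function); real code should just call `str.encode("utf-8")`, which the loop uses to check the result.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    # Surrogates and values above U+10FFFF are not valid Unicode scalar values.
    if 0xD800 <= code_point <= 0xDFFF or not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:
        # 0xxxxxxx: one byte, identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with the built-in codec
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} (len={len(encoded)})")
```

For the four sample characters this prints 1, 2, 3 and 4 bytes respectively, and every code point below 128 comes out as the same single byte it would be in ASCII.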
Because a BOM (byte order mark) was inserted at the start of the file.
The BOM is the special character U+FEFF, which is not meant to carry any meaning except as a way to detect the encoding of a file. You can read about it here: http://unicode.org/faq/utf_bom.html#BOM
In the case of UTF-8, the BOM is encoded as \xEF \xBB \xBF, which is where the 3 extra bytes come from. Notepad and other text editors look for the BOM to guess the encoding of a file. If an editor sees \xFF \xFE, it assumes the file is UCS-2 (UTF-16) in little-endian format; \xFE \xFF means UCS-2 in big-endian format.
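The BOM sniffing described above can be sketched in Python as a lookup over known byte prefixes. `sniff_bom` and `decode_with_bom` are hypothetical helper names; the sketch only knows the three BOMs mentioned in this answer and deliberately ignores UTF-32, whose little-endian BOM (FF FE 00 00) begins with the UTF-16 little-endian one and would be misread here.

```python
import codecs
from typing import Optional, Tuple

# Known BOM prefixes and the codec to use after stripping them.
# UTF-32 is intentionally not handled in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) guessed from the leading bytes, or (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode `data`, honouring a BOM if present, otherwise falling back to `default`."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


print(sniff_bom("A".encode("utf-8-sig")))  # ('utf-8', 3)
```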