Notepad++ can recognize encoding?

2019-01-15 07:50发布

问题:

I created file with UTF-8 encoded content (using PHP fputcsv).

When I open this file in Notepad++ - characters are wrong (Notepad++ starts with ANSI encoding).

When I set Format->"Encode in UTF-8" from menu - everything is fine.

Im worrying, that Notepad++ can recognize encoding somehow, and maybe something is wrong with my file created with fputcsv? First byte or something?

回答1:

Automatically detecting an encoding is not something that can be done accurately. It's pretty much essential that the encoding be specified explicitly. It can be guessed in some cases, but even then not with 100% certainty.

This documentation (Encoding) explains the situation in relation to Notepad++. They also point out that the difficulty arises especially if the file has not been saved with a Byte Order Mark (BOM).

Given that your file displays correctly once you manually set the encoding, I would say there's nothing wrong with how you are generating and saving the file. The only thing you can check for is whether a BOM is being saved, which might improve the chances of Notepad++ being able to automatically detect the encoding.

It's worth noting that although it may help editors like Notepad++ identify the encoding more accurately, according to The Unicode Standard document, the BOM is not recommended.



回答2:

You have to check the lower right corner of the Notepad++ GUI to see the actual enconding that is being used. The problem it's not that Notepad++ specific because guessing the right encoding is a big problem without any real solution so it's better to let the user decide what is the most appropriate encoding in each single case.



回答3:

When you want to reflect the encoding of the text file in a Java program, you have to consider two thnigs: encoding and character set. When you open a text file, you see encoding under "Encoding" menu. Additionally look at the character set menu point. Under "Eastern European" you will find "ISO 8859-2", and under Central European "Windows-1250". You can set corresponding encoding in the Java program when you look up in the table: https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html For example, for Cenntral European character set "Windows-1250" the table suggest Java encoding "Cp1250". Set the encoding and you will see the characters in program properly.