How to check encoding of a CSV file

2020-01-31 04:02发布

问题:

I have a CSV file and I wish to understand its encoding. Is there a menu option in Microsoft Excel that can help me detect it

OR do I need to make use of programming languages like C# or PHP to deduce it.

回答1:

You can just open the file using notepad and then goto File -> Save As. Next to the Save button there will be an encoding drop down and the file's current encoding will be selected there.



回答2:

In Linux systems, you can use file command. It will give the correct encoding

Sample:

file blah.csv

Output:

blah.csv: ISO-8859 text, with very long lines


回答3:

If you use Python, just use a print() function to check the encoding of a csv file. For example:

with open('file_name.csv') as f:
    print(f)

The output is something like this:

<_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>


回答4:

Use chardet https://github.com/chardet/chardet (documentation is short and easy to read).

Install python, then pip install chardet, at last use the command line command.

I tested under GB2312 and it's pretty accurate. (Make sure you have at least a few characters, sample with only 1 character may fail easily).

file is not reliable as you can see.



标签: csv encoding