How to determine encoding table of a text file

I have .txt and .java files and I don't know how to determine the encoding table of the files (Unicode, UTF-8, ISO-8859, …). Is there a program that can detect or display a file's encoding?

Tags: text unicode encoding character-encoding

5 answers

太酷不给撩

#2 · 2019-01-13 15:25

A text file has no header that records its encoding. You can try the Linux/Unix command file, which tries to guess the encoding:

file -i unreadablefile.txt

or on some systems

file -I unreadablefile.txt

But that often reports text/plain; charset=iso-8859-1 even though the file is unreadable (cryptic glyphs).

Here is what I did to find the correct encoding of an unreadable file and then convert it to UTF-8, after installing iconv. First I tried every encoding, using grep to display a line that contained the string www (part of a website address):

for ENCODING in $(iconv -l); do printf '%s ' "$ENCODING"; iconv -f "$ENCODING" -t utf-8 unreadablefile.txt 2>/dev/null | grep 'www'; done | less

This command line prints each tested encoding followed by the transcoded line.

Some of the lines showed readable and consistent (one language at a time) results. I tried a few of those encodings manually, for example:

ENCODING=WINDOWS-936; iconv -f $ENCODING -t utf-8 unreadablefile.txt -o test_with_${ENCODING}.txt

In my case it was a Chinese Windows encoding, and the file is now readable (if you know Chinese).
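The same brute-force idea can be sketched in Python for readers without iconv. This is only an illustration, not the commands used above: the helper name candidate_encodings, the sample text, and the short list of encodings are all assumptions made for the example. It keeps every encoding that decodes the bytes without error and produces a line containing a known marker string.

```python
# Hypothetical helper, written for illustration only.
def candidate_encodings(data: bytes, marker: str, encodings):
    """Return (encoding, first matching line) pairs for every encoding that
    decodes `data` without error and yields text containing `marker`."""
    hits = []
    for enc in encodings:
        try:
            text = data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # bytes are invalid in this encoding, or the codec name is unknown
        for line in text.splitlines():
            if marker in line:
                hits.append((enc, line))
                break
    return hits


# Assumed sample: Chinese text encoded as GBK (Python's codec for Windows code page 936).
sample = "欢迎访问 www.example.com\n".encode("gbk")

for enc, line in candidate_encodings(sample, "www", ["utf-8", "gbk", "latin-1"]):
    print(enc, repr(line))
```

Several encodings usually survive this filter (latin-1 accepts any byte sequence), so a human still has to pick the line that actually reads correctly, just as in the shell version.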


走好不送

#3 · 2019-01-13 15:27

You can't reliably detect the encoding of a text file. What you can do is make an educated guess by searching for a non-ASCII character and checking whether it decodes to a Unicode sequence that makes sense in the languages you expect.
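A minimal sketch of that educated guess, assuming a handful of candidate encodings chosen by hand: it finds the first non-ASCII byte and shows how the surrounding bytes decode under each candidate, so a reader can judge which result looks like real text. The function name preview_non_ascii and the sample string are made up for the example.

```python
# Hypothetical helper, written for illustration only.
def preview_non_ascii(data: bytes, encodings, width: int = 12):
    """Decode the bytes around the first non-ASCII byte under each encoding.

    Undecodable bytes are shown as U+FFFD so every candidate yields a preview.
    """
    pos = next((i for i, b in enumerate(data) if b >= 0x80), None)
    if pos is None:
        return {}  # pure ASCII: nothing to distinguish the candidates
    window = data[max(0, pos - width): pos + width]
    return {enc: window.decode(enc, errors="replace") for enc in encodings}


# Assumed sample: German text saved as Windows-1252.
sample = "Preis: 5 € für Käse".encode("cp1252")

for enc, text in preview_non_ascii(sample, ["utf-8", "cp1252", "latin-1"]).items():
    print(f"{enc:8} {text!r}")
```

The code only produces the previews; deciding which one "makes sense" remains a human judgment or needs language-specific heuristics.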


#4 · 2019-01-13 15:35

Open the file with Notepad++ and you will see the name of the encoding in the bottom-right corner. In the Encoding menu you can change the encoding and save the file.


孤傲高冷的网名

#5 · 2019-01-13 15:37

See this question and the selected answer. There’s no sure-fire way of doing it. At most, you can rule things out. The UTF encodings you’re unlikely to get false positives on, but the 8-bit encodings are tough, especially if you don’t know the starting language. No tool out there currently handles all the common 8-bit encodings from Macs, Windows, Unix, but the selected answer provides an algorithmic approach that should work adequately for a certain subset of encodings.
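The "rule things out" part can be sketched as follows. This is not the algorithm from the linked answer, only a minimal illustration with an assumed function name (classify): it checks for a byte order mark, then for pure ASCII, then for strictly valid UTF-8, and gives up on everything else, which is where the 8-bit encodings end up.

```python
import codecs

# UTF-32 BOMs are checked before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


# Hypothetical helper, written for illustration only.
def classify(data: bytes):
    """Return a likely Python codec name, or None if nothing can be concluded."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)  # strict decoding: any invalid byte raises
            return name
        except UnicodeDecodeError:
            pass
    return None  # some 8-bit or other encoding; cannot be decided from the bytes alone


print(classify("héllo".encode("utf-8")))
print(classify("héllo".encode("latin-1")))
```

A result of "utf-8" is a strong hint rather than a proof, since text in an 8-bit encoding can occasionally form valid UTF-8 sequences by coincidence.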


姐就是有狂的资本

#6 · 2019-01-13 15:40

If you're on Linux, try file -i filename.txt.

$ file -i vol34.tex 
vol34.tex: text/x-tex; charset=us-ascii

For reference, here is my environment:

$ which file
/usr/bin/file
$ file --version
file-5.09
magic file from /etc/magic:/usr/share/misc/magic

Some file versions (e.g. file-5.04 on OS X/macOS) have slightly different command-line switches:

$ file -I vol34.tex 
vol34.tex: text/x-tex; charset=us-ascii
$ file --mime vol34.tex
vol34.tex: text/x-tex; charset=us-ascii

Also, have a look here.
