I have .txt and .java files and I don't know how to determine the encoding of the files (Unicode, UTF-8, ISO-8859, …). Is there a program that can detect or display a file's encoding?
A text file has no header that records its encoding. You can try the Linux/Unix file command, which tries to guess the encoding:
file -i filename.txt
or on some systems
file -I filename.txt
But that often gives you
text/plain; charset=iso-8859-1
although the file is unreadable (cryptic glyphs).
This is what I did to find the correct encoding of an unreadable file and then convert it to UTF-8, after installing iconv. First I tried all encodings, displaying (with grep) a line that contained the string www. (part of a website address). That command line prints each tested encoding followed by the transcoded line.
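The loop described above might look roughly like the following sketch. The function name try_encodings, its argument order, and the tab-separated output are assumptions made for illustration, not the original command line; it relies only on iconv and grep.

```shell
# try_encodings FILE PATTERN ENCODING...
# For each candidate encoding, transcode FILE to UTF-8 and print the encoding
# name plus the first converted line that matches PATTERN.
# (Hypothetical helper, not taken from the original answer.)
try_encodings() {
    file=$1
    pattern=$2
    shift 2
    for enc in "$@"; do
        # If iconv rejects the encoding or no converted line matches, skip it.
        line=$(iconv -f "$enc" -t UTF-8 "$file" 2>/dev/null | grep -a -m 1 -- "$pattern") || continue
        printf '%s\t%s\n' "$enc" "$line"
    done
}

# Example: try every encoding that iconv reports. The output format of
# `iconv -l` differs between implementations, so the list may need cleaning:
#   try_encodings unreadablefile.txt www $(iconv -l) | less
```

Several encodings usually produce some output, so the result still has to be read by eye: the right candidate is the one whose line is readable and stays within one language.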
Some lines showed readable and consistent (one language at a time) results. I tried some of those encodings manually, for example:
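A manual conversion with a single candidate encoding could be sketched as follows. The helper name convert_to_utf8, the output naming scheme, and the example encoding CP936 are assumptions for illustration only.

```shell
# convert_to_utf8 FILE ENCODING
# Write a UTF-8 copy of FILE, interpreted as ENCODING, to
# FILE.ENCODING.utf8.txt and print that path.
# (Hypothetical helper and naming scheme.)
convert_to_utf8() {
    out="$1.$2.utf8.txt"
    # Remove the partial output if iconv fails, e.g. on bytes invalid in ENCODING.
    iconv -f "$2" -t UTF-8 "$1" > "$out" || { rm -f "$out"; return 1; }
    printf '%s\n' "$out"
}

# Example with one candidate (CP936 is only an illustrative choice):
#   convert_to_utf8 unreadablefile.txt CP936
```

Writing to a separate file keeps the original untouched, so a wrong guess costs nothing and the next candidate can be tried immediately.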
In my case it was a Chinese Windows encoding, and the file is now readable (if you know Chinese).
You can't reliably detect the encoding of a text file. What you can do is make an educated guess by searching for a non-ASCII character and trying to determine whether it is a Unicode combination that makes sense in the languages you are parsing.
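The first half of that guess, locating the non-ASCII bytes, can be sketched with grep. The helper name non_ascii_lines is hypothetical, and the -a option is assumed to be available (it is in GNU and BSD grep).

```shell
# non_ascii_lines FILE
# Print, with line numbers, every line of FILE that contains a byte outside
# printable ASCII and ASCII whitespace. (Hypothetical helper.)
non_ascii_lines() {
    # LC_ALL=C makes grep compare raw bytes; -a stops it from treating the
    # file as binary. -v inverts the match on "line is pure ASCII text".
    LC_ALL=C grep -a -n -v '^[[:print:][:space:]]*$' "$1"
}

# Example:
#   non_ascii_lines filename.txt | head
```

Piping one of the reported lines through od -c shows the raw byte values, which can then be compared against the code tables of the encodings under suspicion.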
Open the file with Notepad++ and you will see the name of the encoding in the bottom-right corner. In the Encoding menu you can change the encoding and save the file.
See this question and the selected answer. There’s no sure-fire way of doing it. At most, you can rule things out. The UTF encodings you’re unlikely to get false positives on, but the 8-bit encodings are tough, especially if you don’t know the starting language. No tool out there currently handles all the common 8-bit encodings from Macs, Windows, Unix, but the selected answer provides an algorithmic approach that should work adequately for a certain subset of encodings.
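Ruling out UTF-8 is one check that is easy to automate. The sketch below assumes iconv is installed and uses a hypothetical helper name, is_valid_utf8; it is not the algorithm from the linked answer.

```shell
# is_valid_utf8 FILE
# Succeed (exit status 0) only if FILE decodes cleanly as UTF-8.
# (Hypothetical helper built on iconv.)
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example:
#   if is_valid_utf8 filename.txt; then echo "could be UTF-8"; else echo "not UTF-8"; fi
```

A failure rules UTF-8 out. A success is only strong evidence, and a pure ASCII file passes too, since ASCII is a subset of UTF-8.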
If you're on Linux, try
file -i filename.txt
For reference, here is my environment:
Some file versions (e.g. file-5.04 on OS X/macOS) have slightly different command-line switches:
file -I filename.txt
Also, have a look here.