Question:
This isn't really a programming question: is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I could write a little C# app, but I wanted to know if there is something already built in.
Answer 1:
Open up your file using regular old vanilla Notepad that comes with Windows. It will show you the encoding of the file when you click "Save As...": whatever the default-selected encoding is, that is the current encoding of the file.
If it is UTF-8, you can change it to ANSI and click Save to change the encoding (or vice versa).
I realize there are many different types of encoding, but this was all I needed when I was informed our export files were in UTF-8 and they required ANSI. It was a one-time export, so Notepad fit the bill for me.
FYI: From my understanding, "Unicode" (as listed in Notepad) is a misnomer for UTF-16 (little-endian).
More here on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicode
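If you want to check that yourself, here is a small PowerShell sketch (sample.txt is just a throwaway file name): a file saved with PowerShell's Unicode encoding, which corresponds to Notepad's "Unicode" choice, starts with the bytes FF FE, the UTF-16 little-endian byte order mark.

# Write a file using the "Unicode" encoding (what Notepad calls Unicode, i.e. UTF-16 LE)
'hello' | Out-File -FilePath .\sample.txt -Encoding Unicode

# Dump the first two bytes: FF FE is the UTF-16 LE byte order mark
Get-Content .\sample.txt -Encoding Byte -TotalCount 2 | ForEach-Object { '{0:X2}' -f $_ }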
Answer 2:
The (Linux) command-line tool 'file' is available on Windows via GnuWin32:
http://gnuwin32.sourceforge.net/packages/file.htm
If you have git installed, it's located in C:\Program Files\Git\usr\bin.
Example:
C:\Users\SH\Downloads\SquareRoot>file *
_UpgradeReport_Files;          directory
Debug;                         directory
duration.h;                    ASCII C++ program text, with CRLF line terminators
ipch;                          directory
main.cpp;                      ASCII C program text, with CRLF line terminators
Precision.txt;                 ASCII text, with CRLF line terminators
Release;                       directory
Speed.txt;                     ASCII text, with CRLF line terminators
SquareRoot.sdf;                data
SquareRoot.sln;                UTF-8 Unicode (with BOM) text, with CRLF line terminators
SquareRoot.sln.docstates.suo;  PCX ver. 2.5 image data
SquareRoot.suo;                CDF V2 Document, corrupt: Cannot read summary info
SquareRoot.vcproj;             XML document text
SquareRoot.vcxproj;            XML document text
SquareRoot.vcxproj.filters;    XML document text
SquareRoot.vcxproj.user;       XML document text
squarerootmethods.h;           ASCII C program text, with CRLF line terminators
UpgradeLog.XML;                XML document text

C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
_UpgradeReport_Files;          binary
Debug;                         binary
duration.h;                    us-ascii
ipch;                          binary
main.cpp;                      us-ascii
Precision.txt;                 us-ascii
Release;                       binary
Speed.txt;                     us-ascii
SquareRoot.sdf;                binary
SquareRoot.sln;                utf-8
SquareRoot.sln.docstates.suo;  binary
SquareRoot.suo;                CDF V2 Document, corrupt: Cannot read summary info binary
SquareRoot.vcproj;             us-ascii
SquareRoot.vcxproj;            utf-8
SquareRoot.vcxproj.filters;    utf-8
SquareRoot.vcxproj.user;       utf-8
squarerootmethods.h;           us-ascii
UpgradeLog.XML;                us-ascii
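If you'd rather not add that folder to your PATH, you can also call Git's copy of file.exe by its full path from a regular command prompt (the path below is just the default Git for Windows install location mentioned above; adjust it if Git lives elsewhere):

C:\Users\SH\Downloads\SquareRoot>"C:\Program Files\Git\usr\bin\file.exe" --mime-encoding Precision.txt

For the folder above, that should report us-ascii for Precision.txt, matching the listing.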
Answer 3:
If you have \"git\" or \"Cygwin\" on your Windows Machine, then go to the folder where your file is present and execute the command:
file *
This will give you the encoding details of all the files in that folder.
Answer 4:
Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker
Answer 5:
Here's my take on how to detect the Unicode family of text encodings via BOM. The accuracy of this method is low, as it only works on text files (specifically Unicode files), and it defaults to ascii when no BOM is present (like most text editors; the default would be utf8 if you want to match the HTTP/web ecosystem).
Update 2018: I no longer recommend this method. I recommend using file.exe from Git or the *nix tools, as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.
# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    # Read the first four bytes of the file
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    # Empty file: nothing to inspect, treat as utf8
    if(!$bytes) { return 'utf8' }

    # Match the leading bytes against the known BOM signatures
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }             # UTF-8 BOM
        '^2b2f76'   { return 'utf7' }             # UTF-7 BOM ("+/v")
        '^fffe'     { return 'unicode' }          # UTF-16 LE BOM
        '^feff'     { return 'bigendianunicode' } # UTF-16 BE BOM
        '^0000feff' { return 'utf32' }            # UTF-32 BE BOM
        default     { return 'ascii' }            # no BOM found
    }
}

# List the files in a folder together with the encoding the function guesses
dir ~\Documents\WindowsPowershell -File |
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
    ft -AutoSize
Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools (e.g. SQL Management Studio defaults to UTF-16, which broke git's auto-cr-lf for Windows, which was the default for many years).
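As a rough sketch of that recommendation (the extension list and the "bad encoding" filter below are only examples; tune them to your own tools):

# Only look at extensions we expect to be text, and only surface non-ascii/utf8 results
dir ~\Documents\WindowsPowershell -File -Recurse -Include *.ps1,*.psm1,*.txt,*.sql |
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
    where { $_.Encoding -notin 'ascii','utf8' } |
    ft -AutoSize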
Answer 6:
You can use a free utility called Encoding Recognizer (requires Java). You can find it at http://mindprod.com/products2.html#ENCODINGRECOGNISER
Answer 7:
Similar to the Notepad solution listed above, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options...".
The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It has a lot more text encodings listed there than Notepad does, so it's useful when dealing with various files from around the world and whatever else.
Just like Notepad, you can also change the encoding from the list of options there and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).
Answer 8:
I wrote the #4 answer (at time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from PowerShell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).
Add this to your profile.ps1:
$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe
And use it like: file.exe --mime-encoding *. You must include the .exe in the command for the PS alias to work.
But if you don't customize your PowerShell profile.ps1, I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0 and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but it will write warnings when git is not found.
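If you only want the alias part of that profile, a minimal sketch of the same guard might look like this (the path is simply the default Git for Windows location from above):

# Define the alias only when Git's file.exe actually exists; otherwise warn
$global:gitbin = 'C:\Program Files\Git\usr\bin'
if (Test-Path "$gitbin\file.exe") {
    Set-Alias file.exe "$gitbin\file.exe" -Scope Global
} else {
    Write-Warning "file.exe not found under $gitbin; install Git for Windows or adjust `$gitbin."
}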
The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from PowerShell, and many other OS CLI commands that are "hidden by default" by PowerShell, *shrug*.
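For example, from a PowerShell prompt (notepad is just a handy target that is always on the PATH):

where.exe notepad    # plain "where" would hit PowerShell's Where-Object alias instead of the OS tool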
Answer 9:
The only way that I have found to do this is with VIM or Notepad++.
Answer 10:
Some C code here for reliable ASCII, BOM, and UTF-8 detection: https://unicodebook.readthedocs.io/guess_encoding.html
Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM, UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document. For all other encodings, you have to trust heuristics based on statistics.
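The same "validate instead of guess" idea can be approximated from PowerShell without the C code, by asking a strict .NET UTF8Encoding to decode the file and treating a decoder error as "not UTF-8". This is only a sketch of the concept (Test-Utf8 is a made-up helper name), not the code from the linked page:

# Sketch: returns $true if the file's bytes form valid UTF-8 (plain ASCII is a subset of UTF-8)
function Test-Utf8($Path) {
    # Resolve-Path so .NET sees the same relative path PowerShell does
    $bytes = [System.IO.File]::ReadAllBytes((Resolve-Path $Path))
    # UTF8Encoding(encoderShouldEmitUTF8Identifier = $false, throwOnInvalidBytes = $true)
    $strict = New-Object System.Text.UTF8Encoding -ArgumentList $false, $true
    try   { [void]$strict.GetString($bytes); return $true }
    catch { return $false }
}

Test-Utf8 .\SquareRoot.sln   # example call; file name taken from the listing in Answer 2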