In Windows PowerShell:
echo "string" > file.txt
In Cygwin:
$ cat file.txt
:::s t r i n g
$ dos2unix file.txt
dos2unix: Skipping binary file file.txt
I want the file to contain just "string". How do I do that? That is, when I run cat file.txt, I need only "string" as output. The echo has to happen in Windows PowerShell; that cannot be changed.
Try echo "string" | out-file -encoding ASCII file.txt
to get a simple ASCII-encoded txt file.
Comparison of the files produced:
echo "string" | out-file -encoding ASCII file.txt
will produce a file with the following contents:
73 74 72 69 6E 67 0D 0A (string..)
however
echo "string" > file.txt
will produce a file with the following contents:
FF FE 73 00 74 00 72 00 69 00 6E 00 67 00 0D 00 0A 00 (ÿþs.t.r.i.n.g.....)
(The byte order mark FF FE indicates that the file is UTF-16 LE. The UTF-16 LE signature is the two bytes 0xFF 0xFE; every character after it occupies two bytes, so ordinary ASCII characters (0-127) appear as xx 00 xx 00 xx 00.)
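If you want to check this from the Cygwin side, here is a minimal sketch that recreates the two byte sequences quoted above and dumps them with od. The temp directory and the variable names (tmpdir, ascii_hex, utf16_hex) are my own, and the files are written with printf as stand-ins rather than by PowerShell itself:

```shell
# Recreate both byte sequences by hand (octal escapes keep printf portable).
tmpdir=$(mktemp -d)

# What "out-file -encoding ASCII" is described as producing: s t r i n g CR LF
printf 'string\r\n' > "$tmpdir/ascii.txt"

# What ">" is described as producing: BOM FF FE, then each character followed by 00
printf '\377\376s\000t\000r\000i\000n\000g\000\r\000\n\000' > "$tmpdir/utf16.txt"

# Dump each file as hex bytes; xargs joins od's output into one line.
ascii_hex=$(od -An -v -tx1 "$tmpdir/ascii.txt" | xargs)
utf16_hex=$(od -An -v -tx1 "$tmpdir/utf16.txt" | xargs)

echo "ASCII:  $ascii_hex"   # 73 74 72 69 6e 67 0d 0a
echo "UTF-16: $utf16_hex"   # ff fe 73 00 74 00 72 00 69 00 6e 00 67 00 0d 00 0a 00
```

Running the same od command on the real file.txt should show right away which of the two encodings PowerShell wrote.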
In Windows PowerShell, these two commands are equivalent in that both default to UTF-16 LE encoding:
echo "string" > file.txt
echo "string" | out-file file.txt
You can add an explicit encoding parameter to the latter form (as indicated by jon Z) to produce plain ASCII:
echo "string" | out-file -encoding ASCII file.txt
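Alternatively, you could use set-content, which in Windows PowerShell writes single-byte text by default (the system's ANSI code page, which is identical to ASCII for characters 0-127):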
echo "string" | set-content file.txt
Corollary 1:
Want to convert a Unicode (UTF-16) file to ASCII in one line?
Just use this:
get-content your_unicode_file | set-content your_ascii_file
which can be abbreviated to:
gc your_unicode_file | sc your_ascii_file
Corollary 2:
Want to get a hex dump so you can really see what is Unicode and what is ASCII?
Use the clean and simple Get-HexDump function available on PowerShell.com.
With that in place you can examine your generated files with just:
Get-HexDump file.txt
For anything non-trivial, you can specify how many columns wide you want the output and how many bytes of the file to process with something like this:
Get-HexDump file.txt -width 15 -bytes 150
PowerShell creates Unicode UTF-16 files with a Byte Order Mark (BOM).
Dos2unix 6.0 and higher can read UTF-16 files and convert them to UTF-8 (the default Cygwin encoding) and remove the BOM. Versions prior to 6.0 will see UTF-16 files as binary and skip them, as in your example.
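If you are stuck on an older dos2unix, one alternative that is not part of the answers above is to do the conversion on the Cygwin side with iconv and tr. This is only a sketch: it assumes iconv is installed, and src and dst are made-up names for a stand-in input file and the converted output:

```shell
# Stand-in for a file written by PowerShell's ">" (UTF-16 LE with BOM, CRLF ending).
src=$(mktemp)
dst=$(mktemp)
printf '\377\376s\000t\000r\000i\000n\000g\000\r\000\n\000' > "$src"

# Decode UTF-16 (iconv uses the BOM to pick the byte order and drops it),
# then strip the carriage return so only a Unix line ending remains.
iconv -f UTF-16 -t UTF-8 "$src" | tr -d '\r' > "$dst"

cat "$dst"   # prints: string
```

For the real case, replace "$src" with file.txt. Note that this writes to a second file instead of converting in place the way dos2unix does.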