I need to convert a text file to DOS format (each line ending with 0x0d 0x0a rather than just 0x0a), but only if the file is in Unix format (just 0x0a at the end of each line).
I know how to convert it (sed 's/$/^M/'), but I don't know how to detect the end-of-line character(s) of a file.
I am using ksh.
Any help would be appreciated.
[Update]:
Kind of figured it out. Here is my ksh script to do the check:
[qiangxu@host:/my/folder]# cat eol_check.ksh
#!/usr/bin/ksh
if head -n 1 "$1" | grep '^M$' >/dev/null 2>&1; then
    echo DOS
else
    echo UNIX
fi
In the above script, ^M is a literal carriage return, entered in vi by typing Ctrl-V and then Ctrl-M.
I'd like to know if there is a better method.
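if awk '/\r$/{exit 0;} 1{exit 1;}' myFile
then
    echo "is DOS"
fi
This looks only at the first line: awk exits with status 0 if that line ends in a carriage return and with status 1 otherwise. (An empty file has no first line, so awk exits with 0 and the file is reported as DOS.)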
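The awk test above stops after the first line. If a file might mix both conventions, a minimal sketch that reads the whole file and counts each kind of line could look like this. The function name eol_type and its labels are illustrative, and it assumes an awk that understands \r in regular expressions (gawk, mawk and POSIX awk do). Old Mac files with CR-only endings are not handled: awk reads such a file as a single line, which would be reported as DOS if it ends in CR.

```shell
# Hypothetical helper: classify a whole file as DOS, UNIX, MIXED or EMPTY.
# awk strips the LF from each line, so a CR-LF line still ends in \r.
eol_type() {
    awk '
        /\r$/ { dos++; next }   # CR-LF terminated line
              { unix++ }        # LF terminated (or an unterminated last line)
        END {
            if (dos && unix) print "MIXED"
            else if (dos)    print "DOS"
            else if (unix)   print "UNIX"
            else             print "EMPTY"
        }' "$1"
}
```

For example, eol_type myFile prints one of the four labels, so it can be used in a case statement.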
Simply use the file command.
If the file contains lines ending in CR LF, file says so in its description:
'ASCII text, with CRLF line terminators'.
For example:
if file myFile | grep "CRLF" > /dev/null 2>&1
then
    ....
fi
The latest version (7.1, 2014-10-06) of the dos2unix (and unix2dos) command, which installs with Cygwin and some recent Linux distributions, has a handy --info option that prints a count of each type of line break in each file. See http://waterlan.home.xs4all.nl/dos2unix.html
From the man page:
  --info[=FLAGS] FILE ...
      Display file information. No conversion is done.
      The following information is printed, in this order: number of DOS line breaks, number of Unix line breaks, number of Mac line breaks, byte order mark, text or binary, file name.
      Example output:
           6       0       0  no_bom    text    dos.txt
           0       6       0  no_bom    text    unix.txt
           0       0       6  no_bom    text    mac.txt
           6       6       6  no_bom    text    mixed.txt
          50       0       0  UTF-16LE  text    utf16le.txt
           0      50       0  no_bom    text    utf8unix.txt
          50       0       0  UTF-8     text    utf8dos.txt
           2     418     219  no_bom    binary  dos2unix.exe
      Optionally extra flags can be set to change the output. One or more flags can be added.
      d   Print number of DOS line breaks.
      u   Print number of Unix line breaks.
      m   Print number of Mac line breaks.
      b   Print the byte order mark.
      t   Print if file is text or binary.
      c   Print only the files that would be converted.
      With the "c" flag dos2unix will print only the files that contain DOS line breaks, unix2dos will print only file names that have Unix line breaks.
Thus:
if [[ -n $(dos2unix --info=c "${filename}") ]] ; then echo DOS; fi
Conversely:
if [[ -n $(unix2dos --info=c "${filename}") ]] ; then echo UNIX; fi
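If you want a single label per file rather than raw counts, the default --info output can be post-processed. A minimal sketch, assuming the field order quoted from the man page (DOS, Unix and Mac counts, byte order mark, text or binary, file name); the function info_label and its labels are hypothetical and not part of dos2unix.

```shell
# Hypothetical helper: map one line of `dos2unix --info` output to a label.
# Expected fields: DOS count, Unix count, Mac count, BOM, text/binary, name.
info_label() {
    printf '%s\n' "$1" | awk '{
        kinds = ($1 > 0) + ($2 > 0) + ($3 > 0)   # how many break types occur
        if ($5 == "binary")  print "BINARY"
        else if (kinds > 1)  print "MIXED"
        else if ($1 > 0)     print "DOS"
        else if ($2 > 0)     print "UNIX"
        else if ($3 > 0)     print "MAC"
        else                 print "NONE"
    }'
}
```

For example, info_label "6 6 6 no_bom text mixed.txt" prints MIXED. Binary files are reported as BINARY first, since their "line break" counts are just stray bytes.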
I can't test on AIX, but try the following (the $'\r' quoting needs ksh93 or bash; the older ksh88 does not support it):
if [[ "$(head -1 filename)" == *$'\r' ]]; then echo DOS; fi
You can simply remove any existing carriage return from the end of each line, and then add a carriage return to the end of every line. Then it doesn't matter which format the incoming file is in; the output will always be in DOS format.
sed 's/\r$//;s/$/\r/'
(GNU sed understands \r in a sed script; other sed implementations may not.)
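If your sed does not understand \r, the same strip-then-append idea can be written in awk, whose regex and string escapes include \r. A minimal sketch; the function name to_dos is hypothetical, and the result goes to standard output, so redirect it to a new file rather than over the input.

```shell
# Hypothetical helper: print file $1 with every line ending in CR-LF.
# Any existing trailing CR is removed first, so DOS input is not doubled.
to_dos() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$1"
}
```

For example: to_dos myFile > myFile.dos. An unterminated last line also gets a CR-LF.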
I'm probably late on this one, but I had the same issue and did not want to put the special ^M character in my script (I'm worried some editors might not display it properly, or some later programmer might replace it with the two ordinary characters ^ and M).
The solution I found feeds the special character to grep by letting the shell convert its hex value:
if head -1 "${filename}" | grep $'[\x0D]' >/dev/null
then
    echo "Win"
else
    echo "Unix"
fi
(This matches a CR anywhere in the first line, not only at its end.)
Unfortunately I cannot make the $'[\x0D]' construct work in ksh.
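Because $'...' quoting is a ksh93 (and bash) feature, one workaround that should also work in older ksh versions is to let printf produce the carriage return and keep it in a variable. A minimal sketch of the whole task from the question, assuming standard head, grep and sed; the names cr, is_dos and convert_if_unix and the input/output argument convention are hypothetical.

```shell
# Carriage return built at run time, so no literal ^M sits in the script.
# (Command substitution strips trailing newlines only, so the CR survives.)
cr=$(printf '\r')

# Succeeds if the first line of file $1 ends in CR, i.e. in CR-LF.
is_dos() {
    head -n 1 "$1" | grep "${cr}\$" >/dev/null 2>&1
}

# Copy $1 to $2, appending a CR to each line only if $1 looks like a Unix file.
convert_if_unix() {
    if is_dos "$1"; then
        cat "$1" > "$2"
    else
        sed "s/\$/${cr}/" "$1" > "$2"
    fi
}
```

For example: convert_if_unix myFile myFile.dos. The sed command is the one from the question, with the CR supplied by the variable instead of a typed ^M.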
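In ksh, I found this:
if head -1 "${filename}" | od -x | grep '0d0a$' >/dev/null
then
    echo "Win"
else
    echo "Unix"
fi
od -x displays the text as hexadecimal two-byte words.
'0d0a$' is the hex code for CR-LF (the DOS/Windows line terminator) at the end of an od output line. Be aware that od -x groups the bytes in pairs using the machine's native byte order. On a big-endian machine such as AIX, CR-LF shows up as 0d0a only when the first line, terminator included, has an even number of bytes; with an odd count it appears as xx0d 0a00, where 0a00 is the LF padded with a zero byte (which is also how an odd-length Unix line ends). On little-endian machines each pair is swapped, so CR-LF shows up as 0a0d.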
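Because od -x prints two-byte words whose layout depends on the machine's byte order and on whether the line has an odd or even number of bytes, one option is to look only at the last two bytes of the first line and print them one byte at a time. A minimal sketch, assuming head, tail -c and od -A n -t x1 behave as POSIX specifies; the function name first_line_eol is hypothetical.

```shell
# Hypothetical helper: print "Win" if the first line of $1 ends in CR-LF,
# otherwise "Unix". Single-byte hex output (-t x1) is byte-order independent.
first_line_eol() {
    # Last two bytes of the first line, e.g. " 0d 0a", squeezed to "0d0a".
    last2=$(head -n 1 "$1" | tail -c 2 | od -A n -t x1 | tr -d ' \n')
    if [ "$last2" = "0d0a" ]; then
        echo "Win"
    else
        echo "Unix"
    fi
}
```

For example, first_line_eol myFile prints Win for a first line "abc" followed by CR-LF, even though that line has an odd number of bytes.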