how to check end-of-line of a text file to see if

Posted 2020-07-16 02:47

Question:

I need to convert a text file to DOS format (each line ending with 0x0d 0x0a rather than 0x0a only) if the file is in Unix format (0x0a only at the end of each line).

I know how to convert it (sed 's/$/^M/'), but I don't know how to detect the end-of-line character(s) of a file.

I am using ksh.

Any help would be appreciated.

[Update]: Kind of figured it out, and here is my ksh script to do the check.

[qiangxu@host:/my/folder]# cat eol_check.ksh
#!/usr/bin/ksh

if ! head -1 "$1" | grep ^M$ >/dev/null 2>&1; then
  echo UNIX
else
  echo DOS
fi

In the above script, ^M must be a literal carriage return (in vi, press Ctrl-V and then Ctrl-M), not the two characters ^ and M.

I'd like to know if there is a better method.

Answer 1:

if awk '/\r$/ {exit 0} {exit 1}' myFile
then
  echo "is DOS"
fi
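
The awk test above decides on the first line alone. If a file might mix both conventions, a sketch along these lines counts each kind of line ending over the whole file. The name count_eol is hypothetical, and the sketch assumes an awk that understands \r inside a regular expression.

```shell
# count_eol FILE: print "<CRLF lines> <LF-only lines>" for FILE.
count_eol() {
  awk '
    /\r$/ { crlf++; next }            # line ends in CR before the newline
          { lf++ }                    # every other line
    END   { print crlf + 0, lf + 0 }  # "+ 0" prints 0 instead of an empty string
  ' "$1"
}
```

A result such as "0 12" would then mean a pure Unix file, and two non-zero numbers a mixed one.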


Answer 2:

Simply use the file command. If the file contains lines ending in CR LF, the description it prints says so: 'ASCII text, with CRLF line terminators'.

For example:

if file myFile | grep "CRLF" >/dev/null 2>&1
then
  ....
fi
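
As a slightly fuller sketch of the same idea, the description can be matched with a case statement so that the file's own name cannot trigger a false match. The name eol_by_file is hypothetical. The first pattern is the wording quoted above; treating any other mention of CRLF as a mixed file is an assumption, and the wording may differ between implementations of file.

```shell
# eol_by_file FILE: classify FILE from the description printed by file(1).
eol_by_file() {
  desc=$(file "$1")
  desc=${desc#"$1:"}     # drop the leading "FILE:" label, keep the description
  case $desc in
    *"with CRLF line terminators"*) echo DOS ;;    # wording quoted in the answer
    *CRLF*)                         echo MIXED ;;  # assumption: CRLF plus another kind
    *)                              echo OTHER ;;  # no CRLF mentioned at all
  esac
}
```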


Answer 3:

The latest (7.1) version of the dos2unix (and unix2dos) command that installs with Cygwin and some recent Linux distributions has a handy --info option which prints out a count of the different types of newline in each file. This is dos2unix 7.1 (2014-10-06) http://waterlan.home.xs4all.nl/dos2unix.html

From the man page:

--info[=FLAGS] FILE ...
       Display file information. No conversion is done.

The following information is printed, in this order: 
number of DOS line breaks, number of Unix line breaks, number of Mac line breaks, byte order mark, text or binary, file name.

       Example output:
            6       0       0  no_bom    text    dos.txt
            0       6       0  no_bom    text    unix.txt
            0       0       6  no_bom    text    mac.txt
            6       6       6  no_bom    text    mixed.txt
           50       0       0  UTF-16LE  text    utf16le.txt
            0      50       0  no_bom    text    utf8unix.txt
           50       0       0  UTF-8     text    utf8dos.txt
            2     418     219  no_bom    binary  dos2unix.exe

Optionally extra flags can be set to change the output. One or more flags can be added.
       d   Print number of DOS line breaks.
       u   Print number of Unix line breaks.
       m   Print number of Mac line breaks.
       b   Print the byte order mark.
       t   Print if file is text or binary.
       c   Print only the files that would be converted.

With the "c" flag dos2unix will print only the files that contain DOS line breaks, unix2dos will print only file names that have Unix line breaks.

Thus:

if [[ -n $(dos2unix --info=c "${filename}") ]] ; then echo DOS; fi

Conversely:

if [[ -n $(unix2dos --info=c "${filename}") ]] ; then echo UNIX; fi
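
When mixed files also need to be told apart, the default --info output can be post-processed. The sketch below relies only on the column order quoted from the man page (DOS count, Unix count, Mac count, BOM, text or binary, file name). The name classify_info is hypothetical, and the sketch assumes file names without spaces.

```shell
# classify_info: read "dos2unix --info" lines on stdin, print "<file> <kind>".
classify_info() {
  awk '{
    kinds = ($1 > 0) + ($2 > 0) + ($3 > 0)   # how many line-break styles occur
    if (kinds > 1)   kind = "MIXED"
    else if ($1 > 0) kind = "DOS"
    else if ($2 > 0) kind = "UNIX"
    else if ($3 > 0) kind = "MAC"
    else             kind = "NONE"
    print $NF, kind                          # $NF is the file name column
  }'
}
```

It would be used as: dos2unix --info "${filename}" | classify_info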


Answer 4:

I can't test on AIX, but try:

if [[ "$(head -1 filename)" == *$'\r' ]]; then echo DOS; fi
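
For shells that lack the $'\r' quoting form, a sketch of the same first-line test can build the carriage return with printf and match it with a plain case pattern. The name first_line_is_dos is hypothetical. It relies on command substitution stripping only the trailing newline, so a trailing CR survives.

```shell
# first_line_is_dos FILE: succeed if the first line of FILE ends in CR LF.
first_line_is_dos() {
  cr=$(printf '\r')            # a literal carriage return, no $'...' needed
  first=$(head -n 1 "$1")      # trailing LF is stripped, a CR before it is kept
  case $first in
    *"$cr") return 0 ;;        # last remaining character is CR
    *)      return 1 ;;
  esac
}
```

Usage: if first_line_is_dos "${filename}"; then echo DOS; else echo UNIX; fi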


Answer 5:

You can simply remove any existing carriage returns from all lines, and then add the carriage return to the end of all lines. Then it doesn't matter what format the incoming file is in. The outgoing format will always be DOS format.

sed 's/\r$//;s/$/\r/'
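
The \r escape in a sed script is not understood by every sed. A sketch of the same two substitutions that avoids it is to let printf produce the carriage return and splice it into the script. The name to_dos is hypothetical.

```shell
# to_dos: filter stdin to stdout so that each line ends in exactly one CR LF.
to_dos() {
  cr=$(printf '\r')                # a literal carriage return
  # First drop a CR that is already at the end of the line, then append one.
  sed "s/${cr}\$//;s/\$/${cr}/"
}
```

It would be used as a filter writing to a different file, for example: to_dos < input.txt > output.txt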


Answer 6:

I'm probably late on this one, but I've had the same issue and I did not want to put the special ^M character in my script (I'm worried some editors might not display the special character properly or some later programmer might replace it by 2 normal characters: ^ and M...).

The solution I found feeds the special character to grep, by letting the shell convert its hex value:

if head -1 "${filename}" | grep $'[\x0D]' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

Unfortunately, I cannot make the $'[\x0D]' construct work in ksh. In ksh, I found this:

if head -1 "${filename}" | od -x | grep '0d0a$' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

od -x displays the text in hex codes. '0d0a$' is the hex code for CR-LF (the DOS-Win line terminator). The Unix line terminator is '0a00$'. Note that od -x prints two-byte words, so these patterns hold only for certain line lengths and byte orders.
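
A byte-at-a-time dump gives the same answer regardless of line length or byte order. The sketch below, with the hypothetical name first_line_hex_eol, uses the POSIX od options -An (no offset column), -v (no collapsing of repeated rows) and -tx1 (one byte per hex pair), and it avoids any special quoting in the script.

```shell
# first_line_hex_eol FILE: print Win or Unix from the last bytes of line 1.
first_line_hex_eol() {
  # Join all hex pairs into one string so a long first line is handled too.
  hex=$(head -n 1 "$1" | od -An -v -tx1 | tr -d '[:space:]')
  case $hex in
    *0d0a) echo "Win" ;;     # last two bytes are CR LF
    *)     echo "Unix" ;;
  esac
}
```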