What's the best way of doing dos2unix on a 500

2019-01-25 14:43发布

问题:

Question says it all, I've got a 500,000 line file that gets generated as part of an automated build process on a Windows box and it's riddled with ^M's. When it goes out the door it needs to *nix friendly, what's the best approach here, is there a handy snippet of code that could do this for me? Or do I need to write a little C# or Java app?

回答1:

Here is a Perl one-liner, taken from http://www.technocage.com/~caskey/dos2unix/

#!/usr/bin/perl -pi
s/\r\n/\n/;

You can run it as follows:

perl dos2unix.pl < file.dos > file.unix

Or, you can run it also in this way (the conversion is done in-place):

perl -pi dos2unix.pl file.dos

And here is my (naive) C version:

#include <stdio.h>

int main(void)
{
   int c;
   while( (c = fgetc(stdin)) != EOF )
      if(c != '\r')
         fputc(c, stdout);
   return 0;
}

You should run it with input and output redirection:

dos2unix.exe < file.dos > file.unix


回答2:

If installing a base cygwin is too heavy, there are a number of standalone dos2unix and unix2dos Windows standalone console-based programs on the net, many with C/C++ source available. If I'm understanding the requirement correctly, either of these solutions would fit nicely into an automated build script.



回答3:

If you're on Windows and need something run in a batch script, you can compile a simple C program to do the trick.

#include <stdio.h>

int main() {
    while(1) {
        int c = fgetc(stdin);

        if(c == EOF)
            break;

        if(c == '\r')
            continue;

        fputc(c, stdout);
    }

    return 0;
}

Usage:

myprogram.exe < input > output

Editing in-place would be a bit more difficult. Besides, you may want to keep backups of the originals for some reason (in case you accidentally strip a binary file, for example).

That version removes all CR characters; if you only want to remove the ones that are in a CR-LF pair, you can use (this is the classic one-character-back method :-):

/* XXX Contains a bug -- see comments XXX */

#include <stdio.h>

int main() {
    int lastc = EOF;
    int c;
    while ((c = fgetc(stdin)) != EOF) {
        if ((lastc != '\r') || (c != '\n')) {
            fputc (lastc, stdout);
        }
        lastc = c;
    }
    fputc (lastc, stdout);
    return 0;
}

You can edit the file in-place using mode "r+". Below is a general myd2u program, which accepts file names as arguments. NOTE: This program uses ftruncate to chop off extra characters at the end. If there's any better (standard) way to do this, please edit or comment. Thanks!

#include <stdio.h>

int main(int argc, char **argv) {
    FILE *file;

    if(argc < 2) {
        fprintf(stderr, "Usage: myd2u <files>\n");
        return 1;
    }

    file = fopen(argv[1], "rb+");

    if(!file) {
        perror("");
        return 2;
    }

    long readPos = 0, writePos = 0;
    int lastC = EOF;

    while(1) {
        fseek(file, readPos, SEEK_SET);
        int c = fgetc(file);
        readPos = ftell(file);  /* For good measure. */

        if(c == EOF)
            break;

        if(c == '\n' && lastC == '\r') {
            /* Move back so we override the \r with the \n. */
            --writePos;
        }

        fseek(file, writePos, SEEK_SET);
        fputc(c, file);
        writePos = ftell(file);

        lastC = c;
    }

    ftruncate(fileno(file), writePos); /* Not in C89/C99/ANSI! */

    fclose(file);

    /* 'cus I'm too lazy to make a loop. */
    if(argc > 2)
        main(argc - 1, argv - 1);

    return 0;
}


回答4:

tr -d '^M' < infile > outfile

You will type ^M as : ctrl+V , Enter

Edit: You can use '\r' instead of manually entering a carriage return, [thanks to @strager]

tr -d '\r' < infile > outfile

Edit 2: 'tr' is a unix utility, you can download a native windows version from http://unxutils.sourceforge.net[thanks to @Rob Kennedy] or use cygwin's unix emulation.



回答5:

Ftp it from the dos box, to the unix box, as an ascii file, instead of a binary file. Ftp will strip the crlf, and insert a lf. Transfer it back to the dos box as a binary file, and the lf will be retained.



回答6:

Some text editors, such as UltraEdit/UEStudio have this functionality built-in.

File > Conversions > DOS to UNIX



回答7:

If it is just one file I use notepad++. Nice because it is free. I have cygwin installed and use a one liner script I wrote for multiple files. If your interest in the script leave a comment. (I don't have it available to me a this moment.)