Using sed on text files with a CSV

Posted 2019-08-10 05:42

I've been trying to do a bulk find-and-replace on two text files using a CSV. I've looked at the questions SO suggests, and none seem to answer mine.

I've created two variables for the two text files I want to modify. The CSV has two columns and hundreds of rows. The first column contains strings (none contain whitespace) that already appear in the text files and need to be replaced with the corresponding strings in the same row of the second column.

As a test, I tried this script:

#!/bin/bash

test1='long_file_name.txt'
find='string1'
replace='string2'

sed -e "s/$find/$replace/g" $test1 > $test1.tmp && mv $test1.tmp $test1

This was successful, except that I need to do it once for every row in the CSV, using the values from that row. My hunch is that I'm using the while loop incorrectly, but I can't find the error. When I execute the script below, I just get the command-line prompt back, which makes me think something has happened; but when I check the text files, nothing has changed.

The two text files, this script, and the CSV are all in the same folder (which has also been my working directory when I run this).

#!/bin/bash

textfile1='long_file_name1.txt'
textfile2='long_file_name2.txt'

while IFS=, read f1 f2
do
    sed -e "s/$f1/$f2/g" $textfile1 > $textfile1.tmp && \
         mv $textfile1.tmp $textfile1
    sed -e "s/$f1/$f2/g" $textfile2 > $textfile2.tmp && \
         mv $textfile2.tmp $textfile2
done <'findreplace.csv'

It seems to me that this code should do what I want (but it doesn't); perhaps I'm misunderstanding something fundamental (I'm new to bash scripting)?

The CSV looks like this, but with hundreds of rows. Each a_i should be replaced with its counterpart b_i in the next column over.

a_1 b_1
a_2 b_2
a_3 b_3

Something to note: all the strings actually contain underscores, in case that affects anything. I've tried wrapping the variable names in braces, à la ${var}, but it still doesn't work.
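In other words, the braced variant I tried looked roughly like this (shown for the first file only):

sed -e "s/${f1}/${f2}/g" $textfile1 > $textfile1.tmp && \
     mv $textfile1.tmp $textfile1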

I appreciate the solutions, but I'm also curious to know why the above doesn't work. (Also, I would vote everyone up, but I lack the reputation to do so. However, know that I appreciate and am learning a lot from your answers!)

2 Answers
我只想做你的唯一
Answered 2019-08-10 06:15

If you are going to process a lot of data and your patterns can contain special characters, I would consider using Perl, especially if you are going to have a lot of pairs in findreplace.csv. You can use the following script as a filter or for in-place modification of many files. As a side effect, it loads the replacements and builds the Aho-Corasick automaton only once per invocation, which makes this solution quite efficient (O(M+N) instead of the O(M*N) of your solution).

#!/usr/bin/perl
use strict;
use warnings;
use autodie;

# Emulate perl's -i / -i.ext switch: if the first argument starts with -i,
# build a sub that reopens each input file for in-place editing (optionally
# keeping a backup); otherwise use a no-op sub.
my $in_place = ( @ARGV and $ARGV[0] =~ /^-i(.*)/ )
    ? do {
    shift;
    my $backup_extension = $1;
    my $backup_name      = $backup_extension =~ /\*/
        ? sub { ( my $fn = $backup_extension ) =~ s/\*/$_[0]/; $fn }    # '*' is a placeholder for the file name
        : sub { shift . $backup_extension };                            # otherwise append the extension
    my $oldargv = '-';
    sub {
        if ( $ARGV ne $oldargv ) {
            rename( $ARGV, $backup_name->($ARGV) );
            open( ARGVOUT, '>', $ARGV );
            select(ARGVOUT);    # print now writes back to the original file name
            $oldargv = $ARGV;
        }
    };
    }
    : sub { };

die "$0: File with replacements required." unless @ARGV;

# Load the find/replace pairs and build one alternation of quoted patterns.
my ( $re, %replace );
do {
    my $filename = shift;
    open my $fh, '<', $filename;
    %replace = map { chomp; split ',', $_, 2 } <$fh>;
    close $fh;
    $re = join '|', map quotemeta, keys %replace;
    $re = qr/($re)/;
};

# Process the remaining files (or STDIN), applying every replacement in one pass.
while (<>) {
    $in_place->();
    s/$re/$replace{$1}/g;
}
continue {print}

Usage:

./replace.pl replace.csv <file.in >file.out

as well as

./replace.pl replace.csv file.in >file.out

or in-place

./replace.pl -i replace.csv file1.csv file2.csv file3.csv

or with a backup

./replace.pl -i.orig replace.csv file1.csv file2.csv file3.csv

or with a backup name built from a placeholder (the '*' is replaced with the input file name, so file1.csv is backed up as there.is.file1.csv.original)

./replace.pl -ithere.is.\*.original replace.csv file1.csv file2.csv file3.csv
Rolldiameter
Answered 2019-08-10 06:19

You can convert your CSV file into a sed script with the following command:

awk -F, '{print "s/" $1 "/" $2 "/g";}' replace.csv > sed.script

And then you will be able to do a one pass replacement:

sed -i -f sed.script longfilename.txt
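
For illustration, assuming the CSV really is comma-separated (rows like a_1,b_1), the generated sed.script contains one substitution command per row:

s/a_1/b_1/g
s/a_2/b_2/g
s/a_3/b_3/g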

Because sed reads the script once and applies every substitution in a single pass over each file, this will be a faster implementation of what you want to do.

BTW, sorry, but I do not see what is wrong with your script; it should work unless your CSV file has more than two columns.
