Perform a different regular expression for each co

I found myself writing PERL for the first time in about 8 years and I am having difficulties with something that should be easy. Here is the basic premise:

A file containing a hundred or so fields 10 of which have incorrect data (the O's are 0's)

A   B   C   D    E  F   ... 
br0wn   red   1278076   0range   "20 tr0ut"   123 ...
Green   0range   90876   Yell0w   "18 Salm0n"   456   ...

I am trying to write the program to split the fields and then allow me to run a regex on field A to replace 0 with O but not replace 0 with O for column C and so on I have the additional problem of needing to possibly run an alternate regex for column E for instance.

I was able to split all the fields in a record by the /t. I am having an issue formatting my command to go over each field and run a specific regex based on the field it is.

Any help would be appreciated and I will Paypal you 10 dollars for a beverage of your choice if you solve it.

标签： linux perl bash unix

6条回答

劳资没心，怎么记你

2楼-- · 2019-08-12 20:35

I'd probably use Perl in 'autosplit' mode:

perl -a -p -F"\t" \
     -e '$F[0] =~ s/0/o/g;
         $F[1] =~ s/0/O/g;
         $F[3] =~ s/0/o/g;
         $F[4] =~ s/(\D)0(\D)/\1o\2/g;  # Or other more complex regex
         # ...                          # Other fields can be edited
         $_ = join("\t", @F);           # Reassign fields to $_
        ' data-file

The regex for $F[4] changes '20 tr0ut' into '20 trout'; you can make it more complex if you need.

Output on sample data:

A       B       C       D       E       F       ...
brown   red     1278076 orange  "20 trout"      123     ...
Green   Orange  90876   Yellow  "18 Salmon"     456     ...

This does assume a strictly tab-separated data file. The quoted strings containing spaces complicate things if you do not have strictly tab-separated data; at that point, Text::CSV is attractive for reading the lines.

0人赞添加讨论(0) 举报

来，给爷笑一个

3楼-- · 2019-08-12 20:42

Here's one way with a simple configuration using array references and/or subroutines, then the substitutions happening later:

use strict;
use warnings;

my @subst = ([
  ['this', 'that'],
  ['O', 1],
],[
  ['foo', 'boo'],
  sub {s/a.*//},
]);

sub mk_subst {
  my $list = shift;
  my ($this, $that) = eval { @$list };
  return $list unless defined $this;
  sub { s/\Q$this/$that/ };
}

my @all;
for my $set (@subst) {
  my @list = eval { @$set };
  unless (@list) {
    push @all, [ sub {} ];
    next;
  }
  my @re;
  for my $s (@list) {
    push @re, mk_subst($s);
  }
  push @all, \@re;
}

while (<DATA>) {
  chomp;
  my @list = split /\t/, $_, -1;
  for my $i (0..$#list) {
    for ($list[$i]) {
      for my $funcs ($all[$i]) {
        for my $f (@$funcs) {
          $f->();
        }
      }
    }
  }
  print join("\t", @list), "\n";
}

__DATA__
thisO   fooabca1234
abc 123fooabca1234

0人赞添加讨论(0) 举报

手持菜刀，她持情操

4楼-- · 2019-08-12 20:46

Using a csv parser such as Text::CSV is not complicated. Something like this might suffice:

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({
        sep_char    => "\t",
        binary      => 1,
        eol         => $/,
});
while (my $row = $csv->getline(*DATA)) {
    tr/0/o/ for @{$row}[0, 1, 3];            # replace in cols A, B and D
    s/(?<!\d)0(?!\d)/o/g for @{$row}[4];     # replace in col E
    $csv->print(*STDOUT, $row);              # print the result
}


__DATA__
A   B   C   D   E   F
br0wn   red 1278076 0range  "20 tr0ut"  123
Green   0range  90876   Yell0w  "18 Salm0n" 456

Output:

A       B       C       D       E       F
brown   red     1278076 orange  "20 trout"      123
Green   orange  90876   Yellow  "18 Salmon"     456

Note that I handled your mixed string (column E) with a simplistic regex instead of transliteration (global replace), and it simply does not replace zeroes which are next to numbers, which will fail for certain numbers, such as 20.0 or 0.

Update:

If you want to do the substitutions based on column names instead of position, things get a bit more complicated. However, Text::CSV can handle it.

use strict;
use warnings;
use Text::CSV;

my @pure_text   = qw(A B D);
my @mixed       = qw(E);

my $csv = Text::CSV->new({
        sep_char    => "\t",
        binary      => 1,
        eol     => $/,
});

my $cols = $csv->getline(*DATA);              # read column names
$csv->print(*STDOUT, $cols);
$csv->column_names($cols);                    # set column names

while (my $row = $csv->getline_hr(*DATA)) {   # hash ref instead of array ref
    tr/0/o/ for @{$row}{@pure_text};          # substitution on hash slice
    s/(?<!\d)0(?!\d)/o/g for @{$row}{@mixed};
    my @row = @{$row}{@$cols};                # make temp array for printing
    $csv->print(*STDOUT, \@row);
}


__DATA__
A   B   C   D   E   F
br0wn   red 1278076 0range  "20 tr0ut"  123
Green   0range  90876   Yell0w  "18 Salm0n" 456

This code is a standalone for demonstration. To try the code on a file, change *DATA to *STDIN and use the script as follows:

perl script.pl < input.csv

0人赞添加讨论(0) 举报

家丑人穷心不美

5楼-- · 2019-08-12 20:46

Here's one way using GNU awk. Simply add the column names into the array in the BEGIN block. In the example below, only columns A, C and E will be modified. Run like:

awk -f script.awk file

Contents of script.awk:

BEGIN {
    FS=OFS="\t"

    a["A"]
    a["C"]
    a["E"]
}

{
    for (i=1;i<=NF;i++) {

        if ($i in a && NR==1) {
            b[i]
        }

        else if (i in b) {
            $i = gensub(/(^|[^0-9])0([^0-9]|$)/,"\\1o\\2", "g", $i)
        }
    }
}1

Tab separated results:

A   B   C   D   E   F   ... 
brown   red 1278076 0range  "20 trout"  123 ...
Green   0range  90876   Yell0w  "18 Salmon" 456 ...

Alternatively, here's the one-liner:

awk 'BEGIN { FS=OFS="\t"; a["A"]; a["C"]; a["E"] } { for (i=1;i<=NF;i++) { if ($i in a && NR==1) b[i]; else if (i in b) $i = gensub(/(^|[^0-9])0([^0-9]|$)/,"\\1o\\2", "g", $i) } }1' file

0人赞添加讨论(0) 举报

贪生不怕死

6楼-- · 2019-08-12 20:50

Create an array of subroutines, something like:

my @fixer;
$fixer[0] = sub { $_[0] =~ s/0/o/; };
my @fields = split /\t/, $input;
for (my $i = 0; $i <= $#fields; $i++) {
   $fixer[$i]->($fields[$i]) if defined $fixer[$i];
}

0人赞添加讨论(0) 举报

forever°为你锁心

7楼-- · 2019-08-12 20:54

perl -F -lane 'for(@F){$_=~s/0/o/g if(/0/ && /[a-zA-Z]+/);} print "@F"' your_file

Tested below

> cat temp
br0wn   red   1278076   0range   "20 tr0ut"   123 ...
Green   0range   90876   Yell0w   "18 Salm0n"   456   ...

> perl -F -lane 'for(@F){$_=~s/0/o/g if(/0/ && /[a-zA-Z]+/);} print "@F"' temp
brown red 1278076 orange "20 trout" 123 ...
Green orange 90876 Yellow "18 Salmon" 456 ...
>

0人赞添加讨论(0) 举报

Perform a different regular expression for each co

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间