I found myself writing Perl for the first time in about 8 years, and I am having difficulty with something that should be easy. Here is the basic premise:
A file contains a hundred or so fields, 10 of which have incorrect data (the letter O was entered as the digit 0):
A B C D E F ...
br0wn red 1278076 0range "20 tr0ut" 123 ...
Green 0range 90876 Yell0w "18 Salm0n" 456 ...
I am trying to write a program that splits the fields and then lets me run a regex on field A to replace 0 with O, but not on column C, and so on. I may also need to run a different regex on some columns, for instance column E.
I was able to split all the fields in a record on \t. My problem is structuring the code to go over each field and run a specific regex depending on which field it is.
Any help would be appreciated and I will Paypal you 10 dollars for a beverage of your choice if you solve it.
I'd probably use Perl in 'autosplit' mode:
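The original command is not preserved here. A minimal sketch of the idea, assuming tab-separated input, a hypothetical file name data.txt, and column positions read off the sample data (A, B and D contain only words; E mixes numbers and words):

```perl
use strict;
use warnings;

# One-liner form using autosplit (-a) with a tab delimiter (-F);
# data.txt is a hypothetical file name:
#   perl -F'\t' -lane 'tr/0/O/ for @F[0,1,3]; $F[4] =~ s/(?<=[A-Za-z])0|0(?=[A-Za-z])/O/g; print join "\t", @F' data.txt

# The same logic as a function. Column indexes are assumptions
# taken from the sample (A=0, B=1, D=3 are words, E=4 is mixed).
sub fix_line {
    my ($line) = @_;
    my @F = split /\t/, $line, -1;      # -1 keeps trailing empty fields
    tr/0/O/ for @F[0, 1, 3];            # replace every zero in word-only columns
    # Column E: only zeroes that touch a letter, so "20" stays "20".
    $F[4] =~ s/(?<=[A-Za-z])0|0(?=[A-Za-z])/O/g;
    return join "\t", @F;
}

print fix_line(qq{br0wn\tred\t1278076\t0range\t"20 tr0ut"\t123}), "\n";
```

Column C (index 2) is never touched, so its numeric zeroes survive.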
The regex for $F[4] changes '20 tr0ut' into '20 trOut'; you can make it more complex if you need.
Output on sample data:
This does assume a strictly tab-separated data file. The quoted strings containing spaces complicate things if you do not have strictly tab-separated data; at that point, Text::CSV is attractive for reading the lines.
Here's one way with a simple configuration using array references and/or subroutines, then the substitutions happening later:
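The code for this answer is missing from this copy. As a rough sketch of what such a configuration could look like, assuming zero-based column indexes taken from the sample data and a hypothetical helper named apply_rules:

```perl
use strict;
use warnings;

# Per-column rules keyed by zero-based index (positions assumed from
# the sample). A rule is either [pattern, replacement] or a subroutine.
my %rule_for = (
    0 => [ qr/0/, 'O' ],
    1 => [ qr/0/, 'O' ],
    3 => [ qr/0/, 'O' ],
    4 => sub {
        my ($v) = @_;
        $v =~ s/(?<=[A-Za-z])0|0(?=[A-Za-z])/O/g;   # skip zeroes inside numbers
        return $v;
    },
);

# Apply the configured rule, if any, to each field of one record.
sub apply_rules {
    my @fields = @_;
    for my $i (keys %rule_for) {
        next unless defined $fields[$i];
        my $rule = $rule_for{$i};
        if (ref $rule eq 'CODE') {
            $fields[$i] = $rule->($fields[$i]);
        }
        else {
            my ($pattern, $replacement) = @$rule;
            $fields[$i] =~ s/$pattern/$replacement/g;
        }
    }
    return @fields;
}

for my $line (qq{Green\t0range\t90876\tYell0w\t"18 Salm0n"\t456}) {
    print join("\t", apply_rules(split /\t/, $line, -1)), "\n";
}
```

Keeping the rules in one table means a new column only needs a new entry, not a new branch in the loop.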
Using a CSV parser such as Text::CSV is not complicated. Something like this might suffice:
Output:
Note that I handled your mixed string (column E) with a simplistic regex instead of transliteration (global replace). It simply does not replace zeroes that are next to digits, so it will fail for certain numbers, such as 20.0 or 0.
Update:
If you want to do the substitutions based on column names instead of position, things get a bit more complicated. However, Text::CSV can handle it.
This code is a standalone for demonstration. To try the code on a file, change *DATA to *STDIN and use the script as follows:
Here's one way using GNU awk. Simply add the column names into the array in the BEGIN block. In the example below, only columns A, C and E will be modified. Run like:
Contents of script.awk:
Tab separated results:
Alternatively, here's the one-liner:
Create an array of subroutines, something like:
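The original listing is not preserved here. One possible shape, assuming one handler per column in the order A to F of the sample data; the names fix_fields, $all_zeroes, $near_letter and $unchanged are hypothetical:

```perl
use strict;
use warnings;

# Reusable handlers: each takes a field value and returns the fixed value.
my $all_zeroes  = sub { (my $v = shift) =~ tr/0/O/; return $v };
my $near_letter = sub { (my $v = shift) =~ s/(?<=[A-Za-z])0|0(?=[A-Za-z])/O/g; return $v };
my $unchanged   = sub { return $_[0] };

# One handler per column, in column order A..F (layout assumed from the sample).
my @handler = ($all_zeroes, $all_zeroes, $unchanged,
               $all_zeroes, $near_letter, $unchanged);

# Run each field through the handler at the same position.
sub fix_fields {
    my @fields = @_;
    return map {
        my $h = $handler[$_] || $unchanged;   # columns past F pass through
        $h->($fields[$_]);
    } 0 .. $#fields;
}

print join("\t", fix_fields('br0wn', 'red', '1278076', '0range', '"20 tr0ut"', '123')), "\n";
```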
Tested below