Does anyone know of a Unix command or Perl script that would replace every nth occurrence of a specific character with another character, where the replacement can be entered either as hex (e.g. 7C) or as the actual character (e.g. |)?
For example, perl script.pl "," 3 "|" data.txt
would replace every 3rd, 6th, 9th, etc. comma with a pipe.
So if data.txt had the following before the script was run:
fd,3232,gfd67gf,
peas,989767,jkdfnfgjhf,
dhdhjsk,267,ujfdsy,fuyds,637296,ldosi,fduy,
873,fuisouyd,try
save,2837,ipoi
It should then have this after the script was run:
fd,3232,gfd67gf|
peas,989767,jkdfnfgjhf|
dhdhjsk,267,ujfdsy|fuyds,637296,ldosi|fduy,
873,fuisouyd|try
save,2837,ipoi
Here is a small Perl hack to solve the problem. It uses the index function to find the commas, the modulus operator to pick out the right ones, and substr to perform the replacement.
use strict;
use warnings;
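while (<>) {
    my $x = index($_, ",");   # position of the first comma on this line
    my $i = 0;                # comma counter, reset for each line
    while ($x != -1) {
        $i++;
        unless ($i % 3) {
            # every third comma: put a pipe in its place
            $_ = substr($_, 0, $x) . "|" . substr($_, $x + 1);
        }
        $x = index($_, ",", $x + 1);   # next comma, or -1 if there is none
    }
    print;
}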
Run with perl script.pl file.csv.
Note: You can place the declaration my $i before the while(<>) loop in order to do a global count, instead of a separate count for each line. Not quite sure I understood your question in that regard.
use strict;
use warnings;
use File::Slurp qw(read_file);
# note the argument order: from, to, every, file
my ($from, $to, $every, $fname) = @ARGV;
my $counter = 0;
my $in = read_file $fname;
my $out = $in;
# copy is important because pos magic attached to $in resets with substr
while ($in =~ /\Q$from/gms) {
    $counter++;
    # pos($in) is the end of the match, so step back by the match length
    substr $out, pos($in) - length($from), length($from), $to unless $counter % $every;
}
print $out;
If the $from and $to parameters have different lengths, you still need to mess a bit with the second parameter of substr (the offset) to make it work correctly, because each replacement shifts the positions of everything after it.
How about a nice, simple awk one-liner?
awk -v RS=, '{ORS=(++i%3?",":"|");print}' file.csv
One minor bug just occurred to me: it will print a , or | as the very last character. To avoid this, we need to alter it slightly:
awk -v RS=, '{ORS=(++i%3?",":"|");print}END{print ""}' file.csv | sed '$d'
# Get params and create part of the regex.
my $delim = "\\" . shift;
my $n = shift;
my $repl = shift;
my $wild = '.*?';
my $pattern = ($wild . $delim) x ($n - 1);
# Slurp.
$/ = undef;
my $text = <>;
# Replace and print.
$text =~ s/($pattern$wild)$delim/$1$repl/sg;
print $text;
This processes the input file one line at a time (no slurping :)
For hex input, just pass '\x7C' or whatever as $1.
#!/bin/bash
b="${1:-,}" # the "before" field delimiter
n="${2:-3}" # the number of fields in a group
a="${3:-|}"; [[ $a == [\|] ]] && a='\|' # the "after" group delimiter
sed -nr "x;G; /(([^$b]+$b){$((n-1))}[^$b]+)$b/{s//\1$a/g}
s/.*\n//; h; /.*$a/{s///; x}; p" input_file
Here it is again, with some comments.
sed -nr "x;G # pat = hold + pat
/(([^$b]+$b){$((n-1))}[^$b]+)$b/{s//\1$a/g}
s/.*\n// # del fields from prev line
h # hold = mod*\n
/.*$a/{ s/// # pat = unmodified
x # hold = unmodified, pat = mod*\n
}
p # print line" input_file
Here is an idea as a Perl one-liner, run from the shell:
perl -pe 's/,/(++$n % 3 == 0) ? "|" : $&/ge' data.txt
That will do the trick.