Question:
I'm trying to compare 4 text files for counts in each line:
file1.txt:
32
44
75
22
88
file2.txt:
32
44
75
22
88
file3.txt:
11
44
75
22
77
file4.txt:
32
44
75
22
88
Each line represents a title:
line1 = customerID count
line2 = employeeID count
line3 = active_users
line4 = inactive_users
line5 = deleted_users
I'm trying to compare file2.txt, file3.txt and file4.txt with file1.txt; file1.txt will always have the correct counts.
Example: since file2.txt matches file1.txt exactly, line by line, in the example above, I'm trying to output "file2.txt is good". But since line1 and line5 of file3.txt do not match file1.txt, I'm trying to output "customerID for file3.txt does not match by 21 records" (i.e. 32 - 11 = 21) and "deleted_users in file3.txt does not match by 11 records" (88 - 77 = 11).
If shell is easier, that is fine too.
Answer 1:
One way to process the files line by line in parallel:
use warnings;
use strict;
use feature 'say';

my @files = @ARGV;
#my @files = map { $_ . '.txt' } qw(f1 f2 f3 f4);  # my test files' names

# Open all files, filehandles in @fhs
my @fhs = map { open my $fh, '<', $_ or die "Can't open $_: $!"; $fh } @files;

# Process (compare) the same line from all files
my $line_cnt;
LINE: while ( my @line = map { my $line = <$_>; $line } @fhs )
{
    # Stop as soon as any file runs out of lines
    defined || last LINE for @line;

    ++$line_cnt;
    s/(?:^\s+|\s+$)//g for @line;
    for my $i (1..$#line) {
        if ($line[0] != $line[$i]) {
            say "File $files[$i] differs at line $line_cnt";
        }
    }
}
This compares the whole line numerically, using != (after leading and trailing whitespace is stripped), since it is a given that each line carries a single number to be compared.
With my test files named f1.txt, f2.txt, ..., it prints:
File f3.txt differs at line 1
File f3.txt differs at line 5
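Since the question also accepts shell, the same parallel-by-line comparison can be sketched with paste and awk. This is a minimal sketch, not the answer's own method: the function name report_diff_lines is hypothetical, and it assumes one number per line and file names without spaces or tabs.

```shell
# Sketch: paste(1) joins line N of every file into one tab-separated row,
# so awk sees the same line from all files at once.
# Column 1 is the reference file; columns 2..n are the files to check.
# report_diff_lines is a hypothetical helper name.
report_diff_lines() {
    paste "$@" | awk -F '\t' -v names="$*" '
        BEGIN { n = split(names, file, " ") }   # file[i] is the name of column i
        {
            for (i = 2; i <= n; i++)
                if (($i + 0) != ($1 + 0))       # force a numeric comparison
                    printf "File %s differs at line %d\n", file[i], NR
        }
    '
}

# Usage (with the test files from above):
#   report_diff_lines f1.txt f2.txt f3.txt f4.txt
```

Unlike the Perl loop, which stops when any file runs out of lines, paste keeps going until the longest file ends and pads shorter files with empty fields, which compare as 0 here.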
Answer 2:
Store the line names in one array and the correct values in another. Then loop over the remaining files and, for each one, read its lines and compare them with the stored correct values. The special variable $. holds the current line number of the most recently read filehandle, so it can serve as an index into the arrays. Line numbers are 1-based and arrays are 0-based, so we subtract 1 to get the correct index.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my @line_names = ('customerID count',
                  'employeeID count',
                  'active_users',
                  'inactive_users',
                  'deleted_users');

my @correct;
open my $in, '<', shift or die $!;
while (<$in>) {
    chomp;
    push @correct, $_;
}

while (my $file = shift) {
    open my $in, '<', $file or die $!;
    while (<$in>) {
        chomp;
        if ($_ != $correct[$. - 1]) {
            say "$line_names[$. - 1] in $file does not match by ",
                $correct[$. - 1] - $_, ' records';
        }
    }
}
Answer 3:
Read the first file into an array, then loop over the other files, using the same function to read each one into an array. Within this loop, go through every line, calculate the difference, and print a message with the text from @names if the difference is not zero.
#!/usr/bin/perl
use strict;
use warnings;

my @names = qw(customerID_count employeeID_count active_users inactive_users deleted_users);
my @files = qw(file1.txt file2.txt file3.txt file4.txt);

my @first = readfile($files[0]);
for (my $i = 1; $i <= $#files; $i++) {
    print "\n$files[0] <=> $files[$i]:\n";
    my @second = readfile($files[$i]);
    for (my $j = 0; $j <= $#names; $j++) {
        my $diff = abs($first[$j] - $second[$j]);
        if ($diff > 0) {
            print "$names[$j] does not match by $diff records\n";
        }
    }
}

# Read a file and return its lines with all whitespace removed
sub readfile {
    my ($file) = @_;
    open my $handle, '<', $file or die "Can't open $file: $!";
    chomp(my @lines = <$handle>);
    close $handle;
    s/\s+//g for @lines;
    return @lines;
}
Output is:
file1.txt <=> file2.txt:
file1.txt <=> file3.txt:
customerID_count does not match by 21 records
deleted_users does not match by 11 records
file1.txt <=> file4.txt:
Answer 4:
A mash-up of bash and (mostly the GNU versions of) standard utilities such as diff, sdiff, and sed, plus the ifne utility (from moreutils), and even an eval:
f=("" "customerID count" "employeeID count" \
   "active_users" "inactive_users" "deleted_users")
for n in file{2..4}.txt ; do
    diff -qws file1.txt $n ||
        $(sdiff file1.txt $n | ifne -n exit | nl |
          sed -n '/|/{s/[1-5]/${f[&]}/;s/\s*|\s*/-/;s/\([0-9-]*\)$/$((&))/;p}' |
          xargs printf 'eval echo "%s for '"$n"' does not match by %s records.";\n') ;
done
Output:
Files file1.txt and file2.txt are identical
Files file1.txt and file3.txt differ
customerID count for file3.txt does not match by 21 records.
deleted_users for file3.txt does not match by 11 records.
Files file1.txt and file4.txt are identical
The same code, tweaked for prettier output:
f=("" "customerID count" "employeeID count" \
   "active_users" "inactive_users" "deleted_users")
for n in file{2..4}.txt ; do
    diff -qws file1.txt $n ||
        $(sdiff file1.txt $n | ifne -n exit | nl |
          sed -n '/|/{s/[1-5]/${f[&]}/;s/\s*|\s*/-/;s/\([0-9-]*\)$/$((&))/;p}' |
          xargs printf 'eval echo "%s does not match by %s records.";\n') ;
done |
    sed '/^Files/!s/^/\t/;/^Files/{s/.* and //;s/ are .*/ is good/;s/ differ$/:/}'
Output:
file2.txt is good
file3.txt:
customerID count does not match by 21 records.
deleted_users does not match by 11 records.
file4.txt is good
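For comparison, the same check can be sketched with only POSIX sh and awk, without ifne or eval. This is an illustrative sketch under stated assumptions, not a drop-in replacement: the function name compare_counts and the shortened labels are hypothetical, each line is assumed to hold a single integer, and the reference file is assumed to be non-empty.

```shell
# Sketch: compare each candidate file against a reference file, line by line.
# compare_counts is a hypothetical helper name; the first argument is the
# reference file, the remaining arguments are the files to check.
compare_counts() {
    ref=$1
    shift
    for f in "$@"; do
        awk -v name="$f" '
            BEGIN {
                # Labels for lines 1..5, taken from the question.
                split("customerID employeeID active_users inactive_users deleted_users", label, " ")
            }
            # First file (the reference): remember the expected count per line.
            NR == FNR { want[FNR] = $1 + 0; next }
            # Second file: report the absolute difference on any mismatch.
            ($1 + 0) != want[FNR] {
                d = want[FNR] - $1
                if (d < 0) d = -d
                printf "%s in %s does not match by %d records\n", label[FNR], name, d
                bad = 1
            }
            END { if (!bad) print name " is good" }
        ' "$ref" "$f"
    done
}

# Usage:
#   compare_counts file1.txt file2.txt file3.txt file4.txt
```

The NR == FNR test is true only while awk is reading its first input file, which is why an empty reference file would break the sketch.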
Answer 5:
Here is an example in Perl:
use feature qw(say);
use strict;
use warnings;

{
    my $ref = read_file('file1.txt');
    my $N = 3;
    my @value_info;
    for my $i (1..$N) {
        my $fn = 'file'.($i+1).'.txt';
        my $values = read_file( $fn );
        push @value_info, [ $fn, $values];
    }
    my @labels = qw(customerID employeeID active_users inactive_users deleted_users);
    for my $info (@value_info) {
        my ( $fn, $values ) = @$info;
        my $all_ok = 1;
        my $j = 0;
        for my $value (@$values) {
            if ( $value != $ref->[$j] ) {
                printf "%s: %s does not match by %d records\n",
                    $fn, $labels[$j], abs( $value - $ref->[$j] );
                $all_ok = 0;
            }
            $j++;
        }
        say "$fn: is good" if $all_ok;
    }
}

sub read_file {
    my ( $fn ) = @_;

    my @values;
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    while( my $line = <$fh>) {
        if ( $line =~ /(\d+)/) {
            push @values, $1;
        }
    }
    close $fh;
    return \@values;
}
Output:
file2.txt: is good
file3.txt: customerID does not match by 21 records
file3.txt: deleted_users does not match by 11 records
file4.txt: is good