Difference of Two Arrays Using Perl

2020-01-23 07:55发布

问题:

I have two arrays. I need to check and see if the elements of one appear in the other one.

Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.

回答1:

Another way to do it is to use Array::Utils

use Array::Utils qw(:all);

my @a = qw( a b c d );
my @b = qw( c d e f );

# symmetric difference
my @diff = array_diff(@a, @b);

# intersection
my @isect = intersect(@a, @b);

# unique union
my @unique = unique(@a, @b);

# check if arrays contain same members
if ( !array_diff(@a, @b) ) {
        # do something
}

# get items from array @a that are not in array @b
my @minus = array_minus( @a, @b );


回答2:

perlfaq4 to the rescue:

How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:

   @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
            push @union, $element;
            push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }

If you properly declare your variables, the code looks more like the following:

my %count;
for my $element (@array1, @array2) { $count{$element}++ }

my ( @union, @intersection, @difference );
for my $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}


回答3:

You need to provide a lot more context. There are more efficient ways of doing that ranging from:

  • Go outside of Perl and use shell (sort + comm)

  • map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N" - basically loop over each array once) as opposed to nested loop which has "M*N" complexity)

    Example:

    my %second = map {$_=>1} @second;
    my @only_in_first = grep { !$second{$_} } @first; 
    # use a foreach loop with `last` instead of "grep" 
    # if you only want yes/no answer instead of full list
    
  • Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)

  • Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.



回答4:

You can try Arrays::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diffs code:

sub array_diff(\@\@) {
    my %e = map { $_ => undef } @{$_[1]};
    return @{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } @{ $_[0] } ), keys %e ] };
}

Since Arrays::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.

There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:

@array1 = qw(1 1 2 2 3 3 4 4 5 5);
@array2 = qw(1 2 3 4 5);

Are these arrays the same? Or, are they different? They have the same values, but there are duplicates in @array1 and not @array2.

What about this?

@array1 = qw( 1 1 2 3 4 5 );
@array2 = qw( 1 1 2 3 4 5 );

I would say that these arrays are the same, but Array::Utils::arrays_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.

And, even the Perl FAQ pointed out by mob also says that It assumes that each element is unique in a given array. Is this an assumption you can make?

No matter what, hashes are the answer. It's easy and quick to look up a hash. The problem is what do you want to do with unique values.

Here's a solid solution that assumes duplicates don't matter:

sub array_diff {
    my @array1 = @{ shift() };
    my @array2 = @{ shift() }; 

    my %array1_hash;
    my %array2_hash;

    # Create a hash entry for each element in @array1
    for my $element ( @array1 ) {
       $array1_hash{$element} = @array1;
    }

    # Same for @array2: This time, use map instead of a loop
    map { $array_2{$_} = 1 } @array2;

    for my $entry ( @array2 ) {
        if ( not $array1_hash{$entry} ) {
            return 1;  #Entry in @array2 but not @array1: Differ
        }
    }
    if ( keys %array_hash1 != keys %array_hash2 ) {
       return 1;   #Arrays differ
    }
    else {
       return 0;   #Arrays contain the same elements
    }
}

If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:

my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } @array1;
map { $array2_hash{$_} += 2 } @array2;

Now, you can go through each hash and verify that not only do the keys exist, but that their entries match

for my $key ( keys %array1_hash ) {
    if ( not exists $array2_hash{$key} 
       or $array1_hash{$key} != $array2_hash{$key} ) {
       return 1;   #Arrays differ
    }
 }

You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:

if ( keys %array2_hash != keys %array1_hash ) {
     return 1;  #Arrays have a different number of keys: Don't match
}
else {
     return;    #Arrays have the same keys: They do match
}


回答5:

You can use this for getting diffrence between two arrays

#!/usr/bin/perl -w
use strict;

my @list1 = (1, 2, 3, 4, 5);
my @list2 = (2, 3, 4);

my %diff;

@diff{ @list1 } = undef;
delete @diff{ @list2 };


回答6:

n + n log n algorithm, if sure that elements are unique in each array (as hash keys)

my %count = (); 
foreach my $element (@array1, @array2) { 
    $count{$element}++;
}
my @difference = grep { $count{$_} == 1 } keys %count;
my @intersect  = grep { $count{$_} == 2 } keys %count;
my @union      = keys %count;

So if I'm not sure of unity and want to check presence of the elements of array1 inside array2,

my %count = (); 
foreach (@array1) {
    $count{$_} = 1 ;
};
foreach (@array2) {
    $count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
    return 'Some element of array1 does not appears in array2'
} else {
    return 'All elements of array1 are in array2'.
} 
# N + N log N


回答7:

my @a = (1,2,3); 
my @b=(2,3,1); 
print "Equal" if grep { $_ ~~ @b } @a == @b;


回答8:

Try to use List:Compare . IT has solutions for all the operations that can be performed on arrays. https://metacpan.org/pod/List::Compare



回答9:

You want to compare each element of @x against the element of the same index in @y, right? This will do it.

print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n" 
    for grep { $x[$_] != $y[$_] } 0 .. $#x;

...or...

foreach( 0 .. $#x ) {
    print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n" if $x[$_] != $y[$_];
}

Which you choose kind of depends on whether you're more interested in keeping a list of indices to the dissimilar elements, or simply interested in processing the mismatches one by one. The grep version is handy for getting the list of mismatches. (original post)



回答10:

Not elegant, but easy to understand:

#!/usr/local/bin/perl 
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");;
my @file1lines = split/\n/,`cat $file1`;
my @file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(@file1lines){
    $lines{$file1line}+=1;
}
foreach my $file2line(@file2lines){
    $lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
    if($value == 1){
        print "$key is in only $file1\n";
    }elsif($value == 2){
        print "$key is in only $file2\n";
    }elsif($value == 3){
        print "$key is in both $file1 and $file2\n";
    }
}
exit;
__END__


标签: perl arrays