Perl iterating through each line in a file and app

Posted 2019-06-27 12:22

Question:

I have two text files containing the following:

FILE1.txt

dog
cat
antelope

FILE2.txt

1
2
Barry

The output I want to achieve is as follows:

dog1
dog2
dogBarry
cat1
cat2
catBarry
antelope1
antelope2
antelopeBarry

The way I have gone about it:

    open (FILE1, "<File1.txt") || die $!;
    open (FILE2, "<File2.txt") || die $!;

    my @animals = (<FILE1>);  #each line of the file into an array
    my @otherStrings = (<FILE2>);   #each line of the file into an array

    close FILE1 || die $!;
    close FILE2 || die $!;

    my @bothTogether;
    foreach my $animal (@animals) {
        chomp $animal;
        foreach my $otherString (@otherStrings) {
            chomp $otherString;
            push (@bothTogether,  "$animal$otherString");
        }
    }
    print @bothTogether;

The way I have done it works, but I'm sure it is not the best way of going about it, especially when both files could contain thousands of lines.

What would be the best way of doing this? Would it help to use a hash?

Answer 1:

Your approach will work fine for files with thousands of lines. That really isn't that big. For millions of lines, it might be a problem.

However, you could reduce the memory usage of your code by only reading one file into memory, as well as printing the results immediately instead of storing them in an array:

    use warnings;
    use strict;

    open my $animals, '<', 'File1.txt' or die "Can't open animals: $!";
    open my $payloads, '<', 'File2.txt' or die "Can't open payloads: $!";

    my @payloads = <$payloads>;   #each line of the file into an array
    close $payloads or die "Can't close payloads: $!";

    while (my $line = <$animals>) {
        chomp $line;
        print $line.$_ foreach (@payloads);
    }
    close $animals or die "Can't close animals: $!";

With two huge files of equal size, this will use roughly 1/4 the memory of your original code.

Update: I also edited the code to include Simbabque's good suggestions for modernizing it.

Update 2: As others have noted, you could read neither file into memory, going through the payloads file line by line on each line of the animals file. However, that would be much slower. It should be avoided unless absolutely necessary. The approach I have suggested will be about the same speed as your original code.



Answer 2:

Apart from a few things Modern Perl would do differently (you use the two-argument open, for example), your code is pretty straightforward.

The only improvement I can see is that you could move the inner chomp into an extra loop, maybe do the chomping while you read the file. That would save some time. But all in all, if you want to do something with data for each row of some other data, you are doing it right.
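A minimal sketch of chomping while reading, so that no chomp is needed inside the nested loops. The helper name read_chomped is hypothetical, and in-memory filehandles stand in for File1.txt and File2.txt so that the snippet runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a filehandle and strip the newlines once,
# so no chomp is needed inside the nested loops.
sub read_chomped {
    my ($fh) = @_;
    chomp(my @lines = <$fh>);
    return @lines;
}

# In-memory filehandles stand in for File1.txt and File2.txt here.
my $file1 = "dog\ncat\nantelope\n";
my $file2 = "1\n2\nBarry\n";
open my $animals_fh, '<', \$file1 or die "Can't open animals: $!";
open my $others_fh,  '<', \$file2 or die "Can't open other strings: $!";

my @animals      = read_chomped($animals_fh);
my @otherStrings = read_chomped($others_fh);

my @bothTogether;
foreach my $animal (@animals) {
    foreach my $otherString (@otherStrings) {
        push @bothTogether, "$animal$otherString";
    }
}

# The items no longer end in a newline, so add one when printing.
print "$_\n" for @bothTogether;
```

With real files you would pass the paths to open instead of the scalar references; the rest stays the same.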

You should use or die instead of || die because of operator precedence. Also, the final output will be one long line, because after chomp the array's items no longer contain line breaks.
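To illustrate the precedence point, here is a small sketch using a call written without parentheses, which is where the difference matters. The file name is a hypothetical one that is assumed not to exist, and the sample items at the end are only for illustration:

```perl
use strict;
use warnings;

my $missing = 'no-such-file.txt';   # hypothetical name, assumed not to exist

# '||' binds tighter than the comma, so this parses as
#   open(my $fh, '<', ($missing || die "..."));
# $missing is a true string, so die never runs and a failed open goes unnoticed.
my $ok_high = eval { open my $fh, '<', $missing || die "never reached"; 1 };

# 'or' has very low precedence, so the result of the whole open call is tested.
my $ok_low = eval { open my $fh, '<', $missing or die "Can't open $missing: $!"; 1 };

print "with ||: ", ($ok_high ? "failure silently ignored" : "died"), "\n";
print "with or: ", ($ok_low  ? "failure silently ignored" : "died"), "\n";

# After chomp the items carry no line breaks, so put them back on output.
my @items  = ('dog1', 'dog2');      # sample data for illustration
my $output = join '', map { "$_\n" } @items;
print $output;
```

Assuming no file by that name exists, the first open fails without dying, while the second one dies inside its eval.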

Update: @FrankB made a good suggestion in his comment above: if your files are huge and you are struggling with memory, you should not slurp them into two arrays, but rather read and process the first file line by line, and open and read the second file for each line of the first. That takes a lot longer, but saves a ton of memory. You would then also output the results directly instead of pushing them onto your results array.
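The line-by-line approach could be sketched roughly as follows. The subroutine name print_combinations and its output-handle parameter are hypothetical; only one line of each file is held in memory at a time, at the cost of rereading the second file for every line of the first:

```perl
use strict;
use warnings;

# Hypothetical helper: print every combination without slurping either file.
# The second file is reopened for each line of the first, which is slow
# but keeps memory use small.
sub print_combinations {
    my ($first_path, $second_path, $out) = @_;
    open my $first, '<', $first_path or die "Can't open $first_path: $!";
    while (my $left = <$first>) {
        chomp $left;
        open my $second, '<', $second_path or die "Can't open $second_path: $!";
        while (my $right = <$second>) {
            chomp $right;
            print {$out} "$left$right\n";   # output directly, nothing is stored
        }
        close $second or die "Can't close $second_path: $!";
    }
    close $first or die "Can't close $first_path: $!";
    return;
}

# Only run against the question's files if they are actually present.
print_combinations('File1.txt', 'File2.txt', \*STDOUT)
    if -e 'File1.txt' && -e 'File2.txt';
```

Passing the output handle as a parameter is just a design choice for this sketch; printing straight to STDOUT would work as well.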