File handle failed to read second time

Posted 2020-08-04 10:36

Question:

How can I read from a file handle a second time inside a foreach loop in Perl?

foreach $a (@b){
    while(my $line = <IN>){
        if($line = /$a/){
            print $line;
        }
    }
}

The above code does not process the second element of the list @b. How can I make it work?

Answer 1:

Your inner loop, while(my $line = <IN>), extracts lines from the IN handle until it reaches the end of the file.

When your outer loop, foreach $a (@b), tries to read from IN again, it's still at end-of-file. The first iteration of the foreach loop consumes all lines from the file, leaving nothing for the other iterations.
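To see the effect in isolation, here is a minimal sketch. It assumes an in-memory filehandle with made-up sample data in place of the real file; the names $data, @first_pass and @second_pass are hypothetical and only serve the demonstration.

```perl
use strict;
use warnings;

# Made-up sample data; an in-memory handle stands in for the real file.
my $data = "alpha\nbeta\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# First pass: reads every line until end-of-file.
my @first_pass;
while (my $line = readline $fh) {
    push @first_pass, $line;
}

# Second pass: the handle is still at end-of-file, so the loop body never runs.
my @second_pass;
while (my $line = readline $fh) {
    push @second_pass, $line;
}

printf "first pass: %d lines, second pass: %d lines\n",
    scalar @first_pass, scalar @second_pass;
```

The second loop sees nothing because nothing has moved the read position back from the end of the data.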

There are several possible ways to fix this:

  • Seek back to the beginning of IN before you attempt to read from it again:

    foreach $a (@b){
        seek IN, 0, 0
            or die "Cannot seek(): $!";
        while (my $line = <IN>) {
            ...
        }
    }
    

    However, this only works for real files, not pipes or sockets or terminals.

  • Read the whole file into memory up front, then iterate over a normal array:

    my @lines = <IN>;
    foreach $a (@b){
        foreach my $line (@lines) {
            ...
        }
    }
    

    However, if the file is big, this will use a lot of memory.

  • Switch the order of the two loops:

    while (my $line = <IN>) {
        foreach $a (@b) {
            ...
        }
    }
    

    This is my favorite. Now you only need to read from the file once. @b is already in memory, so you can iterate over it as many times as you want.
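For the first option (seeking back), a complete runnable sketch might look like the following. The sample data, the in-memory filehandle and the names @patterns and @matches are assumptions made for illustration; with a real file you would open it by path instead.

```perl
use strict;
use warnings;

# Made-up sample data; an in-memory handle stands in for a real, seekable file.
my $data = "apple pie\nbanana split\ncherry tart\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @patterns = ('an', 'rr');   # plays the role of @b
my @matches;

foreach my $pattern (@patterns) {
    # Rewind to the start of the data before each pass.
    seek $fh, 0, 0
        or die "Cannot seek(): $!";
    while (my $line = readline $fh) {
        push @matches, $line if $line =~ /$pattern/;
    }
}

print @matches;
```

Here the output is grouped by pattern, because the file is scanned once per element of @patterns.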


Side notes:

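  • Don't use bareword filehandles like IN. Lexical filehandles in normal variables (such as $IN) are superior in pretty much every way: they are properly scoped, they close automatically when they go out of scope, and they are easy to pass to functions.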
  • Don't use variables called $a or $b. They're a bit special because Perl uses them in sort.
  • My personal preference is to never use < >. It's weirdly overloaded (it can mean either readline or glob, depending on the exact syntax you use) and it isn't terribly intuitive. Using readline means there's never any syntactic ambiguity and even programmers with no Perl experience can figure out what it does.

With those changes:

while (my $line = readline $IN) {
    foreach my $re (@regexes) {
        if ($line =~ /$re/) {
            print $line;
        }
    }
}
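A self-contained version of the same loop, including the open call, could look like this. It is only a sketch: the sample data, the in-memory filehandle and the @printed array are assumptions added so the script runs on its own.

```perl
use strict;
use warnings;

# Made-up input; in a real script this would be: open my $IN, '<', $path
my $data = "apple pie\nbanana split\ncherry tart\n";
open my $IN, '<', \$data or die "Cannot open input: $!";

# Precompiled patterns, playing the role of @b.
my @regexes = (qr/rr/, qr/an/);
my @printed;   # records what was printed, for inspection

while (my $line = readline $IN) {
    foreach my $re (@regexes) {
        if ($line =~ /$re/) {
            push @printed, $line;
            print $line;
        }
    }
}
close $IN;
```

One difference from the original nested loops: matches now come out in file order rather than grouped by pattern, and a line that matches several patterns is printed once per matching pattern.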


Answer 2:

You read within a while loop until the filehandle is exhausted (it has reached EOF, the end of the file). Unless you close and reopen the filehandle, you cannot read anything more from it in the second iteration of the outer loop.

If the amount of data you read from the filehandle is not too large, you could read the file into an array variable and iterate over the contents of that array.

For example:

my @filecontent = <IN>;
foreach my $item_of_b (@b){
    foreach my $line_of_file (@filecontent){
        if($line_of_file =~ /$item_of_b/){
            print $line_of_file;
        }
    }
}

Also, $a and $b should not be used as variable names; they are special because sort uses them.
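As a small illustration of why these two names are special, the sketch below uses them the way sort expects. The numbers and the @sorted array are made up for the example.

```perl
use strict;
use warnings;

# $a and $b are package variables that sort sets for each comparison,
# which is why they need no declaration even under "use strict".
my @sorted = sort { $a <=> $b } (10, 9, 100);

print "@sorted\n";   # prints "9 10 100"
```

If the same scope also declared a lexical with my $a, the sort block would refer to that lexical instead of the variable that sort sets, and the code would no longer compile.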



Tags: regex perl