How to match exactly two empty lines

Posted 2020-07-30 03:46

Question:

I have a question about regular expressions. I have a file and I need to parse it so that I can distinguish specific blocks of text in it. These blocks are separated by two empty lines (some blocks are separated by 3 or 1 empty lines, but I need exactly 2). I have the piece of code below, and /^\s*$^\s*$/ is the regular expression I think should match, but it does not. What is wrong?

$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);

Cheers, Yuliya

Answer 1:

New Answer

After having problems excluding more than 2 empty lines, and after a good night's sleep, here is a better method that doesn't even need to slurp the file.

#!/usr/bin/perl

use strict;
use warnings;    

my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
            #that referenced array will hold lines in that block

open(my $fh, '<', $file) or die "Cannot open $file: $!";

my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
  chomp($line);
  if ($line =~ /^\s*$/) {
    $empty++;
  } elsif ($empty == 2) { #not blank and exactly 2 previous blanks
    $block_num++; # move on to next block
    $empty = 0;
  } else {
    $empty = 0;
  }

  push @{ $blocks[$block_num] }, $line;
}

#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
  open(my $out, '>', $file_num++ . ".txt");
  print $out join("\n", @$block);
}

In fact rather than store and write later, you could simply write to one file per block as you go:

#!/usr/bin/perl

use strict;
use warnings;

my $file = 'yu';

open(my $fh, '<', $file) or die "Cannot open $file: $!";

my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
  chomp($line);
  if ($line =~ /^\s*$/) {
    $empty++;
  } elsif ($empty == 2) { #not blank and exactly 2 previous blanks
    close(OUT); #just learned this line isn't necessary, perldoc -f close
    open(OUT, '>', ++$block_num . '.txt');
    $empty = 0;
  } else {
    $empty = 0;
  }

  print OUT "$line\n";
}

close(OUT);
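
The counting logic shared by both scripts above can be tried out without touching any files. Below is a minimal sketch, assuming the lines are already in memory; the helper name split_blocks and the sample data are made up for illustration and are not part of the original answer. As in the scripts above, blank lines stay attached to the block that precedes them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: groups lines into blocks, starting a new block
# only when a non-blank line follows exactly two blank lines.
sub split_blocks {
    my @lines  = @_;
    my @groups = ([]);    # one arrayref per block
    my $empty  = 0;       # blank lines seen since the last non-blank line

    for my $line (@lines) {
        if ($line =~ /^\s*$/) {
            $empty++;
        } elsif ($empty == 2) {    # not blank and exactly 2 previous blanks
            push @groups, [];      # move on to the next block
            $empty = 0;
        } else {
            $empty = 0;
        }
        push @{ $groups[-1] }, $line;
    }
    return @groups;
}

# Sample input: "a", two blanks, "b", one blank, "c", three blanks, "d"
my @sample = ('a', '', '', 'b', '', 'c', '', '', '', 'd');
my @found  = split_blocks(@sample);

print scalar(@found), " blocks\n";
print join('|', @$_), "\n" for @found;
```

With this sample only the gap between "a" and "b" starts a new block; the gaps of one and three blank lines are kept inside the second block.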


Answer 2:

By default, Perl reads files one line at a time, so a pattern applied to a single line never sees multiple newlines. The following code instead reads records terminated by a double newline.

    local $/ = "\n\n" ;

    while (<> ) {

      print "-- found $_" ;
    }
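
To see what records this setting actually produces, here is a small sketch that reads from an in-memory string instead of a file. The helper read_records and the sample text are assumptions for illustration only. Note that "\n\n" is the newline ending a text line plus one more newline, i.e. a single empty line, so text separated by two empty lines leaves a stray leading newline on the next record.

```perl
use strict;
use warnings;

# Hypothetical helper: reads all records from a string using the given
# input record separator.
sub read_records {
    my ($text, $sep) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    local $/ = $sep;          # restored automatically when the sub returns
    my @records = <$fh>;
    close($fh);
    return @records;
}

# "a", then two empty lines, then "b"
my @records = read_records("a\n\n\nb\n", "\n\n");

for my $record (@records) {
    (my $shown = $record) =~ s/\n/\\n/g;   # make newlines visible
    print "record: $shown\n";
}
```

Here the first record is "a\n\n" and the second is "\nb\n", which shows that the separator is consumed as soon as the first pair of newlines is seen.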


Answer 3:

use 5.012;

open my $fh,'<','1.txt';

#slurping file
local $/;
my $content = <$fh>;

close $fh;

for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
    say 'found:';
    say $block;
}
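
A quick way to check how the lookbehind and lookahead in this pattern behave is to run the same split on a short string. This is only a sketch; the wrapper name split_on_two_empty_lines and the sample string are made up. It assumes the empty lines are truly empty (no spaces or tabs).

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split from the answer above.
sub split_on_two_empty_lines {
    my ($content) = @_;
    # Exactly three newlines in a row: the one ending a text line plus
    # two empty lines, with no further newline on either side.
    return split /(?<!\n)\n\n\n(?!\n)/, $content;
}

# Gaps of two, one and three empty lines, in that order
my $content = "a\n\n\nb\n\nc\n\n\n\nd\n";
my @blocks  = split_on_two_empty_lines($content);

print scalar(@blocks), " blocks\n";
```

Only the gap between "a" and "b" matches, so the result has two elements: "a" and everything from "b" to the end.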


Answer 4:

Deprecated in favor of new answer

justintime's answer works by telling perl that you want to call the end of a line "\n\n", which is clever and will work well. One exception is that this must match exactly. Judging by the regex you are using, there might be whitespace on the "empty" lines, in which case this will not work. Also, his method will split even on more than 2 linebreaks, which was not allowed in the OP.

For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (if the file is not so large as to use all your memory, probably fine in most cases).

I would then use the split function to split the text into an array of chunks. Your code would then look something like:

#!/usr/bin/perl

use strict;
use warnings;

my $file = 'yu';
my $text;

open(my $fh, '<', $file) or die "Cannot open $file: $!";
{
  local $/; # enables slurp mode inside this block
  $text = <$fh>;
}
close($fh);

my @blocks = split( 
  /
  (?<!\n)\n #check to make sure there isn't another \n behind this one
  \s*\n #first whitespace only line
  \s*\n #second "
  (?!\n) #check to make sure there isn't another \n after this one
  /x, # x flag allows comments and whitespace in regex
  $text
);  
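
One thing worth checking about this pattern: \s also matches a newline, so \s*\n can swallow an extra empty line. That appears to be the kind of problem with excluding more than 2 empty lines that the new answer mentions. The sketch below, with a made-up helper name split_loose and the same pattern written without the x flag, shows the effect on a string with three empty lines.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern from the split above, on one line.
sub split_loose {
    my ($text) = @_;
    return split /(?<!\n)\n\s*\n\s*\n(?!\n)/, $text;
}

# "a" and "b" separated by THREE empty lines (four newlines in a row)
my @parts = split_loose("a\n\n\n\nb");

print scalar(@parts), " parts\n";
```

The string is still split into "a" and "b", because the second \s* absorbs one of the newlines. Counting blank lines one line at a time, as in the new answer, avoids this.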

You can then do operations on the array. If I understand your comment to justintime's answer, you want to write each block out to a different file. That would look something like:

my $file_num = 1;
foreach my $block (@blocks) {
  open(my $out, '>', $file_num++ . ".txt");
  print $out $block;
}

Notice that since you open $out lexically (with my), when it reaches the end of the foreach block the $out variable dies (i.e. "goes out of scope"). When this happens to a lexical filehandle, the file is automatically closed. You can do something similar with justintime's method as well:

local $/ = "\n\n" ;

my $file_num = 1;
while (<>) {
  open(my $out, '>', $file_num++ . ".txt");
  print $out $_;
}
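
The note above about lexical filehandles closing themselves when they go out of scope can be verified with a short sketch. The helper write_block and the temporary directory are assumptions made for this example; File::Temp ships with Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: writes one block to a file and never calls close.
sub write_block {
    my ($path, $block) = @_;
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    print $out $block;
    return;    # $out goes out of scope here, so perl flushes and closes it
}

my $dir  = tempdir(CLEANUP => 1);    # removed automatically at exit
my $path = "$dir/1.txt";

write_block($path, "first block\n");

# The data is already on disk, although close() was never called.
my $size = -s $path;
print "$path holds $size bytes\n";
```

An explicit close is still useful when you want to check its return value for write errors.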