I have a question about regular expressions. I have a file and I need to parse it in such a way that I could distinguish some specific blocks of text in it. These blocks of text are separated by two empty lines (there are blocks which are separated by 3 or 1 empty lines but I need exactly 2). So I have a piece of code and this is \s*$^\s*$/
regular expression I think should match, but it does not.
What is wrong?
$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);
Cheers,
Yuliya
New Answer
After having problems excluding >2 empty lines, and a good nights sleep here is a better method that doesn't even need to slurp.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
#that referenced array will hold lines in that block
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
$block_num++; # move on to next block
$empty = 0;
} else {
$empty = 0;
}
push @{ $blocks[$block_num] }, $line;
}
#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out join("\n", @$block);
}
In fact rather than store and write later, you could simply write to one file per block as you go:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
close(OUT); #just learned this line isn't necessary, perldoc -f close
open(OUT, '>', ++$block_num . '.txt');
$empty = 0;
} else {
$empty = 0;
}
print OUT "$line\n";
}
close(OUT);
By default, Perl reads files a line at a time, so you won't see multiple new lines. The following code selects text terminated by a double new line.
local $/ = "\n\n" ;
while (<> ) {
print "-- found $_" ;
}
use 5.012;
open my $fh,'<','1.txt';
#slurping file
local $/;
my $content = <$fh>;
close $fh;
for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
say 'found:';
say $block;
}
Deprecated in favor of new answer
justintime's answer works by telling perl that you want to call the end of a line "\n\n", which is clever and will work well. One exception is that this must match exactly. By the regex you are using it makes it seem like there might be whitespace on the "empty" lines, in which case this will not work. Also his method will split even on more than 2 linebreaks, which was not allowed in the OP.
For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (if the file is not so large as to use all your memory, probably fine in most cases).
I would then probably say to use the split
function to split the block of text into an array of chunks. Your code would then look something like:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my $text;
open(my $fh, '<', $file);
{
local $/; enables slurp mode inside this block
$text = <$fh>;
}
close($fh);
my @blocks = split(
/
(?<!\n)\n #check to make sure there isn't another \n behind this one
\s*\n #first whitespace only line
\s*\n #second "
(?!\n) #check to make sure there isn't another \n after this one
/x, # x flag allows comments and whitespace in regex
$text
);
You can then do operations on the array. If I understand your comment to justintime's answer, you want to write each block out to a different file. That would look something like
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}
Notice that since you open $out lexically (with my
) when it reaches the end of the foreach block, the $out variable dies (i.e. "goes out of scope"). When this happens to a lexical filehandle, the file is automatically closed. And you can do a similar thing to that with justintime's method as well:
local $/ = "\n\n" ;
my $file_num = 1;
while (<>) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}