The issue I have can be reproduced by running the following code in Strawberry Perl 5.12.3.0 on Windows XP.
#!/usr/bin/perl -w
use strict;
use warnings;
use Win32::Unicode::File;
use Encode;
my $fname = shift @ARGV;
my $fh = Win32::Unicode::File->new;
if ($fh->open('<', $fname)) {
    # Read every line and throw it away; nothing is kept in memory on purpose.
    while (my $line = $fh->readline()) {}
    close $fh;
} else {
    print "Couldn't open file: $!\n";
}
The only thing the script does is call readline in a loop, yet memory use keeps growing until Strawberry Perl dies with an "Out of memory" error. The file is really big, but since the code reads it as a stream that shouldn't matter. Am I missing something here, or is there a leak somewhere in Strawberry Perl? I tested exactly the same code in ActivePerl and there it works fine, i.e., it doesn't eat memory.
Update: Replacing Win32::Unicode::File with the built-in open and the normal diamond operator seems to work, on my distribution at least. See the following code.
use strict;
use warnings;
my $fname = shift @ARGV;
if (open(my $fh, '<', $fname)) {
    while (my $line = <$fh>) {}
    close $fh;
} else {
    print "Couldn't open file: $!\n";
}
So that would suggest the problem lies with the Win32::Unicode module, right?
A little unorthodox I guess, but I'm going to answer my own question. I have replaced the Win32::Unicode::File package with the Path::Class::Unicode package for reading the Unicode file. This works fine (i.e., no memory eating), so it seems the problem is in the Win32::Unicode::File package and is most likely a bug. I have contacted the author of the package and he's looking into it. Please let me know if you want me to supply the code. It's pretty straightforward.
Maybe $/ (or $INPUT_RECORD_SEPARATOR) is not a newline? Or maybe $[ (the index of the first array element and of the first character in a string) is not 0?
Those two variables are used by the module during read and readline.
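To rule out the first guess, one could print the bytes of $/ right before the read loop. This is only a diagnostic sketch: describe_separator is a made-up helper name, not something provided by Win32::Unicode::File.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): render $/ in a printable form.
sub describe_separator {
    my ($sep) = @_;
    return 'undef (slurp mode)'             unless defined $sep;
    return 'reference (fixed-size records)' if ref $sep;
    return 'empty string (paragraph mode)'  if $sep eq '';
    # Show each character of the separator as a hex code.
    return join ' ', map { sprintf('0x%02X', ord($_)) } split //, $sep;
}

print 'Input record separator: ', describe_separator($/), "\n";
```

With the default separator this prints 0x0A. Anything else means some code changed $/ before the loop started.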
BTW: It's so damn slow because it reads each line one character at a time, going through 3 function calls per character, then calls Encode::decode on every character it reads and appends the result to the line buffer that readline returns to your code. Yuck!
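For contrast, here is a rough sketch of the usual alternative: let a PerlIO encoding layer decode buffered blocks and hand back whole lines. It assumes the file content is UTF-8 and that the built-in open can handle the file name, which is not true for every Unicode file name on Windows (that limitation is why modules like Win32::Unicode::File exist). count_lines is just an illustrative name.

```perl
use strict;
use warnings;

# Count lines while the PerlIO layer decodes buffered input,
# instead of decoding one character at a time.
# ASSUMPTION: the file content is UTF-8; change the layer if it is not.
sub count_lines {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Couldn't open file '$path': $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;    # the line is discarded, so memory use stays flat
    }
    close $fh;
    return $count;
}

# Only run when a file name was given on the command line.
if (@ARGV) {
    print count_lines($ARGV[0]), " lines\n";
}
```

Only one line is held at a time, so memory should not grow with file size.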