Preamble: I hate to ask questions like this, but I'm just learning Perl and I'm stuck. It seems like an easy task, but I don't know where to look.
I have a folder with lots of XML files, each named with a number followed by ".xml".
I need to process those files in their numeric order, so "9123.xml" should come before "2384747.xml".
I have successfully sorted the list alphabetically with this:
opendir(XMLDIR,$xmldirname);
my @files = sort {$a cmp $b} readdir(XMLDIR);
but this isn't what I need.
I also tried
my @files = sort {$a <=> $b} readdir(XMLDIR);
which obviously fails because the filenames contain ".xml" and are not numeric as a whole.
Could someone open their heart and save me a week of browsing the Perl manuals?
Despite your claim, sort { $a <=> $b } readdir(XMLDIR) works. When Perl treats the string "2384747.xml" as a number (as <=> does), it uses the leading numeric part, so the string has the value 2384747.
$ perl -wE'say 0+"2384747.xml"'
Argument "2384747.xml" isn't numeric in addition (+) at -e line 1.
2384747
Of course, those warnings are a problem. The solution you accepted tries to remove them, but it doesn't remove all of them because it doesn't account for readdir also returning "." and "..". You have to filter out the entries you don't want first.
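To see why "." and ".." matter, here is a small sketch that uses a hypothetical list in place of real readdir output. Both entries numify to 0 (with an "isn't numeric" warning each under use warnings), so they would sort ahead of every real file; the grep keeps only names made of digits followed by ".xml".

```perl
use strict;
use warnings;

# Hypothetical stand-in for what readdir(XMLDIR) might return;
# readdir also yields "." and "..".
my @entries = ('2384747.xml', '.', '9123.xml', '..');

{
    # Silence the "isn't numeric" warnings just for this demonstration.
    no warnings 'numeric';
    # Perl numifies a string using its leading numeric part:
    # "9123.xml" -> 9123, while "." and ".." -> 0.
    printf "%-12s -> %s\n", $_, 0 + $_ for @entries;
}

# Keep only "<digits>.xml" names before any numeric sorting.
my @wanted = grep { /^\d+\.xml\z/ } @entries;
print "kept: @wanted\n";
```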
Here are two simple solutions:
my @files =
   sort { no warnings 'numeric'; $a <=> $b }
   grep { /^\d+\.xml\z/ }
   readdir(XMLDIR);
my @files =
   sort { ( $a =~ /(\d+)/ )[0] <=> ( $b =~ /(\d+)/ )[0] }
   grep { /^\d+\.xml\z/ }
   readdir(XMLDIR);
In this particular case, you can optimize your code:
my @files =
   map { "$_.xml" }          # Recreate the file name.
   sort { $a <=> $b }        # Compare the numbers.
   map { /^(\d+)\.xml\z/ }   # Extract the number from desired files.
   readdir(XMLDIR);
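The optimized version works because a match in list context returns its captures on success and an empty list on failure, so the inner map filters and extracts in one step. The sketch below runs the same pipeline on a hypothetical list standing in for readdir output, so the intermediate numbers can be inspected.

```perl
use strict;
use warnings;

# Hypothetical stand-in for readdir(XMLDIR) output.
my @entries = ('.', '..', '2384747.xml', 'notes.txt', '9123.xml');

# A failed match returns an empty list in list context, so this map
# drops ".", "..", and "notes.txt" while extracting the numbers.
my @numbers = map { /^(\d+)\.xml\z/ } @entries;

my @files =
    map  { "$_.xml" }     # recreate the file name
    sort { $a <=> $b }    # only pure digit strings reach <=>, so no warnings
    @numbers;

print "numbers: @numbers\n";
print "files:   @files\n";
```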
The simplest and fastest solution, however, is to use a natural sort.
use Sort::Key::Natural qw( natsort );
my @files = natsort grep !/^\.\.?/, readdir(XMLDIR);
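To illustrate what "natural sort" means, here is a minimal pure-Perl comparator. It is only a sketch: the name natcmp is hypothetical, and this is not how Sort::Key::Natural is implemented. It splits each name into runs of digits and non-digits and compares the runs pairwise, numerically when both are digits and as strings otherwise.

```perl
use strict;
use warnings;

# Hypothetical natural-order comparator (illustration only).
sub natcmp {
    my ($x, $y) = @_;
    my @x = $x =~ /(\d+|\D+)/g;    # e.g. "file10.xml" -> ("file", "10", ".xml")
    my @y = $y =~ /(\d+|\D+)/g;
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        # Digit runs compare numerically, everything else as strings.
        my $c = ($p =~ /^\d/ && $q =~ /^\d/) ? $p <=> $q : $p cmp $q;
        return $c if $c;
    }
    return @x <=> @y;    # the name with fewer runs left sorts first
}

my @names  = ('file10.xml', '2384747.xml', 'file2.xml', '9123.xml');
my @sorted = sort { natcmp($a, $b) } @names;
print "$_\n" for @sorted;
```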
You are actually pretty close. Just strip off the ".xml" inside your comparison:
opendir(XMLDIR,$xmldirname);
my @files = sort {substr($a, 0, index($a, '.')) <=> substr($b, 0, index($b, '.'))} readdir(XMLDIR);
The problem is that <=> compares numbers, and a name like "11139.xml" is not entirely a number. Perl falls back to the leading digits, and if you use warnings; you get a message similar to this at run time:
Argument "11139.xml" isn't numeric in sort at testsort.pl line 9.
What you can do is split each filename into its base name and extension, sort numerically on the base name, then rejoin the extension. This is fairly straightforward with a Schwartzian transform:
use strict;
use warnings;
use Data::Dumper;
# get all of the XML files
my @xml_files = glob("*.xml");
print 'Unsorted: ' . Dumper \@xml_files;
@xml_files = map { join '.', @$_ } # join filename and extension
sort { $a->[0] <=> $b->[0] } # sort against filename
map { [split /\./] } @xml_files; # split on '.'
print 'Sorted: ' . Dumper \@xml_files;
__END__
Unsorted: $VAR1 = [
'11139.xml',
'18136.xml',
'28715.xml',
'6810.xml',
'9698.xml'
];
Sorted: $VAR1 = [
'6810.xml',
'9698.xml',
'11139.xml',
'18136.xml',
'28715.xml'
];
my @files = sort {
    my ($x) = split /\./, $a;
    my ($y) = split /\./, $b;
    $x <=> $y
} readdir(XMLDIR);
Or without the temporary variables:
my @files = sort {(split /\./, $a)[0] <=> (split /\./, $b)[0]} readdir(XMLDIR);