Sorting file names by numeric value

Posted 2019-07-21 08:49

Preamble: I hate to ask questions like this, but I'm stuck with it and just learning Perl... seems like an easy task but I don't know where to look.

I have a folder with lots of XML files that are all named "<number>.xml". I need to process those files in their numeric order, so "9123.xml" should come before "2384747.xml". I have successfully sorted the list alphabetically with this:

opendir(XMLDIR,$xmldirname);
my @files = sort {$a cmp $b} readdir(XMLDIR);

but this isn't what I need.

I also tried

my @files = sort {$a <=> $b} readdir(XMLDIR);

which obviously fails because the filenames contain ".xml" and are not numeric as a whole.

Could someone open their heart and save me a week of browsing the Perl manuals?

4 answers
Evening l夕情丶
#2 · 2019-07-21 08:52

The problem is that <=> expects numbers, and "11139.xml" is not entirely a number. If you use warnings; you get a message like this at run time:

Argument "11139.xml" isn't numeric in sort at testsort.pl line 9.

What you can do is separate the filename from the extension, sort numerically on the filename, then join the extension back on. This is fairly straightforward with a Schwartzian transform:

use strict;
use warnings; 

use Data::Dumper; 

# get all of the XML files
my @xml_files = glob("*.xml");

print 'Unsorted: ' . Dumper \@xml_files; 
@xml_files = map  { join '.', @$_ }              # join filename and extension
             sort { $a->[0] <=> $b->[0] }        # sort against filename
             map  { [split /\./] } @xml_files;   # split on '.'
print 'Sorted: ' . Dumper \@xml_files; 

__END__
Unsorted: $VAR1 = [
          '11139.xml',
          '18136.xml',
          '28715.xml',
          '6810.xml',
          '9698.xml'
        ];
Sorted: $VAR1 = [
          '6810.xml',
          '9698.xml',
          '11139.xml',
          '18136.xml',
          '28715.xml'
        ];
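As a variation on the transform above, here is a minimal sketch that keeps each original name next to its numeric key instead of splitting and re-joining it. The sample names come from the output above; "README.xml" is a hypothetical extra entry, added only to show that names without a leading number are skipped rather than compared.

```perl
use strict;
use warnings;

# Sample names from the output above, plus one hypothetical non-numeric name.
my @xml_files = qw( 11139.xml 18136.xml 28715.xml 6810.xml 9698.xml README.xml );

my @sorted =
    map  { $_->[1] }                      # 3. return the untouched name
    sort { $a->[0] <=> $b->[0] }          # 2. compare the numeric keys
    map  { /^(\d+)/ ? [ $1, $_ ] : () }   # 1. pair key with name, skip names without a leading number
    @xml_files;

print "$_\n" for @sorted;
```

Because the key is extracted once per file rather than once per comparison, this should scale better than doing the extraction inside the sort block, although for a few thousand files the difference is unlikely to matter.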
兄弟一词,经得起流年.
#3 · 2019-07-21 08:57
my @files =  sort {
    my ($x) = split /\./, $a;
    my ($y) = split /\./, $b;
    $x <=> $y
} readdir(XMLDIR);

Or without the temporary variables:

my @files =  sort {(split /\./, $a)[0] <=> (split /\./, $b)[0]} readdir(XMLDIR);
聊天终结者
#4 · 2019-07-21 08:59

You are actually pretty close. Just strip off the ".xml" inside your comparison:

opendir(XMLDIR,$xmldirname);
my @files = sort {substr($a, 0, index($a, '.')) <=> substr($b, 0, index($b, '.'))} readdir(XMLDIR);
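It may help to see what the substr/index expression yields for names other than "<number>.xml". The sketch below wraps it in a helper called stem_before_dot, a hypothetical name used only for illustration, and feeds it a few inputs, including the ".." entry that readdir returns.

```perl
use strict;
use warnings;

# Hypothetical helper, named here only for illustration: returns the text
# before the first '.', exactly as the substr/index expression does.
sub stem_before_dot {
    my ($name) = @_;
    return substr( $name, 0, index( $name, '.' ) );
}

# Regular file name: index finds the dot at position 4.
print stem_before_dot('9123.xml'), "\n";    # 9123

# "." and ".." start with a dot, so index returns 0 and the stem is empty.
print '[', stem_before_dot('..'), "]\n";    # []

# No dot at all: index returns -1, and substr(..., 0, -1) drops the last character.
print stem_before_dot('README'), "\n";      # READM
```

An empty stem still compares as 0 under <=>, but with warnings enabled it produces an "isn't numeric" message, so filtering the directory entries before sorting is the safer route.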
爷的心禁止访问
#5 · 2019-07-21 09:07

Despite your claim, sort { $a <=> $b } readdir(XMLDIR) works. When Perl treats the string 2384747.xml as a number (as <=> does), it is treated as having the value 2384747.

$ perl -wE'say 0+"2384747.xml"'
Argument "2384747.xml" isn't numeric in addition (+) at -e line 1.
2384747

Of course, those warnings are a problem. The solution you accepted tries to remove them, but fails to remove all of them because it doesn't take into account that readdir also returns . and .. (the current and parent directory entries). You have to remove the files you don't want first.
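To see those extra entries for yourself, here is a small self-contained sketch. It assumes a throwaway directory created with File::Temp (a core module) and two sample files named after the ones in the question; none of this comes from your actual setup.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical scratch directory with two sample files, for illustration only.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( 9123.xml 2384747.xml )) {
    open( my $fh, '>', "$dir/$name" ) or die "Cannot create $dir/$name: $!";
    close($fh);
}

# A lexical directory handle with error checking, instead of the bareword XMLDIR.
opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
my @entries = readdir($dh);
closedir($dh);

# Separate the directory entries "." and ".." from the files we want.
my @dots = grep { $_ eq '.' || $_ eq '..' } @entries;
my @xml  = grep { /^\d+\.xml\z/ } @entries;

print "dot entries: @{[ sort @dots ]}\n";
print "xml files:   @{[ sort @xml ]}\n";
```

The grep on the file-name pattern is what keeps the dot entries, and anything else that is not "<number>.xml", out of the sort.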

Here are two simple solutions:

my @files =
   sort { no warnings 'numeric'; $a <=> $b }
      grep { /^(\d+)\.xml\z/ }
         readdir(XMLDIR);

my @files =
   sort { ( $a =~ /(\d+)/ )[0] <=> ( $b =~ /(\d+)/ )[0] }
      grep { /^(\d+)\.xml\z/ }
         readdir(XMLDIR);

In this particular case, you can optimize your code:

my @files =
   map { "$_.xml" }                # Recreate the file name.
      sort { $a <=> $b }           # Compare the numbers.
         map { /^(\d+)\.xml\z/ }   # Extract the number from desired files.
            readdir(XMLDIR);

The simplest and fastest solution, however, is to use a natural sort.

use Sort::Key::Natural qw( natsort );

my @files = natsort grep !/^\.\.?\z/, readdir(XMLDIR);