When are Schwartzian Transforms useful?

2019-06-16 06:52发布

While going through the "Intermediate Perl" book I noticed a section on Schwartzian Transforms and tried the example in the exercise (9.9.2) but noticed that multiple runs resulted in the transform taking more time then the normal sort. The code here performs a simple sort of the files in the windows\system32 directory based on file size -

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark;

my $time = timethese( 10, {
            testA => sub { map $_->[0],     
                        sort {$a->[1] <=> $b->[1]}
                        map [$_, -s $_],
                        glob "C:\\Windows\\System32\\*";
                    },
            testB => sub { sort { -s $a <=> -s $b } glob "C:\\Windows\\System32\\*";
                    },
            }
        );

The output is -

Benchmark: timing 10 iterations of testA, testB...
     testA: 11 wallclock secs ( 1.89 usr +  8.38 sys = 10.27 CPU) @  0.97/s (n=10)
     testB:  5 wallclock secs ( 0.89 usr +  4.11 sys =  5.00 CPU) @  2.00/s (n=10)

My understanding was that since the file operation (-s) needs to be repeated over and over in the testB case it should run a lot slower than testA. The output though deviates from that observation. What am I missing here?

4条回答
孤傲高冷的网名
2楼-- · 2019-06-16 07:29

Apart from the excellent answer by Manni, another factor to consider is that some caching might be going on in your example, e.g. on the file system level.

If the same file is accessed multiple times, the FS might do some caching resulting in less drastic time savings of the Schwartzian Transform than expected.

查看更多
虎瘦雄心在
3楼-- · 2019-06-16 07:33

I know this is technically already answered rather completely, but I had a couple relevant side notes.

First, I usually prefer cmpthese() to timethese() because it tells you which is better in a human readable and informative way instead of just presenting times.

Second, for theoretical problems like this, I usually try to avoid syscalls entirely where possible, since the kernel can make you wait forever if it's in the mood to do so -- not really a fair test.

Thrid, It's interesting to note that the transform is always more expensive if the list is already sorted: If you set $point_of_interest=2, the transform wins; but if you set $point_of_interest=1, the regular sort will win. I find that result quite interesting and worth mentioning.

use strict;
use Benchmark qw(cmpthese);

my $point_of_interest = 2;
my @f = (1 .. 10_000) x $point_of_interest;
my @b;

cmpthese( 500, {
    transform => sub {
        @b = 
        map  {$_->[0]}
        sort {$a->[1] <=> $b->[1]}
        map  {[$_, expensive($_)]}
        @f
    },  
    vanilla => sub { @b = sort { expensive($a) <=> expensive($b) } @f },
}); 

sub expensive {
    my $arg = shift;

    $arg .= "_3272";
    $arg += length "$arg";
    $arg = $arg ** 17;
    $arg = sqrt($arg);

    $arg;
}   
查看更多
劫难
4楼-- · 2019-06-16 07:41

For a thorough examination of this case, see my Perlmonks post "Wasting Time Thinking about Wasted Time". I also expand on that in the "Benchmarking" chapter in Mastering Perl. As others have already noted, it's the benchmarking code that's the problem, not the transform. It's a mistake in Intermediate Perl.

However, to answer your question, a cached-key transform is useful when the sort-key computation is expensive and you'd have to compute it many times. There's a trade off between the extra work to cache the sort key and the cycles you save doing it. Typically, the more elements you have to sort the better the cached-key transform will perform.

查看更多
叛逆
5楼-- · 2019-06-16 07:43

For me, the output looks a bit different:

 testA:  1 wallclock secs ( 0.16 usr +  0.11 sys =  0.27 CPU) @ 37.04/s (n=10)
        (warning: too few iterations for a reliable count)
 testB:  0 wallclock secs ( 0.09 usr +  0.02 sys =  0.11 CPU) @ 90.91/s (n=10)
        (warning: too few iterations for a reliable count)

Benchmarking this with a decent value of iterations (I chose 100,000), I get this:

 testA: 23 wallclock secs (12.15 usr + 10.05 sys = 22.20 CPU) @ 4504.50/s (n=100000)
 testB: 11 wallclock secs ( 6.02 usr +  5.57 sys = 11.59 CPU) @ 8628.13/s (n=100000)

A look at the code tells me that those two subs probably spend most of their time globbing the files, so I did this:

my @files = glob "C:\\Windows\\System32\\*";
my $time = timethese( 1_000_000, {
                testA => sub {
                                map $_->[0],
                                    sort {$a->[1] <=> $b->[1]}
                                        map [$_, -s $_],
                                             @files;
                         },
                testB => sub {
                            sort { -s $a <=> -s $b } @files;
                         },
                }
        );

And get:

 testA: 103 wallclock secs (56.93 usr + 45.61 sys = 102.54 CPU) @ 9752.29/s (n=1000000)
 testB: -1 wallclock secs ( 0.12 usr +  0.00 sys =  0.12 CPU) @ 8333333.33/s (n=1000000)
        (warning: too few iterations for a reliable count)

Something smells fishy here, doesn't it?

So, let's take a look at the docs:

perldoc -f sort

In scalar context, the behaviour of "sort()" is undefined.

Aha! So let's try it again:

my @files = glob "C:\\Windows\\System32\\*";
my $time = timethese( 100_000, {
                testA => sub {
                              my @arr=  map $_->[0],
                                    sort {$a->[1] <=> $b->[1]}
                                        map [$_, -s $_],
                                             @files;
                         },
                testB => sub {
                            my @arr = sort { -s $a <=> -s $b } @files;
                         },
                }
        );

This gives me:

 testA: 12 wallclock secs ( 7.44 usr +  4.55 sys = 11.99 CPU) @ 8340.28/s (n=100000)
 testB: 34 wallclock secs ( 6.44 usr + 28.30 sys = 34.74 CPU) @ 2878.53/s (n=100000)

So. To answer your questions: A Schwartzian transform will help you whenever you use it in a meaningful way. Benchmarking will show this when you benchmark in a meaningful way.

查看更多
登录 后发表回答