Why does this Perl variable keep its value?

Posted 2019-05-26 03:15

Question:

What is the difference between the following two Perl variable declarations?

my $foo = 'bar' if 0;

my $baz;
$baz = 'qux' if 0;

The difference is significant when these appear at the top of a loop. For example:

use warnings;
use strict;

foreach my $n (0,1){
    my $foo = 'bar' if 0;
    print defined $foo ? "defined\n" : "undefined\n";
    $foo = 'bar';
    print defined $foo ? "defined\n" : "undefined\n";
}

print "==\n";

foreach my $m (0,1){
    my $baz;
    $baz = 'qux' if 0;
    print defined $baz ? "defined\n" : "undefined\n";
    $baz = 'qux';
    print defined $baz ? "defined\n" : "undefined\n";
}

results in

undefined
defined
defined
defined
==
undefined
defined
undefined
defined

It seems that because the if 0 condition fails, $foo is never reinitialized to undef. In that case, how does it get declared in the first place?

Answer 1:

First, note that my $foo = 'bar' if 0; is documented to be undefined behaviour, meaning it's allowed to do anything including crash. But I'll explain what happens anyway.


my $x has three documented effects:

  • It declares a symbol at compile-time.
  • It creates a new variable on execution.
  • It returns the new variable on execution.

In short, it's supposed to be like Java's Scalar x = new Scalar();, except that it returns the variable if used in an expression.

But if it actually worked that way, the following would create 100 variables:

for (1..100) {
   my $x = rand();
   print "$x\n";
}

This would mean two or three memory allocations per loop iteration for the my alone! A very expensive prospect. Instead, Perl only creates one variable and clears it at the end of the scope. So in reality, my $x actually does the following:

  • It declares a symbol at compile-time.
  • It creates the variable at compile-time[1].
  • It puts a directive on the stack that will clear[2] the variable when the scope is exited.
  • It returns the new variable on execution.

As such, only one variable is ever created[2]. This is much more CPU-efficient than creating one every time the scope is entered.

Now consider what happens if you execute a my conditionally, or never at all. By doing so, you are preventing it from placing the directive to clear the variable on the stack, so the variable never loses its value. Obviously, that's not meant to happen, so that's why my ... if ...; isn't allowed.
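
As a minimal sketch of the safe way to get a conditionally initialized variable, the my below is executed unconditionally and only the value depends on the condition, so the scope-exit cleanup is always registered. The helper name probe_iterations and its $cond parameter are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "defined"/"undefined" label per iteration.
sub probe_iterations {
    my ($cond) = @_;
    my @seen;
    for my $n (0, 1) {
        # The my always executes; only the initial value is conditional.
        my $foo = $cond ? 'bar' : undef;
        push @seen, defined $foo ? 'defined' : 'undefined';
        $foo = 'bar';    # does not leak into the next iteration
    }
    return @seen;
}

print join(' ', probe_iterations(0)), "\n";    # undefined undefined
print join(' ', probe_iterations(1)), "\n";    # defined defined
```

With a false condition, $foo starts out undefined on every iteration, which is what the second loop in the question shows for $baz.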


Some take advantage of the implementation as follows:

sub foo {
   my $state if 0;
   $state = 5 if !defined($state);
   print "$state\n";
   ++$state;
}

foo();  # 5
foo();  # 6
foo();  # 7

But doing so requires ignoring the documentation forbidding it, and since Perl 5.30 the literal my $state if 0; form is a fatal compile-time error. The above can be achieved safely using

{
   my $state = 5;
   sub foo {
      print "$state\n";
      ++$state;
   }
}

or

use feature qw( state );  # Or: use 5.010;

sub foo {
   state $state = 5;
   print "$state\n";
   ++$state;
}
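
One caveat with the bare-block version: the assignment inside the block only runs when execution reaches it, so a call placed earlier in the file would see an undefined value. A possible variation, sketched here with made-up names ($counter, next_id), wraps the block in BEGIN so the initialization happens at compile time.

```perl
use strict;
use warnings;

# This call appears before the block, yet the counter is already initialized,
# because the BEGIN block runs as soon as it has been compiled.
print next_id(), "\n";    # 5

BEGIN {
    my $counter = 5;
    # Named sub that closes over $counter; hypothetical name.
    sub next_id { return $counter++; }
}

print next_id(), "\n";    # 6
```

The state version does not have this ordering concern, since its initializer runs the first time the statement is executed.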

Notes:

  1. "Variable" can mean a couple of things. I'm not sure which definition is accurate here, but it doesn't matter.

  2. If anything but the sub itself holds a reference to the variable (REFCNT>1) or if the variable contains an object, the directive replaces the variable with a new one (on scope exit) instead of clearing the existing one. This allows the following to work as it should:

    my @a;
    for (...) {
        my $x = ...;
        push @a, \$x;
    }
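
A runnable version of the snippet above, as a sketch: the helper name collect_refs and the values are invented for the example, and refaddr from the core Scalar::Util module is used only to count distinct scalars.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

# Hypothetical helper: collects a reference to each iteration's $x.
sub collect_refs {
    my @refs;
    for my $i (1 .. 3) {
        my $x = $i * 10;
        push @refs, \$x;    # the extra reference keeps this scalar alive
    }
    return @refs;
}

my @refs = collect_refs();
print join(',', map { $$_ } @refs), "\n";    # 10,20,30

# Each reference points at its own scalar, so there are three addresses.
my %distinct;
$distinct{ refaddr($_) } = 1 for @refs;
print scalar(keys %distinct), "\n";          # 3
```

If the same scalar were reused on every iteration, all three references would show the last value instead.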
    


Answer 2:

See ikegami's better answer, probably above.

In the first example, you never define $foo inside the loop because of the conditional, so when you use it, you're referencing and then assigning a value to an implicitly declared global variable. Then, the second time through the loop that outside variable is already defined.

In the second example, $baz is defined inside the block each time the block is executed. So the second time through the loop it is a new, not yet defined, local variable.