What should every Perl hacker know about perl -ne?

Posted 2019-03-08 11:07

Question:

I have been using the Perl command line with a -ne option for years, largely to process text files in ways that sed can't. Example:

cat in.txt | perl -ne "s/abc/def/; s/fgh/hij/; print;" > out.txt

I have no idea where I learned this, and have only today read perlrun and found there are other forms (perl -pe for example).

What else should I know about perl -ne?

Answer 1:

perl -ne 'CODE' is equivalent to the program

while (<>) {
    CODE
}

perl -ane 'CODE' and perl -F/PATTERN/ -ane are also good idioms to know about. They are equivalent to

while (<>) {
    @F = split ' ', $_;
    CODE
}

and

while (<>) {
    @F = split /PATTERN/, $_;
    CODE
}

Example: advanced grep:

perl -ne 'print if /REGEX1/ && !/REGEX2/ && (/REGEX3/ || /REGEX4/ && !/REGEX5/)' input

perl -F/,/ -ane 'print if $F[2] == 4 && $F[3] ge "2009-07-01" && $F[3] lt "2009-08-01"' file.csv

A particularly clever example that uses mismatched braces is here.

Answer 2:

There is one important thing to know about perl -ne and perl -pe scripts: they implicitly use <>.

"Why is that important?" you might ask.

The magic <> operator uses the two-argument form of open. If you recall, two-argument open takes the mode and the filename together in a single argument. An old-style call such as open FILE, $foo is therefore vulnerable to manipulation of the file mode through the contents of $foo. A particularly interesting mode in this context is |, which opens a handle to a pipe to a process that Perl executes for you.

You might be thinking "Big deal!", but it is.

Imagine a cron job executed by root to munge log files in some directory.
The script is invoked as script *.
Imagine a file in that directory named |rm -rf /.

What happens?

The shell expands the * and we get script file_1 file_2 '|rm -rf /' file_4
The script processes file_1 and file_2.
Next it opens a handle to STDIN of rm -rf /.
Lots of disk activity follows.
file_4 no longer exists, so we can't open it.

Of course, the possibilities are endless.

You can read more discussion of this issue at Perlmonks.

The moral of the story: be careful with the <> operator.

FWIW, I just confirmed that this is still an issue with perl 5.10.0.

Answer 3:

You can specify more than one -e clause. Sometimes I have a command line that starts growing as I refine a search / extract / mangulation operation. Each -e counts as one line of the program, so if you mistype something, the "line number" in the error message tells you which -e has the error.

Of course, some might argue that if you have more than one or two -e clauses, maybe you should put whatever it is into a script, but some stuff really is just throw away, so why bother.

perl -ln -e 'if (/good/)' -e '{ system "echo $_ >> good.txt"; }' \
-e 'elsif (/bad/)' -e '{ system "echo $_ >> bad.txt"; }' \
-e 'else' -e '{ system "echo $_ >> ugly.txt"; }' in.txt another.txt etc.txt

Presumably you would do something less trivial than grep / egrep into 3 files :-)

Answer 4:

The -i option lets you edit the file in place:

 perl -i -pe 's/abc/def/; s/fgh/hij/' file.txt

or save a backup:

 perl -i.bak -pe 's/abc/def/; s/fgh/hij/' file.txt

Answer 5:

I like to think of perl -n as picking out specific bits of the input and perl -p as map for all lines of the input.

As you've observed, it's possible to get the effect of -p with -n, and we can emulate the other way around:

$ echo -e "1\n2\n3" | perl -pe '$_="" if $_ % 2 == 0'
1
3

Skipping lines with next would seem more natural, but -p wraps code in

LINE:
while (<>) {
    ...     # your program goes here
} continue {
    print or die "-p destination: $!\n";
}

By design, next runs continue blocks. As perlsyn puts it:

If there is a continue BLOCK, it is always executed just before the conditional is about to be evaluated again. Thus it can be used to increment a loop variable, even when the loop has been continued via the next statement.

The -l switch has two handy effects:

With -n and -p, automatically chomp each input record.
Set $\ so every print implicitly adds a terminator.

For example, to grab the first 10 UDP ports mentioned in /etc/services you might try

perl -ane 'print $F[1] if $F[1] =~ /udp/' /etc/services | head

but oops:

7/udp9/udp11/udp13/udp17/udp19/udp37/udp39/udp42/ud...

Better:

$ perl -lane 'print $F[1] if $F[1] =~ /udp/' /etc/services | head
7/udp
9/udp
11/udp
13/udp
17/udp
19/udp
37/udp
39/udp
42/udp
53/udp

Remember that -n and -p can be in the shebang line too, so to save the above one-liner as a script:

#! /usr/bin/perl -lan

BEGIN {
  @ARGV = ("/etc/services") unless @ARGV;
  open STDOUT, "|-", "head" or die "$0: head failed";
}

print $F[1] if $F[1] =~ /udp/

Answer 6:

My favorite reference for Perl one liners (and the top hit on Google for that phrase) covers perl -ne: http://novosial.org/perl/one-liner/

Answer 7:

I often use sed or awk, but I really like this killer feature of Perl pattern matching: a match in list context returns its captures, so you can test and extract in one step.

$ cat my-input.txt
git 111 HERE 2222 voila 333
any 444 HERE none start 555 HERE 6
svn 777 aaaa 8888 nothing
two 222 HERE 9999 HERE 0000

$ perl -nle 'print $a if (($a)=/HERE ([0-9]+)/)' my-input.txt
2222
6
9999