Question:
I have been using the Perl command line with the -ne option for years, largely to process text files in ways that sed can't. Example:
cat in.txt | perl -ne "s/abc/def/; s/fgh/hij/; print;" > out.txt
I have no idea where I learned this, and only today read perlrun and found that there are other forms (perl -pe, for example).
What else should I know about perl -ne?
Answer 1:
perl -ne 'CODE' is equivalent to the program
while (<>) {
CODE
}
perl -ane 'CODE' and perl -F/PATTERN/ -ane 'CODE' are also good idioms to know about. They are equivalent to
while (<>) {
@F = split ' ', $_;    # awk-style split, as documented in perlrun: leading whitespace is skipped
CODE
}
and
while (<>) {
@F = split /PATTERN/, $_;
CODE
}
Example: advanced grep:
perl -ne 'print if /REGEX1/ && !/REGEX2/ && (/REGEX3/ || /REGEX4/ && !/REGEX5/)' input
perl -F/,/ -ane 'print if $F[2] == 4 && $F[3] ge "2009-07-01" && $F[3] lt "2009-08-01"' file.csv
A particularly clever example that uses mismatched braces is here.
Answer 2:
There is one important thing to know about perl -ne and perl -pe scripts: they implicitly use <>.
"Why is that important?" you might ask.
The magic <> operator uses the 2-arg form of open. If you recall, 2-arg open takes the mode and the filename in a single argument, so an old-style call such as open FILE, $foo is vulnerable to manipulation of the file mode. A particularly interesting mode in this context is |, which opens a handle to a pipe to a process that gets executed.
You might be thinking "Big deal!", but it is.
- Imagine a cron job executed by root to munge log files in some directory.
- The script is invoked as script *.
- Imagine a file in that directory named |rm -rf /.
What happens?
- The shell expands the * and we get script file_1 file_2 '|rm -rf /' file_4.
- The script processes file_1 and file_2.
- Next it opens a handle to STDIN of rm -rf /.
- Lots of disk activity follows.
- file_4 no longer exists, so we can't open it.
Of course, the possibilities are endless.
You can read more discussion of this issue at Perlmonks.
The moral of the story: be careful with the <> operator.
FWIW, I just confirmed that this is still an issue with perl 5.10.0.
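A harmless way to see this for yourself is to pass an argument that merely looks like a piped command. This is a sketch, not an exploit: the argument 'echo injected |' is an arbitrary string, no such file needs to exist, and the variable names are made up. The second command assumes perl 5.22 or later, which added the <<>> operator; it opens every argument strictly as a filename.

```shell
# <> (implied by -n) hands the argument to 2-arg open, so the trailing |
# makes perl run the command and read its output.
unsafe=$(perl -ne 'print' 'echo injected |')

# <<>> treats the same argument as a literal filename, which does not exist,
# so nothing is read (the "can't open" warning is discarded here).
safe=$(perl -e 'while (<<>>) { print }' 'echo injected |' 2>/dev/null || true)

printf 'with <>:   [%s]\n' "$unsafe"
printf 'with <<>>: [%s]\n' "$safe"
```

For one-liners that take filenames from a glob you do not control, writing the loop explicitly with <<>> instead of relying on -n or -p is one way to sidestep the problem.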
Answer 3:
You can specify more than one -e clause. Sometimes I have a command line that starts growing as I refine a search / extract / mangulation operation. If you mistype something, you will get a "line number" telling you which -e has the error.
Of course, some might argue that if you have more than one or two -e clauses, maybe you should put whatever it is into a script, but some stuff really is just throwaway, so why bother?
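perl -n -l -e 'if (/good/)' -e '{ system "echo $_ >> good.txt"; }' \
-e 'elsif (/bad/)' -e '{ system "echo $_ >> bad.txt"; }' \
-e 'else' -e '{ system "echo $_ >> ugly.txt"; }' in.txt another.txt etc.txt
The -l chomps each line; without it, the newline at the end of $_ terminates the echo command before the shell ever sees the redirection.
Presumably you would do something less trivial than grep / egrep into 3 files :-)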
Answer 4:
The -i option lets you make the changes in place:
perl -i -pe 's/abc/def/; s/fgh/hij/' file.txt
or save a backup:
perl -i.bak -pe 's/abc/def/; s/fgh/hij/' file.txt
Answer 5:
I like to think of perl -n as picking out specific bits of the input and perl -p as map for all lines of the input.
As you've observed, it's possible to get the effect of -p with -n, and we can emulate the other way around:
$ echo -e "1\n2\n3" | perl -pe '$_="" if $_ % 2 == 0'
1
3
Skipping lines with next would seem more natural, but -p wraps code in
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
By design, next runs continue blocks:
If there is a continue BLOCK, it is always executed just before the conditional is about to be evaluated again. Thus it can be used to increment a loop variable, even when the loop has been continued via the next statement.
The -l switch has two handy effects:
- With -n and -p, automatically chomp each input record.
- Set $\ so every print implicitly adds a terminator.
For example, to grab the first 10 UDP ports mentioned in /etc/services you might try
$ perl -ane 'print $F[1] if $F[1] =~ /udp/' /etc/services | head
but oops:
7/udp9/udp11/udp13/udp17/udp19/udp37/udp39/udp42/ud...
Better:
$ perl -lane 'print $F[1] if $F[1] =~ /udp/' /etc/services | head
7/udp
9/udp
11/udp
13/udp
17/udp
19/udp
37/udp
39/udp
42/udp
53/udp
Remember that -n and -p can be in the shebang line too, so to save the above one-liner as a script:
#! /usr/bin/perl -lan
BEGIN {
@ARGV = ("/etc/services") unless @ARGV;
open STDOUT, "|-", "head" or die "$0: head failed";
}
print $F[1] if $F[1] =~ /udp/
Answer 6:
My favorite reference for Perl one-liners (and the top hit on Google for that phrase) covers perl -ne: http://novosial.org/perl/one-liner/
Answer 7:
I often use sed or awk, but I really like this killer pattern-matching feature of perl:
$ cat my-input.txt
git 111 HERE 2222 voila 333
any 444 HERE none start 555 HERE 6
svn 777 aaaa 8888 nothing
two 222 HERE 9999 HERE 0000
$ perl -nle 'print $a if (($a)=/HERE ([0-9]+)/)' my-input.txt
2222
6
9999