Perl: How to extract a string between brackets

Published 2019-09-28 19:33


疯言疯语


I have a file in MoinMoin text format:

* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)

All the words between "[[" and "]]" form the short description of the entry. I need to extract the whole entry, not each individual word.

I found an answer to a similar question here: https://stackoverflow.com/a/2700749/819596 but I could not understand it: "my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;"

Anything that works will be accepted, but an explanation would help a lot, i.e. what do (?0) and /xg do?

Answer 1:

The code could look like this:

use warnings; 
use strict;

my @subjects; # declaring a lexical variable to store all the subjects
my $pattern = qr/ 
  \[ \[    # matching two `[` signs
  \s*      # ... and, if any, whitespace after them
  ([^]]+) # starting from the first non-whitespace symbol, capture all the non-']' symbols
  ]]
/x;

# main processing loop:
while (<DATA>) { # reading the source file line by line
  if (/$pattern/) {      # if line is matched by our pattern
    push @subjects, $1;  # ... push the captured group of symbols into our array
  }
}
print $_, "\n" for @subjects; # print our array of subject line by line

__DATA__
* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)

As I see it, what you need can be described as follows: in each line of the file, try to find this sequence of symbols...

[[, an opening delimiter, 
then 0 or more whitespace symbols,
then all the symbols that make a subject (which should be saved),
then ]], a closing delimiter

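As you can see, this description translates quite naturally into a regex. The only part that is probably not strictly needed is the /x regex modifier, which is what allowed me to comment the pattern extensively.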

Answer 2:

If the text will never contain ], you can simply use the following, as previously suggested:

/\[\[ ( [^\]]* ) \]\]/x

The following allows ] in the contained text, but I recommend against incorporating it into a larger pattern:

/\[\[ ( .*? ) \]\]/x

The following allows ] in the contained text, and is the most robust solution:

/\[\[ ( (?:(?!\]\]).)* ) \]\]/x

For example,

if (my ($match) = $line =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/x) {
   print "$match\n";
}

or

my @matches = $file =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/xg;

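/x: Ignore whitespace in the pattern. This lets you add spaces to make the pattern readable without changing its meaning. Documented in perlre.
/g: Find all matches. Documented in perlop.
(?0) was used to make the pattern recursive, since the linked answer had to deal with arbitrary nesting of curly braces. Documented in perlre.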
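
The difference between the three patterns is easiest to see on an entry whose description itself contains a single ]. The following is a minimal sketch; the sample line is hypothetical and does not come from the question's file.

```perl
use strict;
use warnings;

# Hypothetical line (an assumption): the description contains a single ']'.
my $line = '* [[ Notes [draft] v2]] (2012/01/01 10:00)';

# Pattern 1: the character class cannot cross any ']', so this fails here.
my ($class) = $line =~ /\[\[ ( [^\]]* ) \]\]/x;

# Pattern 2: non-greedy match up to the first ']]'.
my ($lazy) = $line =~ /\[\[ ( .*? ) \]\]/x;

# Pattern 3: consume one character at a time while ']]' does not start here.
my ($lookahead) = $line =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/x;

print defined $class ? "class:     '$class'\n" : "class:     no match\n";
print "lazy:      '$lazy'\n";
print "lookahead: '$lookahead'\n";
```

On this input the first pattern finds nothing, while the other two both capture the text up to the first ]]. The leading space is kept because none of the three patterns skips whitespace after [[.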

Answer 3:

\[\[(.*)]]

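\[ is a literal [, ] is a literal ], .* means any sequence of zero or more characters, and anything enclosed in parentheses is a capture group, so you can access it later in your script with $1 (or $2 .. $9, depending on how many groups you have).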

Putting it all together, you match two [ characters, then everything up to the last occurrence of two consecutive ] characters.

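Update: On a second reading of your question I am no longer sure whether you need the content between [[ and ]], or the whole line. In the latter case, leave the parentheses out entirely and just test whether the pattern matches; there is no need to capture.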
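
The "last occurrence" behaviour matters as soon as a line holds more than one entry. Here is a small sketch comparing the greedy pattern with a non-greedy variant; the two-entry line is a made-up example, not data from the question.

```perl
use strict;
use warnings;

# Hypothetical line with two entries (an assumption for illustration).
my $line = '[[first]] and [[second]]';

my ($greedy) = $line =~ /\[\[(.*)]]/;     # .* runs to the LAST ']]'
my ($lazy)   = $line =~ /\[\[(.*?)]]/;    # .*? stops at the FIRST ']]'

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

For the question's file, where each line has exactly one entry, both forms capture the same text.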

Answer 4:

The answer you found is about recursive pattern matching, which I don't think you need.

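/x allows insignificant whitespace and comments in the regex.
/g applies the regex across the whole string. Without it, matching stops at the first match.
/xg is /x and /g combined.
(?0) runs the regex itself again (recursion).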

If I understood correctly, you need something like this:

$text="* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)
";

@array=($text=~/\[\[([^\]]*)\]\]/g);
print join(",",@array);

# this prints "  Virtualbox Guest Additions,  Abiword Wordprocessor,  Sylpheed E-Mail,   Kupfer"
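
To make the effect of /g concrete, here is a minimal sketch under `use strict` that runs the same pattern with and without the flag. The added `\s*`, borrowed from Answer 1, is my assumption about the desired output: it drops the leading spaces that the version above keeps.

```perl
use strict;
use warnings;

my $text = join "\n",
    '* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)',
    '* [[   Kupfer]] (2010/05/16 20:18)';

# Without /g, a list-context match returns the captures of the first match only.
my @first = $text =~ /\[\[\s*([^\]]*)\]\]/;

# With /g, it returns the captures of every match in the string.
my @all = $text =~ /\[\[\s*([^\]]*)\]\]/g;

print scalar(@first), " match without /g\n";
print scalar(@all), " matches with /g: ", join(', ', @all), "\n";
```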

Answer 5:

I would suggest using "extract_bracketed" or "extract_delimited" from the module Text::Balanced. See here: http://perldoc.perl.org/Text/Balanced.html
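
As a rough sketch of how that could look for this data, the code below calls `extract_bracketed` with `[]` as the delimiter set and a prefix pattern that skips everything before the first `[`. The prefix pattern and the clean-up substitutions are my own assumptions about one reasonable way to apply the module; they are not taken from the answer.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my @lines = (
    '* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)',
    '* [[   Kupfer]] (2010/05/16 20:18)',
);

my @subjects;
for my $line (@lines) {
    # Skip everything before the first '[', then extract the balanced
    # '[...]' group; for this data that is the whole '[[ ... ]]' span.
    my ($extracted) = extract_bracketed($line, '[]', '[^\[]*');
    next unless defined $extracted;

    # Strip the outer '[[' / ']]' and the surrounding whitespace.
    $extracted =~ s/^\[\[\s*//;
    $extracted =~ s/\s*\]\]$//;
    push @subjects, $extracted;
}
print "$_\n" for @subjects;
```

For a flat format like this one a plain regex is shorter; Text::Balanced mainly pays off when the brackets can nest.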

Answer 6:

perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp

Tested below:

> cat temp
        * [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
        * [[  Abiword Wordprocessor]] (2010/10/27 20:17)
        * [[  Sylpheed E-Mail]] (2010/03/30 21:49)
        * [[   Kupfer]] (2010/05/16 20:18)
>
> perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp
  Virtualbox Guest Additions
  Abiword Wordprocessor
  Sylpheed E-Mail
   Kupfer
>

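s/.*\[\[(.*)\]\].*/\1/g
.*\[\[ -> matches any characters up to "[["
(.*)\]\] -> stores the characters after "[[" up to "]]" in \1
.* -> matches the rest of the line.

Then, since our data is in \1, we can simply print it to the console.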

Answer 7:

my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;

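The "x" flag means that whitespace in the regex is ignored, which allows a more readable expression. The "g" flag means that the result will be a list of all matches from left to right (match *g*lobally).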

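(?0) recurses into the whole pattern, which here coincides with the first set of parentheses. This makes it a recursive regex, equivalent to a set of rules such as: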

E := '{' ( NoBrace | E )* '}'
NoBrace := [^{}]*
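
To see the recursion at work, here is a minimal sketch that applies the same regex to a string with nested curly braces. The input string is a made-up example.

```perl
use strict;
use warnings;

# Hypothetical input with nested curly braces (an assumption for illustration).
my $str = 'a {b {c} d} e {f}';

# (?0) re-enters the whole pattern, so an inner {...} is matched recursively
# and ends up inside the outer match rather than as a separate result.
my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;

print "$_\n" for @array;
```

Each element of `@array` is one outermost brace group; the nested `{c}` is not reported on its own because it is consumed as part of the group that contains it.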

Source: Perl: How to extract a string between brackets

Tags: perl matching
