Is there a way to have a capture repeat an arbitra

Posted 2019-02-25 14:13

Question:

I'm using the C++ tr1::regex with the ECMA regex grammar. What I'm trying to do is parse a header and return values associated with each item in the header.

Header:

-Testing some text
-Numbers 1 2 5
-MoreStuff some more text
-Numbers 1 10

What I would like to do is find all of the "-Numbers" lines and put each number into its own result with a single regex. As you can see, the "-Numbers" lines can have an arbitrary number of values on the line. Currently, I'm just searching for "-Numbers([\s0-9]+)" and then tokenizing that result. I was just wondering if there was any way to both find and tokenize the results in a single regex.
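For reference, the two-step approach described above could be sketched roughly as follows. This assumes C++11 `<regex>` rather than tr1 (the tr1 types are similarly named), and the function name `numbers_from_header` is hypothetical, chosen only for illustration.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the numbers from every "-Numbers" line.
std::vector<int> numbers_from_header(const std::string& header) {
    // Step 1: capture the run of digits and whitespace after "-Numbers".
    static const std::regex line_re(R"(-Numbers([\s0-9]+))");
    std::vector<int> out;
    std::istringstream lines(header);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (!std::regex_search(line, m, line_re)) continue;
        // Step 2: tokenize the captured text on whitespace.
        std::istringstream fields(m[1].str());
        int value;
        while (fields >> value) out.push_back(value);
    }
    return out;
}
```

For the header shown above, this sketch would collect 1, 2, 5, 1 and 10, in that order.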

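Answer 1:

No, there is not. In the ECMAScript grammar, a capture group that repeats keeps only the text of its last iteration.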



Answer 2:

I was about to ask this exact same question, and I kind of found a solution.

Let's say you have an arbitrary number of words you want to capture.

"there are four lights"

and

"captain picard is the bomb"

You might think that the solution is:

/((\w+)\s?)+/

But this matches the whole input string in one go, and each capture group keeps only the last value it matched.
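The same behavior can be observed in C++, the language of the question. The sketch below assumes C++11 `<regex>` with its default ECMAScript grammar; `last_capture` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return what group 2 holds after a full match.
// Returns an empty string if the text does not match at all.
std::string last_capture(const std::string& text) {
    // Same pattern as above: the inner group (\w+) is repeated by the outer "+".
    static const std::regex re(R"(((\w+)\s?)+)");
    std::smatch m;
    if (!std::regex_match(text, m, re)) return "";
    // Only the final iteration's word survives in the group.
    return m[2].str();
}
```

With "there are four lights" as input, group 2 ends up holding only "lights"; the earlier words are overwritten on each repetition.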

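What you can do instead is use the "g" (global) modifier, which applies the pattern repeatedly across the string and returns the capture from each match.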

So, an example in Perl:

use strict;
use warnings;

my $str1 = "there are four lights";
my $str2 = "captain picard is the bomb";

foreach ( $str1, $str2 ) {
    my @a = ( $_ =~ /(\w+)\s?/g );
    print "captured groups are: " . join( "|", @a ) . "\n";
}

Output is:

captured groups are: there|are|four|lights
captured groups are: captain|picard|is|the|bomb

So, there is a solution if your language of choice supports an equivalent of "g" (and I guess most do...).
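In C++, a rough counterpart of "g" is `std::sregex_iterator`, which walks over every successive match of a pattern. The following is a minimal sketch assuming C++11 `<regex>`; `all_words` is a hypothetical name used only here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: gather group 1 from every successive match.
std::vector<std::string> all_words(const std::string& text) {
    // Same pattern as the Perl example, without the outer repetition.
    static const std::regex re(R"((\w+)\s?)");
    std::vector<std::string> words;
    // A default-constructed iterator serves as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        words.push_back((*it)[1].str());
    }
    return words;
}
```

Each step of the iterator corresponds to one match of the Perl `/g` loop, so the vector receives one word per match.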

Hope this helps someone who was in the same position as me!

S



Answer 3:

The problem is that the desired solution insists on using capture groups alone. C++ provides a tool, regex_token_iterator, that handles this in a better way by iterating over a chosen sub-match of each successive match (C++11 example):

#include <iostream>
#include <string>
#include <regex>

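int main() {
    // Optional "-Numbers" prefix at the start of the line, then one number in group 1.
    std::regex e(R"((?:^-Numbers)?\s*(\d+))");

    std::string input;

    // Read the header line by line from standard input.
    while (std::getline(std::cin, input)) {
        // Iterate over group 1 of each match; match_continuous requires every
        // match to begin exactly where the previous one ended.
        std::regex_token_iterator<std::string::iterator> a{
            input.begin(), input.end(),
            e, 1,
            std::regex_constants::match_continuous
        };

        // A default-constructed iterator marks the end of the sequence.
        std::regex_token_iterator<std::string::iterator> end;
        while (a != end) {
            std::cout << *a << " - ";
            ++a;
        }
        std::cout << '\n';
    }

    return 0;
}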

https://wandbox.org/permlink/TzVEqykXP1eYdo1c
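One thing to watch with the pattern above: because the "-Numbers" prefix is optional, a line that simply begins with digits would also produce tokens. A possible variant, sketched here under the assumption that only lines starting with "-Numbers" should count, checks the prefix first and then runs the token iterator over the remainder. The helper name `numbers_on_line` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: numbers from one line, or an empty vector
// if the line does not start with "-Numbers".
std::vector<std::string> numbers_on_line(const std::string& line) {
    static const std::string prefix = "-Numbers";
    static const std::regex number_re(R"(\s*(\d+))");
    std::vector<std::string> out;
    if (line.compare(0, prefix.size(), prefix) != 0) return out;
    // Start after the prefix; match_continuous makes each match begin
    // exactly where the previous one ended.
    std::sregex_token_iterator it(line.begin() + prefix.size(), line.end(),
                                  number_re, 1,
                                  std::regex_constants::match_continuous);
    std::sregex_token_iterator end;
    for (; it != end; ++it) out.push_back(it->str());
    return out;
}
```

The trade-off is that this is no longer a single regex: the prefix test is ordinary string comparison, and the regex handles only the repeated numbers.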