Can you use Boost.Regex to parse a stream?

Posted 2019-01-23 17:00

Question:

I was playing around with Boost.Regex to parse strings for words and numbers. This is what I have so far:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/regex.hpp>
#include <boost/range.hpp>

using namespace std;
using namespace boost;

int main()
{
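    // Group 1 wraps the whole token: either a run of lowercase letters (group 2)
    // or an optionally signed integer/decimal number (groups 3 and 4).
    regex re
    (
        "("
            "([a-z]+)|"
            "(-?[0-9]+(\\.[0-9]+)?)"
        ")"
    );

    string s = "here is a\t list of Words. and some 1239.32 numbers to 3323 parse.";
    // m1 walks over every match in s; the default-constructed m2 marks the end.
    sregex_iterator m1(s.begin(), s.end(), re), m2;

    BOOST_FOREACH (const match_results<string::const_iterator>& what, make_iterator_range(m1, m2)) {
        // Print each token with its offset and length in s.
        cout << ":" << what[1].str() << ":" << what.position(1) << ":" << what.length(1) << endl;
    }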

    return 0;
}

Is there a way to tell regex to parse from a stream rather than a string? It seems like it should be possible to use any iterator.

Answer 1:

Boost.Iostreams has a regex_filter that performs the equivalent of a regex_replace on a stream. However, looking at its implementation, it seems to "cheat": it simply loads the whole stream into a buffer and then calls Boost.Regex on that buffer.

Searching a stream's contents with a regex, without loading all of it into memory, can be done with the "partial match" support in Boost.Regex (the match_partial flag). See the example at the end of the "Partial Matches" page of the Boost.Regex documentation.



Answer 2:

The regex_iterator constructor requires BidirectionalIterators, but std::istream_iterator (like std::istreambuf_iterator) is only an InputIterator, so it appears you would not be able to do this directly with any of the standard stream classes or objects (cin, ifstream, etc.). If you had a custom stream that exposed a bidirectional iterator, it should work.



Answer 3:

The matcher (a backtracking state machine) needs to be able to "back up" to an earlier position in the input when the path it is currently trying fails. That is impossible with input iterators, which are single-pass and cannot move backwards.