I was playing around with Boost.Regex to parse strings for words and numbers. This is what I have so far:
    #include <iostream>
    #include <string>
    #include <boost/foreach.hpp>
    #include <boost/regex.hpp>
    #include <boost/range.hpp>

    using namespace std;
    using namespace boost;

    int main()
    {
        // Group 1 wraps the whole token: either a word or a number.
        regex re
        (
            "("
                "([a-z]+)|"                 // a lowercase word
                "(-?[0-9]+(\\.[0-9]+)?)"    // an optionally signed decimal number
            ")"
        );

        string s = "here is a\t list of Words. and some 1239.32 numbers to 3323 parse.";

        // m2 is the default-constructed end-of-sequence iterator.
        sregex_iterator m1(s.begin(), s.end(), re), m2;

        // Print each token with its offset and length in s.
        BOOST_FOREACH (const match_results<string::const_iterator>& what, make_iterator_range(m1, m2)) {
            cout << ":" << what[1].str() << ":" << what.position(1) << ":" << what.length(1) << endl;
        }
        return 0;
    }
Is there a way to tell regex to parse from a stream rather than a string? It seems like it should be possible to use any iterator.
Boost.IOStreams has a regex_filter allowing one to perform the equivalent of a regex_replace on a stream. However, looking at the implementation, it seems to "cheat" in that it simply loads the whole stream into a buffer and then calls Boost.Regex on that buffer.
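For comparison, here is a minimal sketch of that "load everything, then match" approach. It is not the regex_filter implementation. The function name tokens_from_stream is made up for illustration, and std::regex stands in for boost::regex so the snippet has no external dependencies (sregex_iterator is used the same way in both).

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>   // std::istringstream, handy for trying this out
#include <string>
#include <vector>

// Hypothetical helper: reads ALL of `in` into one string, then runs the
// regex over that string. Memory use grows with the size of the input.
std::vector<std::string> tokens_from_stream(std::istream& in)
{
    // Slurp the stream; the extra parentheses avoid the "most vexing parse".
    const std::string buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());

    // Same idea as the pattern in the question: lowercase words or numbers.
    static const std::regex re("[a-z]+|-?[0-9]+(\\.[0-9]+)?");

    std::vector<std::string> tokens;
    for (std::sregex_iterator it(buffer.begin(), buffer.end(), re), end; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

Passing a std::istringstream (or an ifstream) to tokens_from_stream returns the matched tokens in order. This works, but it is exactly the buffering that a true streaming search would avoid.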
You can search a stream's contents without loading it entirely into memory by using the "partial match" support of Boost.Regex. See the example at the end of the "Partial Matches" page of the Boost.Regex documentation.
The regex_iterator constructor requires bidirectional iterators, but std::istream_iterator is only an input iterator, so you cannot do this directly with any of the standard stream classes or objects (cin, ifstream, etc.). If you had a custom stream that exposed a bidirectional iterator, it should work.
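The iterator-category mismatch can be checked at compile time with nothing but the standard library. This is only a sketch; the helper name is_bidirectional is invented for this example.

```cpp
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical helper: true when iterator type It is at least
// bidirectional, i.e. its category tag derives from bidirectional_iterator_tag.
template <typename It>
constexpr bool is_bidirectional()
{
    using category = typename std::iterator_traits<It>::iterator_category;
    return std::is_base_of<std::bidirectional_iterator_tag, category>::value;
}

// String iterators can step backwards, so a regex iterator can be built on them.
static_assert(is_bidirectional<std::string::const_iterator>(),
              "string iterators are bidirectional");

// Stream iterators are single-pass and forward-only.
static_assert(!is_bidirectional<std::istream_iterator<char>>(),
              "istream_iterator is only an input iterator");
```

Both assertions hold, so the file compiles cleanly. Flipping either condition produces a compile error with the corresponding message.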
The matcher's finite state machine needs to be able to "back up" when the path it is currently trying fails. Input iterators are single-pass, so they cannot "back up".
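A small illustration of that backing up, using the number pattern from the question. The function name first_number is made up, and std::regex again stands in for boost::regex to keep the snippet dependency-free.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first number found in `text`,
// or an empty string if there is none.
std::string first_number(const std::string& text)
{
    // Optional sign, digits, then an optional fractional part.
    static const std::regex re("-?[0-9]+(\\.[0-9]+)?");
    std::smatch what;
    return std::regex_search(text, what, re) ? what.str() : std::string();
}
```

For the input "to 3323. parse" the result is "3323". Conceptually, the matcher reads the digits, then tries the optional fractional part: it accepts the ".", finds no digit after it, and has to return to the position just after "3323" to report the shorter match. Re-reading earlier characters like this is what an input iterator cannot offer.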