I've got a bunch of strings like:
"Hello, here's a test colon:. Here's a test semi-colon;"
I would like to replace that with
"Hello, here's a test colon:. Here's a test semi-colon;"
And so on for all printable ASCII values.
At present I'm using boost::regex_search
to match &#(\d+);
, building up a string as I process each match in turn (including appending the substring containing no matches since the last match I found).
Can anyone think of a better way of doing it? I'm open to non-regex methods, but regex seemed a reasonably sensible approach in this case.
Thanks,
Dom
The existing SNOBOL solutions don't handle the multiple-patterns case properly, due to there only being one "&". The following solution ought to work better:
I don't know about the regex support in boost, but check if it has a replace() method that supports callbacks or lambdas or some such. That's the usual way to do this with regexes in other languages I'd say.
Here's a Python implementation:
Producing:
I've looked some at boost now and I see it has a regex_replace function. But C++ really confuses me so I can't figure out if you could use a callback for the replace part. But the string matched by the (\d\d) group should be available in $1 if I read the boost docs correctly. I'd check it out if I were using boost.
boost::spirit parser generator framework allows easily to create a parser that transforms desirable NCRs.
This will probably earn me some down votes, seeing as this is not a c++, boost or regex response, but here's a SNOBOL solution. This one works for ASCII. Am working on something for Unicode.
Here's a version based on
boost::regex_token_iterator
. The program replaces decimal NCRs read fromstdin
by corresponding ASCII characters and prints them tostdout
.Here's a NCR scanner created using Flex:
To make an executable:
Example:
It prints for non-recursive version:
And the recursive one produces: