I have tried to write a mustache parser with the excellent Boost.XPressive from the brilliant Eric Niebler. But since this is my first parser, I am not familiar with the "normal" approach and lingo of compiler writers, and I feel a bit lost after a few days of trial and error. So I come here hoping someone can point out the foolishness of my n00bish ways ;)
This is the HTML code with the mustache templates that I want to extract (http://mustache.github.io/):
Now <bold>is the {{#time}}gugus {{zeit}} oder nicht{{/time}} <i>for all good men</i> to come to the {007} aid of their</bold> {{country}}. Result: {{#Res1}}Nullum <b>est</b> mundi{{/Res1}}
I have the following problems that I haven't been able to solve on my own yet:
- The parser I wrote doesn't print anything, yet it doesn't issue a warning at compile time either. Earlier I managed to get it to print parts of the mustache code, but never all of it correctly.
- I don't know how to loop through all the code to find every occurrence and still access each one as with the smatch what; variable. The docs only show how to find the first occurrence with "what", or how to output all occurrences with the "iterator". Actually I need a combination of both: once something is found, I need to query the tag's name and the content between the tags (which "what" would offer but the "iterator" won't allow) and act accordingly. I guess I could use "actions", but how?
- I think it should be possible to find the tags and the "content between tags" in one sweep, right? Or do I need to parse twice, and if so, how?
- Is it okay to parse the opening and closing brackets the way I did, since there are always 2 brackets? Or should I match them in sequence, or use repeat<2,2>('{')?
- I still feel a bit unsure about the cases where keep() and by_ref() are necessary, and when it's better not to use them.
- I couldn't find the other options for the 4th parameter of the iterator in sregex_token_iterator cur( str.begin(), str.end(), html, -1 ); where -1 outputs everything except the matching tags.
- Is my parser string correctly finding nested mustache tags?
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/xpressive/xpressive_static.hpp>
#include <boost/xpressive/match_results.hpp>

typedef std::string::const_iterator It;
using namespace boost::xpressive;

// Prints a match and, recursively, its nested matches, indented by depth
// (identical function from E. Niebler's documentation)
struct output_nested_results
{
    int tabs_;
    output_nested_results(int tabs = 0) : tabs_(tabs) {}

    template <typename BidiIterT>
    void operator()(match_results<BidiIterT> const &what) const
    {
        typedef typename std::iterator_traits<BidiIterT>::value_type char_type;
        std::fill_n(std::ostream_iterator<char_type>(std::cout), tabs_ * 4, char_type(' '));
        std::cout << what[0] << '\n';
        std::for_each(what.nested_results().begin(), what.nested_results().end(),
                      output_nested_results(tabs_ + 1));
    }
};

int main()
{
    const std::string str = "Now <bold>is the {{#time}}gugus {{zeit}} oder nicht{{/time}} <i>for all good men</i> to come to the {007} aid of their</bold> {{country}}. Result: {{#Res1}}Nullum <b>est</b> mundi{{/Res1}}";

    // Parser setup --------------------------------------------------------
    mark_tag mtag(1), cond_mtag(2), user_str(3);

    sregex brackets = "{{"
        >> keep(mtag = repeat<1, 20>(_w))
        >> "}}"
        ;

    // Declared first, then assigned: it refers to itself through by_ref
    sregex cond_brackets;
    cond_brackets = "{{#"
        >> keep(cond_mtag = repeat<1, 20>(_w))
        >> "}}"
        >> *(
              keep(user_str = +(*_s >> +alnum >> *_s)) |
              by_ref(brackets) |
              by_ref(cond_brackets)
            )
        >> "{{/"
        >> cond_mtag
        >> "}}"
        ;

    sregex mexpression = *(by_ref(cond_brackets) | by_ref(brackets));

    // Looping + catching the results --------------------------------------
    smatch what2;
    std::cout << "\nregex_search:\n" << str << '\n';
    It strBegin = str.begin(), strEnd = str.end();
    int ic = 0;

    do
    {
        if (!regex_search(strBegin, strEnd, what2, mexpression))
        {
            std::cout << "\t>> Breakout of this life...! Exit after " << ic << " loop(s)." << std::endl;
            break;
        }
        else
        {
            std::cout << "**Loop Nr: " << ic << '\n';
            std::cout << "\twhat2[0] " << what2[0] << '\n'; // whole match
            std::cout << "\twhat2[mtag] " << what2[mtag] << '\n';
            std::cout << "\twhat2[cond_mtag] " << what2[cond_mtag] << '\n';
            std::cout << "\twhat2[user_str] " << what2[user_str] << '\n';
            // display the nested results
            std::for_each(
                what2.nested_results().begin(),
                what2.nested_results().end(),
                output_nested_results()
            );
            strBegin = what2[0].second;
        }
        ++ic;
    }
    while (ic < 6 && strBegin != str.end()); // at most 6 passes, or until the end of the input
}
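To illustrate the looping questions in the list above, here is a small sketch that uses the standard <regex> iterators instead of xpressive. It assumes that xpressive's sregex_iterator and sregex_token_iterator behave like their std counterparts, which as far as I know they do. Dereferencing an sregex_iterator yields a complete match_results (an smatch) for every occurrence, so all capture groups remain accessible inside the loop. For sregex_token_iterator, a submatch index of -1 yields the text between matches, 0 yields whole matches, and n >= 1 yields the n-th capture group. The names TagHit, find_tags and text_between_tags are made up for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One tag found by the scan: its sigil ('#', '/', '^' or empty) and its name.
struct TagHit {
    std::string sigil;
    std::string name;
    std::size_t position;  // offset of the opening "{{" in the input
};

// std::sregex_iterator yields a full std::smatch per occurrence, so every
// capture group of every match is accessible inside the loop.
std::vector<TagHit> find_tags(const std::string& input) {
    static const std::regex tag(R"(\{\{([#/^]?)(\w+)\}\})");
    std::vector<TagHit> hits;
    for (std::sregex_iterator it(input.begin(), input.end(), tag), end; it != end; ++it) {
        const std::smatch& m = *it;
        hits.push_back({m[1].str(), m[2].str(), static_cast<std::size_t>(m.position(0))});
    }
    return hits;
}

// With submatch index -1, std::sregex_token_iterator yields the text between
// the matches instead (0 would yield whole matches, n >= 1 the n-th group).
std::vector<std::string> text_between_tags(const std::string& input) {
    static const std::regex tag(R"(\{\{[#/^]?\w+\}\})");
    std::vector<std::string> parts;
    for (std::sregex_token_iterator it(input.begin(), input.end(), tag, -1), end; it != end; ++it)
        parts.push_back(it->str());
    return parts;
}
```

A flat scan like this finds every tag, but it cannot check that a {{/name}} closes the matching {{#name}}. That check requires a parser that understands nesting.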
What you need is a recursive descent parser. One is discussed here: http://www.drdobbs.com/cpp/recursive-descent-peg-parsers-using-c-te/212700432
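To give a rough idea of what a recursive descent parser looks like for this problem, here is a minimal hand-written sketch in plain C++17. It does not use the PEG library from the linked article. Each grammar rule becomes one member function. parse_elements loops over text, {{name}} and {{#name}} elements. parse_section calls back into parse_elements for the section body, and that recursion is what handles nesting and lets the parser check that {{/name}} matches the opening name. The sketch supports only the tag forms that appear in the question's sample. Node, Parser and dump are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal AST: plain text, a {{variable}}, or a
// {{#section}}...{{/section}} that owns its children.
struct Node {
    enum Kind { Text, Variable, Section } kind;
    std::string value;           // the text itself, or the tag name
    std::vector<Node> children;  // only used by sections
};

// One member function per grammar rule; nesting is handled by recursion.
class Parser {
public:
    explicit Parser(std::string input) : in_(std::move(input)) {}

    // template := element*   (must consume the whole input)
    std::vector<Node> parse() {
        std::vector<Node> nodes = parse_elements();
        if (pos_ != in_.size())
            throw std::runtime_error("unmatched closing tag at offset " + std::to_string(pos_));
        return nodes;
    }

private:
    std::string in_;
    std::size_t pos_ = 0;

    bool at(const char* s) const { return in_.compare(pos_, std::strlen(s), s) == 0; }

    // element* : stops at end of input or before "{{/" so the caller can check it
    std::vector<Node> parse_elements() {
        std::vector<Node> nodes;
        while (pos_ < in_.size() && !at("{{/")) {
            if (at("{{#"))
                nodes.push_back(parse_section());
            else if (at("{{"))
                nodes.push_back(Node{Node::Variable, read_tag(2), {}});
            else
                nodes.push_back(parse_text());
        }
        return nodes;
    }

    // Returns the name after a `skip`-character prefix and consumes the closing "}}".
    std::string read_tag(std::size_t skip) {
        std::size_t close = in_.find("}}", pos_ + skip);
        if (close == std::string::npos)
            throw std::runtime_error("unterminated tag at offset " + std::to_string(pos_));
        std::string name = in_.substr(pos_ + skip, close - pos_ - skip);
        pos_ = close + 2;
        return name;
    }

    // section := "{{#" name "}}" element* "{{/" name "}}"   (both names must agree)
    Node parse_section() {
        Node section{Node::Section, read_tag(3), {}};
        section.children = parse_elements();
        if (!at("{{/"))
            throw std::runtime_error("missing {{/" + section.value + "}}");
        std::string closing = read_tag(3);
        if (closing != section.value)
            throw std::runtime_error("{{/" + closing + "}} does not close {{#" + section.value + "}}");
        return section;
    }

    // text := everything up to the next "{{" (a single "{" is ordinary text)
    Node parse_text() {
        std::size_t next = in_.find("{{", pos_);
        if (next == std::string::npos) next = in_.size();
        Node text{Node::Text, in_.substr(pos_, next - pos_), {}};
        pos_ = next;
        return text;
    }
};

// Prints the tree back as mustache; for valid input this reproduces the source.
std::string dump(const std::vector<Node>& nodes) {
    std::string out;
    for (const Node& n : nodes) {
        switch (n.kind) {
        case Node::Text: out += n.value; break;
        case Node::Variable: out += "{{" + n.value + "}}"; break;
        case Node::Section:
            out += "{{#" + n.value + "}}" + dump(n.children) + "{{/" + n.value + "}}";
            break;
        }
    }
    return out;
}
```

Because each section is built by its own call, the tag name and the content between the tags are both available at the point where the section node is created. This answers the "one sweep" question without a second pass.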
Here is the corrected full code from @sehe, which now works with GCC > 4.8 and Clang on Linux and Windows. Again, many thanks mate for this awesome help, even though this means I can bury XPressive :D
The following lines have changed or been added:
Boost Spirit is built on Proto (by the same hero, Eric Niebler!), so I hope you don't mind if I uphold a personal tradition of mine and present an implementation in Boost Spirit.
I found it pretty tricky to see what you wanted to achieve from just the code shown. Therefore I went straight to the mustache docs and implemented a parser for the following AST:
As you can see, I've added support for negated sections as well as partial templates (i.e. tags that name another template, which is expanded dynamically).
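To show roughly what such an AST can look like, here is a hypothetical sketch that uses C++17 std::variant in place of Boost.Variant. It is not the answer's actual definition. The recursive section type is handled by wrapping the variant in a struct (melement), since std::vector may be instantiated with an incomplete element type from C++17 on. With Boost.Variant, boost::recursive_wrapper serves the same purpose. collect_names is an illustrative helper that walks the tree.

```cpp
#include <string>
#include <variant>
#include <vector>

struct verbatim { std::string text; };  // literal template text
struct variable { std::string name; };  // {{name}}
struct partial  { std::string name; };  // {{>name}}: expands another template by name

struct melement;  // completed below; std::vector accepts an incomplete type since C++17
using sequence = std::vector<melement>;

// {{#name}}...{{/name}}, or {{^name}}...{{/name}} when negated
struct section {
    bool negated;
    std::string name;
    sequence content;
};

// Wrapping the variant in a struct breaks the type recursion, the job that
// boost::recursive_wrapper does for Boost.Variant.
struct melement {
    std::variant<verbatim, variable, partial, section> v;
};

// Illustrative walk: collects every name the template depends on
// (variables, partials and section controls) in document order.
void collect_names(const sequence& seq, std::vector<std::string>& names) {
    for (const melement& el : seq) {
        if (auto* var = std::get_if<variable>(&el.v)) {
            names.push_back(var->name);
        } else if (auto* p = std::get_if<partial>(&el.v)) {
            names.push_back(p->name);
        } else if (auto* s = std::get_if<section>(&el.v)) {
            names.push_back(s->name);
            collect_names(s->content, names);  // recurse into the section body
        }
    }
}
```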
Here are the productions:
The only nifty thing is the use of a qi::locals<> named section_id to check that the closing tag of a section matches the opening tag of the current section.
I optimize things based on the assumption that the input data will stay around, so we don't need to copy actual data. This should avoid 99% of allocation needs here. I used boost::string_ref to achieve this, and I think it's fair to say that this introduces the only bits of complexity (see full code below).
Now we're ready to take our parser for a spin. See it live on Coliru.
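For the no-copy idea behind boost::string_ref, here is a sketch that uses C++17 std::string_view, the standard counterpart of boost::string_ref. The scanned tag names are views into the caller's buffer, so no tag data is copied. The same caveat applies: the input must outlive the results. TagRef and scan_tags are hypothetical names, and the scanner is a flat tokenizer, not the Spirit grammar.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// A tag whose name is a view into the caller's buffer: nothing is copied,
// so the buffer must outlive the returned tags.
struct TagRef {
    char sigil;             // '#', '^', '/', '>' or '\0' for a plain variable
    std::string_view name;  // points into the original input
};

std::vector<TagRef> scan_tags(std::string_view input) {
    std::vector<TagRef> tags;
    std::size_t pos = 0;
    while ((pos = input.find("{{", pos)) != std::string_view::npos) {
        std::size_t close = input.find("}}", pos + 2);
        if (close == std::string_view::npos) break;  // unterminated tag: stop scanning
        std::string_view body = input.substr(pos + 2, close - pos - 2);
        char sigil = '\0';
        if (!body.empty() && std::string_view("#^/>").find(body.front()) != std::string_view::npos) {
            sigil = body.front();
            body.remove_prefix(1);  // still a view, just shifted by one character
        }
        tags.push_back({sigil, body});
        pos = close + 2;
    }
    return tags;
}
```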
Dumping::dumper simply prints the mustache template back from the parsed AST. You might wonder how dumper is implemented:
Nothing overly complicated. Boost Variant really affords a declarative programming style. To illustrate this even more thoroughly, let's add expansion based on context objects!
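For comparison, a dumper in the same declarative style can be sketched with C++17 std::visit. One operator() overload per alternative plays the role of a boost::static_visitor. The AST below is a hypothetical std::variant-based stand-in, not the answer's actual types.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

// Hypothetical std::variant-based AST (not the answer's real types).
struct verbatim { std::string text; };
struct variable { std::string name; };
struct partial  { std::string name; };
struct melement;
using sequence = std::vector<melement>;
struct section  { bool negated; std::string name; sequence content; };
struct melement { std::variant<verbatim, variable, partial, section> v; };

// One overload per alternative, similar to a boost::static_visitor;
// std::visit picks the overload matching the active alternative.
struct dumper {
    std::ostream& os;

    void operator()(const verbatim& t) const { os << t.text; }
    void operator()(const variable& v) const { os << "{{" << v.name << "}}"; }
    void operator()(const partial& p) const  { os << "{{>" << p.name << "}}"; }
    void operator()(const section& s) const {
        os << (s.negated ? "{{^" : "{{#") << s.name << "}}";
        (*this)(s.content);  // recurse into the section body
        os << "{{/" << s.name << "}}";
    }
    void operator()(const sequence& seq) const {
        for (const melement& el : seq) std::visit(*this, el.v);
    }
};

// Prints the template back from the AST.
std::string dump(const sequence& seq) {
    std::ostringstream out;
    dumper d{out};
    d(seq);
    return out.str();
}
```

Adding a new element kind to the variant forces a matching overload, because std::visit does not compile unless every alternative can be handled.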
I wasn't going to implement JSON just for this, so instead let's assume a context Value model like:
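Here is a hypothetical sketch of such a context model, together with the binary visitation described next. The template element and the looked-up context value are both std::variants, and a two-argument std::visit (the C++17 counterpart of boost::apply_visitor with two variants) dispatches on both at once. To keep the sketch short, the value model is deliberately non-recursive: a value is a bool, a string, or a list of strings. Missing names count as false. Inside a list section, the current item is bound to the name "." (in the spirit of mustache's {{.}} implicit iterator). These choices and all names are assumptions for illustration, not the answer's actual model.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical template elements (a std::variant stand-in for the answer's AST).
struct verbatim { std::string text; };
struct variable { std::string name; };
struct melement;
struct section { bool negated; std::string name; std::vector<melement> content; };
struct melement { std::variant<verbatim, variable, section> v; };

// Hypothetical, deliberately non-recursive context model.
using value   = std::variant<bool, std::string, std::vector<std::string>>;
using context = std::map<std::string, value>;

std::string render(const std::vector<melement>& seq, const context& ctx);

// Missing names are treated as false.
value lookup(const context& ctx, const std::string& name) {
    auto it = ctx.find(name);
    return it != ctx.end() ? it->second : value{false};
}

// One overload per (element, value) combination.
struct expander {
    const context& ctx;
    std::string& out;

    template <typename V>  // literal text ignores the context value
    void operator()(const verbatim& t, const V&) const { out += t.text; }

    void operator()(const variable&, bool b) const { out += (b ? "true" : "false"); }
    void operator()(const variable&, const std::string& s) const { out += s; }
    void operator()(const variable&, const std::vector<std::string>&) const {}  // lists print nothing

    // {{#x}} renders when x is truthy, {{^x}} when it is falsy
    void operator()(const section& s, bool b) const {
        if (b != s.negated) out += render(s.content, ctx);
    }
    void operator()(const section& s, const std::string& str) const {
        if (str.empty() == s.negated) out += render(s.content, ctx);
    }
    // a list section repeats its content once per item, binding the item to "."
    void operator()(const section& s, const std::vector<std::string>& items) const {
        if (s.negated) {
            if (items.empty()) out += render(s.content, ctx);
            return;
        }
        for (const std::string& item : items) {
            context inner = ctx;
            inner["."] = item;
            out += render(s.content, inner);
        }
    }
};

std::string name_of(const melement& el) {
    if (auto* var = std::get_if<variable>(&el.v)) return var->name;
    if (auto* s = std::get_if<section>(&el.v)) return s->name;
    return {};  // verbatim text has no name
}

std::string render(const std::vector<melement>& seq, const context& ctx) {
    std::string out;
    for (const melement& el : seq)  // binary visitation: element x looked-up value
        std::visit(expander{ctx, out}, el.v, lookup(ctx, name_of(el)));
    return out;
}
```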
Now we use binary visitation against mustache::melement and this context Value variant. This is a bit more code than just dumping, but let's look at the use-site first:
This prints (see it live on Coliru again):
Full Code Listing for reference: