My question is actually quite simple. I'm currently working on a parser for a meta language with embedded DSLs. This interests me because such a parser could handle websites, i.e. HTML with embedded JavaScript / CSS. I want to design a similar system with minimal DSLs for a specific use case.
Is boost::spirit capable of doing something similar? I don't know how boost::spirit handles lexer generation, or whether it is a scannerless parser at all.
Thanks in advance!
Spirit Qi can be used with a scanner (Spirit Lex) or without.
In my humble opinion, Spirit shines when you use it scannerless, though. The main reason is that Spirit is at its best when you avoid complexity, and Spirit Lex acts as a complexity multiplier for your Spirit Qi grammar definition.
That out of the way,
- Yes, you can switch to different embedded grammars¹. The Nabialek trick is a well-known way to achieve such a switch.
- Technically, it's also possible to switch lexer states to achieve the same effect when using Spirit Lex, but bear in mind the limitations of this method: lexer state cannot be manipulated depending on conditions in the parser tier (contrary, perhaps, to what the presence of undocumented parser directives in this area suggests).
- Your question doesn't mention ad-hoc/on-the-fly grammars, but since "DSLs" suggests them, I'll add a proper warning: Spirit Qi is a parser generator framework that generates PEG parsers at compile time. In its current incarnation, it does not lend itself well to generating rules/grammars at runtime (mainly due to limitations in Boost Proto/Boost Phoenix, which underlie it). Spirit X3 may lift many of these limitations, but that is still in the future.
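To make the first point a bit more concrete, here is a minimal sketch of the idea behind that kind of switch: look up a keyword in a table, then hand the input over to whichever parser is stored under it. Note that this is deliberately not Spirit code. It is a hand-rolled, scannerless illustration using only the standard library, and the names `Cursor`, `SubParser`, `parse_sum`, `parse_upper` and `parse_document`, as well as the `keyword{body}` host syntax, are made up for this example. In Spirit Qi the table would typically be a `qi::symbols` holding rule pointers, with the dispatch done through `qi::lazy`.

```cpp
#include <cctype>
#include <functional>
#include <map>
#include <string>

// Cursor over the raw characters: scannerless, so there is no token stream.
struct Cursor {
    const std::string& text;
    std::size_t pos;
    explicit Cursor(const std::string& t) : text(t), pos(0) {}
    bool done() const { return pos >= text.size(); }
    char peek() const { return done() ? '\0' : text[pos]; }
    bool at(int (*pred)(int)) const {
        return pred(static_cast<unsigned char>(peek())) != 0;
    }
    void skip_space() { while (at(std::isspace)) ++pos; }
};

// An embedded grammar consumes its own syntax and appends its result to out.
using SubParser = std::function<bool(Cursor&, std::string&)>;

// Hypothetical embedded grammar 1: whitespace-separated integers, summed.
bool parse_sum(Cursor& c, std::string& out) {
    long total = 0;
    bool any = false;
    for (c.skip_space(); c.at(std::isdigit); c.skip_space()) {
        long value = 0;
        while (c.at(std::isdigit)) value = value * 10 + (c.text[c.pos++] - '0');
        total += value;
        any = true;
    }
    if (!any) return false;
    out += std::to_string(total);
    return true;
}

// Hypothetical embedded grammar 2: a single run of letters, upper-cased.
bool parse_upper(Cursor& c, std::string& out) {
    c.skip_space();
    const std::size_t start = c.pos;
    while (c.at(std::isalpha)) {
        const unsigned char ch = static_cast<unsigned char>(c.text[c.pos++]);
        out += static_cast<char>(std::toupper(ch));
    }
    const bool any = c.pos > start;
    c.skip_space();
    return any;
}

// Host grammar: keyword '{' body '}', repeated. The keyword picks the parser
// for the body, which is the "switch to an embedded grammar" step.
bool parse_document(const std::string& input, std::string& out) {
    static const std::map<std::string, SubParser> grammars = {
        {"sum", parse_sum}, {"upper", parse_upper}};
    Cursor c(input);
    for (c.skip_space(); !c.done(); c.skip_space()) {
        const std::size_t start = c.pos;
        while (c.at(std::isalpha)) ++c.pos;
        const auto it = grammars.find(input.substr(start, c.pos - start));
        if (it == grammars.end() || c.peek() != '{') return false;
        ++c.pos;  // consume '{'
        if (!out.empty()) out += ' ';
        if (!it->second(c, out) || c.peek() != '}') return false;
        ++c.pos;  // consume '}'
    }
    return true;
}
```

With this sketch, `parse_document("sum{1 2 3} upper{abc}", out)` returns `true` and leaves `"6 ABC"` in `out`, while an unknown keyword such as `lower{abc}` makes it return `false`. Because the table is an ordinary runtime map, adding another embedded grammar is just another entry, but everything that makes Spirit attractive (attribute propagation, backtracking, error handling) is left out here.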
That said, I strongly suggest looking at ready-made parsers/tokenizers for the purpose. My stance is usually summarized as: use Spirit for rapid development and ad-hoc parsing.
As soon as your grammar becomes complex enough and you know it is fixed/stable, I believe you can achieve the best results with a handwritten parser or with one of the more tedious parser generators like ANTLR, Coco/R, Flex/Bison etc., which have a higher setup cost.
¹ Side note: I don't think "DSLs" is an appropriate term for the case of scripts inside HTML. The "embedded" nature is only tangentially related, and e.g. ECMAScript is hardly "Domain Specific", so I'll stick to "Embedded Grammar" here.