So I've been working with the Boost Spirit Compiler tutorial. Currently, it works great with integers. I am working on a way to extend it to handle strings. Here is the link to the source code.
http://www.boost.org/doc/libs/1_57_0/libs/spirit/example/qi/compiler_tutorial/mini_c/
For those familiar with Boost, the following should look familiar - it is a production rule for a primary expression:
    primary_expr =
            uint_
        |   function_call
        |   identifier
        |   bool_
        |   '(' > expr > ')'
        ;
uint_ is what lets a primary_expr hold an unsigned integer. Normally, we could add some simple functionality for a char or a string by either creating a few more production rules, or else a simple text parser using a regex that identifies quotes or something like that. There are plenty more examples if you go up a few directory levels from the link above.
The real problem comes with the fact that to implement the compiler, the code pushes bytecode operations into a vector. It's trivial to push a single char here, since all chars have an accompanying ASCII code that it will be implicitly converted to, but not the case for an array of chars, since they would lose their context in the process as part of a larger string (that forms a sentence, eg).
The best option I can come up with is to change the vector<int> to vector<uintptr_t>.
From my understanding, this type of pointer can point to both integers and chars. Though, it's not simply a matter of changing the 'uint_' to 'uintptr_t' within the above production rule. The compiler tells me that it's an illegal use in this particular instance.
By the way, you will see the implementation of our vector holding the bytecode within the compiler.cpp/.hpp files.
Any help would be appreciated, and if you need any more information, please ask. Thanks.
Normally, we could add some simple functionality for a char or a string by either creating a few more production rules, or else a simple text parser using a regex that identifies quotes or something like that
Regex is not supported. You can use a subset of regular expression syntax in Boost Spirit Lex patterns (which can be used in token_def) but that would complicate the picture considerably.
The real problem comes with the fact that to implement the compiler, the code pushes bytecode operations into a vector. It's trivial to push a single char here, since all chars have an accompanying ASCII code that it will be implicitly converted to, but not the case for an array of chars, since they would lose their context in the process as part of a larger string (that forms a sentence, eg).
In jargon: the AST doesn't accommodate non-integral values.
The simplest way would be to extend the AST for an operand:
    typedef boost::variant<
            nil
          , bool
          , unsigned int
          , identifier
          , std::string // ADDED
          , boost::recursive_wrapper<unary>
          , boost::recursive_wrapper<function_call>
          , boost::recursive_wrapper<expression>
        >
    operand;
(Note: this is also the type of attribute exposed by primary_expr and unary_expr)
Now let's extend the rules:
    quoted_string = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';

    primary_expr =
            uint_
        |   function_call
        |   identifier
        |   quoted_string
        |   bool_
        |   ('(' > expr > ')')
        ;
Note that we declared quoted_string without a skipper so we don't have to do the lexeme[] incantation (Boost spirit skipper issues).
Compiler support
Next, when compiling, it turns out the compiler visitor doesn't know about strings yet. So, we add
    op_string, // push constant string into the stack
and
    bool compiler::operator()(std::string const& x)
    {
        BOOST_ASSERT(current != 0);
        current->op(op_string, x);
        return true;
    }
in the respective places.
(I'm still live-coding this at https://www.livecoding.tv/sehe/ and posted the answer early so you can read it ahead of time.)