Given a regular expression, I'm looking for a package which will dynamically generate the code for a finite state machine that implements the RE.
C/C++ and Python preferred, but other languages are of interest as well.
re2c generates C code. I'm not sure what you mean by 'dynamically': AFAIK you'd have to compile and dynamically load the output if you want to call the generated code during the same run that generated it.
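For comparison, in Python the "compile and load" step can happen inside the running process via `exec`. The following is only a minimal sketch of that idea, not what re2c does: the DFA for `ab*c` is written by hand, and `TRANSITIONS`, `generate_matcher_source` and `match` are hypothetical names chosen for illustration.

```python
# Hypothetical hand-written DFA for the regular expression ab*c.
# TRANSITIONS maps (state, character) -> next state; missing entries reject.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
START, ACCEPTING = 0, {2}


def generate_matcher_source(transitions, start, accepting, name="match"):
    """Return Python source for a function that runs the given DFA.

    Assumes the table contains at least one transition.
    """
    lines = [f"def {name}(text):", f"    state = {start}", "    for ch in text:"]
    keyword = "if"
    for state in sorted({src for src, _ in transitions}):
        lines.append(f"        {keyword} state == {state}:")
        inner = "if"
        for (src, ch), dst in sorted(transitions.items()):
            if src == state:
                lines.append(f"            {inner} ch == {ch!r}:")
                lines.append(f"                state = {dst}")
                inner = "elif"
        lines.append("            else:")
        lines.append("                return False")
        keyword = "elif"
    # States with no outgoing transitions reject any further input.
    lines.append("        else:")
    lines.append("            return False")
    lines.append(f"    return state in {sorted(accepting)!r}")
    return "\n".join(lines) + "\n"


source = generate_matcher_source(TRANSITIONS, START, ACCEPTING)
namespace = {}
exec(source, namespace)  # compile and load the generated code in this same run
match = namespace["match"]
```

Printing `source` shows the generated state machine as ordinary nested `if`/`elif` code. With C output you would instead write the source to a file, invoke a compiler, and load the resulting shared library.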
Ragel may be what you are looking for.
It generates C/C++/D/Ruby/Java code for state machines. These are described using both regular expressions and operators.
Check the website; its front page is quite explicit.
It might not be exactly what you're looking for, but the Xerox Finite State Transducer supports regular expressions, builds the machine, and can even create a graphical representation using GraphViz.
It's really nifty for things like morphology, but otherwise I'd suggest it mainly if you're looking for something to explore the theoretical side of finite state machines.
Two caveats: it uses its own syntax, so it won't necessarily translate easily to your languages of choice, and I'm pretty sure you have to get a license for it. It comes with Karttunen and Beesley's book "Finite State Morphology", which is a very interesting read in its own right.
The Finite State Automata Utilities support generation of FSMs from regular expressions. They also support C, C++ and Java code generation for FSMs. Dynamic generation is supported, but the package is written in Prolog, so calling it from another language might be a hassle.
What you're asking for is a lexer... There are plenty of them for a plethora of programming languages. For a start you can have a look here.
A good Python implementation of a converter from regular expression to finite state machine is https://github.com/ferno/greenery. It is available on PyPI via 'pip install greenery'.
Another Python package uses greenery to implement iterative parsers: Communications Protocol Python Parser and Originator, https://github.com/pjkundert/cpppo. It is also available via 'pip install cpppo'. Cpppo is unfortunately quite complex, in no small part due to an attempt to support both Python 2 and 3 in the same source, including full UTF-8 compatibility.
Anyway, cpppo should give you an idea of how to apply the excellent greenery regex-to-FSM converter.
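To give a rough idea of what a regex-to-FSM converter does, here is a self-contained sketch that builds a DFA using Brzozowski derivatives. It does not use greenery's API and is not claimed to be greenery's algorithm; all names (`parse`, `derive`, `build_dfa`, `accepts`) are hypothetical, and the supported syntax is limited to literal characters, `|`, `*` and parentheses, with no error handling for malformed patterns.

```python
# Regex AST as tuples: ("empty",) matches nothing, ("eps",) matches "",
# ("chr", c), ("cat", r, s), ("alt", frozenset_of_regexes), ("star", r).
EMPTY, EPS = ("empty",), ("eps",)

def cat(r, s):
    if EMPTY in (r, s): return EMPTY
    if r == EPS: return s
    if s == EPS: return r
    return ("cat", r, s)

def alt(r, s):
    # Flatten into a set so alternation is associative, commutative and
    # idempotent; this keeps the number of distinct derivatives finite.
    parts = set()
    for x in (r, s):
        if x[0] == "alt": parts |= x[1]
        elif x != EMPTY: parts.add(x)
    if not parts: return EMPTY
    return next(iter(parts)) if len(parts) == 1 else ("alt", frozenset(parts))

def star(r):
    if r in (EMPTY, EPS): return EPS
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"): return True
    if tag == "cat": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return any(nullable(x) for x in r[1])
    return False  # "empty" and "chr"

def derive(r, c):
    """Regex matching what may follow after r has consumed character c."""
    tag = r[0]
    if tag == "chr": return EPS if r[1] == c else EMPTY
    if tag == "cat":
        first = cat(derive(r[1], c), r[2])
        return alt(first, derive(r[2], c)) if nullable(r[1]) else first
    if tag == "alt":
        out = EMPTY
        for x in r[1]: out = alt(out, derive(x, c))
        return out
    if tag == "star": return cat(derive(r[1], c), r)
    return EMPTY  # "empty" and "eps"

def parse(pattern):
    """Tiny recursive-descent parser: literals, '|', '*', parentheses."""
    pos = 0
    def peek():
        return pattern[pos] if pos < len(pattern) else None
    def parse_alt():
        nonlocal pos
        node = parse_cat()
        while peek() == "|":
            pos += 1
            node = alt(node, parse_cat())
        return node
    def parse_cat():
        node = EPS
        while peek() not in (None, "|", ")"):
            node = cat(node, parse_star())
        return node
    def parse_star():
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = parse_alt()
            pos += 1  # skip the closing ")"
        else:
            node = ("chr", peek())
            pos += 1
        while peek() == "*":
            pos += 1
            node = star(node)
        return node
    return parse_alt()

def build_dfa(pattern):
    """Return (table, start, accepting); each state is one derivative."""
    start = parse(pattern)
    alphabet = sorted(set(pattern) - set("()|*"))
    ids, table, todo = {start: 0}, {}, [start]
    while todo:
        r = todo.pop()
        for c in alphabet:
            d = derive(r, c)
            if d == EMPTY: continue  # the dead state is left implicit
            if d not in ids:
                ids[d] = len(ids)
                todo.append(d)
            table[(ids[r], c)] = ids[d]
    return table, 0, {i for r, i in ids.items() if nullable(r)}

def accepts(dfa, text):
    table, state, accepting = dfa
    for ch in text:
        state = table.get((state, ch))
        if state is None: return False
    return state in accepting
```

For example, `accepts(build_dfa("(a|b)*abb"), "babb")` returns `True`. The `table` dictionary is the finite state machine itself, and could be fed to a code generator for whichever target language you need.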