i just discovered http://code.google.com/p/re2, a promising library that uses a long-neglected way (Thompson NFA) to implement a regular expression engine that can be orders of magnitudes faster than the available engines of awk, Perl, or Python.
so i downloaded the code and did the usual sudo make install
thing. however, that action had seemingly done little more than adding /usr/local/include/re2/re2.h
to my system. there seemed to be some ```.afile in addition, but then what is it with this
.a`` extension?
i would like to use re2 from Python (preferrably Python 3.1) and was excited to see files like make_unicode_groups.py
in the distro (maybe just used during the build process?). those however were not deployed on my machine.
how can i use re2 from Python?
update two friendly people have pointed out that i could try to build DLLs / *.so files from the sources and then use Python’s ctypes
library to access those. can anyone give useful pointers how to do just that? i’m pretty much clueless here, especially with the first part (building the *.so files).
update i have also posted this question (earlier) to the re2 developers’ group, without reply till now (it is a small group), and today to the (somewhat more populous) comp.lang.py group [—thread here—]. the hope is that people from various corners can contact each other. my guess is a skilled person can do this in a few hours during their 20% your-free-time-belongs-google-too timeslice; it would tie me for weeks. is there a tool to automatically dumb-down C++ to whatever flavor of C that Python needs to be able to connect? then maybe getting a viable result can be reduced to clever tool chaining.
(rant)why is this so difficult? to think that in 2010 we still cannot have our abundant pieces of software just talk to each other. this is such a roadblock that whenever you want to address some C code from Python you must always cruft these linking bits. this requires a lot of work, but only delivers an extension module that is specific to the version of the C code and the version of Python, so it ages fast.(/rant) would it be possible to run such things in separate processes (say if i had an re2 executable that can produce results for data that comes in on, say, subprocess/Popen/communicate()
)? (this should not be a pure command-line tool that necessitates the opening of a process each time it is needed, but a single processs that runs continuously; maybe there exist wrappers that sort of ‘demonize’ such C code).