Could you please give me an example of writing a custom gcc preprocessor?
My goal is to replace macros like SID("foo") with the corresponding computed CRC32 values. For any other macro I'd like to use the standard cpp preprocessor.
It looks like it's possible to achieve this goal using the -no-integrated-cpp and -B options; however, I can't find any simple example of their usage.
Warning: dangerous and ugly hack. Close your eyes now.

You can hook in your own preprocessor by adding the '-no-integrated-cpp' and '-B' switches to the gcc command line. '-no-integrated-cpp' makes gcc run preprocessing as a separate pass instead of as part of compilation, and '-B' names a directory that gcc searches for its subprograms before it uses its internal search path. A preprocessor invocation can be identified by the '-E' option: that is when the 'cc1', 'cc1plus' or 'cc1obj' programs (the C, C++ and Objective-C compilers) are run for preprocessing only. When there is no '-E' option, pass all the parameters through to the original program. When there is one, do your own preprocessing and pass the manipulated file to the original compiler.
It looks like this:
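A minimal sketch of such a wrapper, written here in Python purely for illustration and saved as an executable file named cc1 in the directory given to '-B'. The REAL_CC1 path is an assumption and must point at the real program on your system ('gcc -print-prog-name=cc1' reports it); the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for cc1; save as an executable file named `cc1`."""
import os
import sys

# Assumption: location of the real cc1. `gcc -print-prog-name=cc1` reports it.
REAL_CC1 = "/usr/lib/gcc/x86_64-linux-gnu/12/cc1"


def is_preprocess_call(args):
    """True when gcc runs cc1 only to preprocess (the -E pass)."""
    return "-E" in args


def main(argv):
    args = argv[1:]
    if is_preprocess_call(args):
        # Hook point: custom preprocessing would go here.
        print("My own special preprocessor --", " ".join(args), file=sys.stderr)
    # Hand over to the real program with the parameters unchanged.
    os.execv(REAL_CC1, [REAL_CC1] + args)


# Only act when installed under the name gcc looks for.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "cc1":
    main(sys.argv)
```

With the wrapper in place, a compile along the lines of 'gcc -no-integrated-cpp -B/path/to/wrapper/ x.c' should route both the preprocessing and the compilation invocation of cc1 through it.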
This example calls the original preprocessor, but prints an additional message and the parameters. You can replace the script with your own preprocessor.
The bad hack is over. You can open your eyes now.
One way is to use a program transformation system to "rewrite" just the SID macro invocation to what you want before you do the compilation, leaving the rest of the preprocessor handling to the compiler itself.
Our DMS Software Reengineering Toolkit is such a system; it can be applied to many languages, including C and specifically the GCC 2/3/4 series of compilers.
To implement this idea using DMS, you would run DMS with its C front end over your source code before the compilation step. DMS can parse the code without expanding the preprocessor directives, build abstract syntax trees representing it, carry out transformations on the ASTs, and then spit out the result as compilable C text.
The specific transformation rule you would use is:
where ComputeCRC32 is custom code that does what it says. (DMS includes a CRC32 implementation, so the custom code for this is pretty short.)
DMS is kind of a big hammer for this task. You could use Perl to implement something pretty similar. The difference with Perl (or some other string match/replace hack) is the risk that a) it might find the pattern someplace where you don't want a replacement (which you can probably fix by coding your pattern match carefully), b) it might fail to match a SID call found in surprising circumstances, and c) it might fail to handle the various kinds of escape characters that show up in the literal string itself.
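The three risks can be illustrated with a deliberately naive regex replacer, sketched here in Python rather than Perl. The sample inputs (GET_SID, the embedded comment, the escaped quote) are made up for illustration and are not the answer's original examples; zlib.crc32 stands in for whatever CRC32 variant SID is meant to use.

```python
import re
import zlib

# Deliberately naive: no word boundary, no whitespace, no escape handling.
NAIVE_SID = re.compile(r'SID\("([^"]*)"\)')


def naive_replace(source):
    """Replace each naive match with the CRC-32 of the raw quoted text."""
    def crc(match):
        return "0x%08XU" % zlib.crc32(match.group(1).encode("utf-8"))
    return NAIVE_SID.sub(crc, source)


# a) Unwanted match: the tail of a longer identifier is rewritten.
print(naive_replace('x = GET_SID("foo");'))
# b) Missed match: spacing and an embedded comment defeat the pattern.
print(naive_replace('y = SID ( /* login */ "foo" );'))
# c) Escapes: the pattern stops at the escaped quote and fails to match.
print(naive_replace(r'z = SID("a\"b");'))
```

Case a) goes away once the pattern starts with a word boundary; cases b) and c) are where a plain regex starts to need real tokenizing.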
DMS's C front end handles all the escapes for you; the ComputeCRC32 function above would see the string containing the actual intended characters, not the raw text you see in the source code.
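To make the raw-versus-decoded distinction concrete, here is a small Python sketch (not DMS code). It assumes Python's "unicode_escape" decoding is a close enough approximation of C escape rules for simple cases, and that zlib.crc32 is the CRC32 variant wanted; both function names are hypothetical.

```python
import zlib


def decode_c_escapes(raw):
    """Approximate C escape decoding using Python's own escape rules."""
    return raw.encode("ascii").decode("unicode_escape")


def crc32_literal(text):
    """Format the CRC-32 of the text's bytes as a C unsigned constant."""
    return "0x%08XU" % zlib.crc32(text.encode("latin-1"))


raw = r"line\n"                   # six characters as typed in the source
decoded = decode_c_escapes(raw)   # five characters, ending in a real newline
print(len(raw), len(decoded))
# The two constants are computed from different byte sequences.
print(crc32_literal(raw), crc32_literal(decoded))
```

A string-replace script that hashes the raw text would produce the first constant, while code that hashes the string at run time would see the decoded characters.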
So it's really a matter of whether you care about the dark-corner cases, or if you think you may have more special processing to do.
Given the way you've described the problem, I'd be sorely tempted to go the Perl route first and simply outlaw the funny cases. If you can't do this, then the big hammer makes sense.
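A rough sketch of that first option, in Python rather than Perl: accept only one strict spelling of the call and report anything else that looks like a SID call instead of guessing. The function name, the accepted character set, and the use of zlib.crc32 are assumptions; the sketch also assumes the definition of SID itself is not in the files being rewritten.

```python
import re
import zlib

# Strict form accepted by this sketch: SID("text") with plain characters only.
STRICT_SID = re.compile(r'\bSID\("([A-Za-z0-9_ .-]*)"\)')
# Anything else that still looks like a SID call is treated as an error.
ANY_SID = re.compile(r'\bSID\s*\(')


def rewrite_sids(source):
    """Replace strict SID("...") calls with CRC-32 constants; reject the rest."""
    def crc(match):
        return "0x%08XU" % zlib.crc32(match.group(1).encode("ascii"))

    result = STRICT_SID.sub(crc, source)
    leftover = ANY_SID.search(result)
    if leftover:
        # Replacements never add or remove newlines, so line numbers still match.
        line = result.count("\n", 0, leftover.start()) + 1
        raise ValueError("unsupported SID usage on line %d" % line)
    return result


print(rewrite_sids('int id = SID("foo");\n'))
```

Run over each source file before compilation, this either produces rewritten text or stops with the line number of a call that needs to be respelled by hand.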