Pycparser not working on preprocessed code

Posted 2020-07-30 00:59

Question:

I need to use pycparser on preprocessed C code (the output of 'gcc -E'). However, I am currently running into an issue that I can't understand or solve.

I am using the provided samples year2.c and func_defs.py, which I modified to use a variety of preprocessors and fake libraries, to no avail. Maybe some of you can look into this and see if you can reproduce or solve the issue. I will append all necessary code.

The errors were generated using year2.c (the regular sample file) and year2.i (the 'gcc -E' output). The latter produced no usable result, while the former worked with both preprocessor/fakelib variants.

I have created a bitbucket repo with all relevant errors, the script used (albeit only its last variation) and the year2.c and year2.i files.

Error & Sample Repo

Thanks for your time.

Answer 1:

The error you're getting is:

pycparser.plyparser.ParseError: /usr/lib/gcc/x86_64-linux-gnu/4.8/include/stdarg.h:40:27: before: __gnuc_va_list

The line indicated as causing the error (stdarg.h:40):

typedef __builtin_va_list __gnuc_va_list;

In gcc, __builtin_va_list is, as its name indicates, built in to the compiler. Consequently, no declaration of that type is necessary (or allowed).

It's pretty common for C compilers to use a symbol-table-based technique to parse typenames, since there are a number of ambiguities in the grammar if you cannot distinguish a typename from another identifier. Such a parser will assume that an undeclared identifier is not a typename, and if __builtin_va_list is not a typename, that typedef is a syntax error.

So I suppose that the pycparser grammar you're using doesn't know about gcc builtin types (and why should it?).
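The symbol-table idea can be illustrated with a toy sketch. This is not pycparser's implementation: the function name `register_typedefs` and the simplified handling of declarations are assumptions made for illustration only. It treats a typedef as valid only when every word of its base type is already a known typename, which is why an undeclared `__builtin_va_list` is rejected.

```python
import re

# Keywords that a C parser recognizes as types without any declaration.
BASE_TYPES = {"void", "char", "short", "int", "long",
              "float", "double", "signed", "unsigned"}


def register_typedefs(source, known_types=()):
    """Toy model of typename tracking for simple `typedef A B;` statements.

    Returns the set of known typenames after processing `source`.
    Raises SyntaxError if a typedef refers to an undeclared typename.
    """
    types = set(BASE_TYPES) | set(known_types)
    for statement in source.split(";"):
        tokens = re.findall(r"[A-Za-z_]\w*", statement)
        if len(tokens) < 3 or tokens[0] != "typedef":
            continue  # not a simple typedef, ignore it in this sketch
        *base, new_name = tokens[1:]
        for word in base:
            if word not in types:
                # An undeclared identifier is assumed not to be a typename.
                raise SyntaxError(f"unknown typename {word!r} before: {new_name}")
        types.add(new_name)
    return types
```

Calling `register_typedefs("typedef __builtin_va_list __gnuc_va_list;")` raises `SyntaxError`, while passing `known_types={"__builtin_va_list"}` lets the same line through and registers `__gnuc_va_list`.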

Your fakelib seems to be including the same header file. That's not surprising since it is hard to fake stdarg.h; although technically a library header, it is part of the small set of headers which must be provided by the compiler even in a freestanding (no standard library) implementation: <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h> (C11 standard, clause 4, paragraph 6). These must be implemented by the compiler because there is no way an external library can know enough about the nature of the compiled code to properly define them.

Depending on what you require from the parsed output, you may be able to work around this for pycparser by including a definition of __builtin_va_list, such as:

typedef struct __builtin_va_list { } __builtin_va_list;

__builtin_va_list is not the only builtin gcc datatype, although you may not run into the other ones. So you might have to iterate this solution a few times until you achieve whatever it is you are trying to achieve.
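One way to apply this workaround without editing the 'gcc -E' output by hand is to prepend stub typedefs to the preprocessed text before parsing it. The sketch below is only an assumption about how that might look: the helper name `stub_builtin_types` is hypothetical, and it uses `typedef int` stubs, which is enough when only the typename matters for parsing.

```python
def stub_builtin_types(preprocessed, builtin_names=("__builtin_va_list",)):
    """Prepend one stub typedef per compiler builtin type name.

    `preprocessed` is the text produced by `gcc -E`. The stubs are plain
    `int` typedefs (an assumption of this sketch): they make the names
    known as types but do not reflect their real layout.
    """
    prelude = "".join(f"typedef int {name};\n" for name in builtin_names)
    return prelude + preprocessed
```

The returned text could then be handed to `pycparser.c_parser.CParser().parse(...)`. If the parser stops on another builtin name, add that name to `builtin_names` and try again.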



Answer 2:

Since @rici has explained the cause of the error, I'll focus on how to solve it. I've taken my answer from the pycparser author's blog: http://eli.thegreenplace.net/2015/on-parsing-c-type-declarations-and-fake-headers

The idea is that pycparser needs to know what anyheader.h contains so it can properly parse the code. Since actually parsing anyheader.h and all the other headers it transitively includes could be very time-consuming and is perhaps not required for your task, fake headers can be used. A fake anyheader.h contains only the parts of the original that are necessary for parsing: the #defines and the typedefs.

gcc -nostdinc -E -I/home/rg/pycparser-master/utils/fake_libc_include test.c > testPP.c

The above command preprocesses test.c, which includes <stdio.h>, using the fake headers provided with the pycparser package. The -nostdinc flag stops gcc from searching its preset system header directories. Now, parsing the preprocessed file, using e.g. the code below

import pycparser
pycparser.parse_file('testPP.c')

should work in most cases. If it doesn't, make sure you provide all the dependencies for preprocessing. If no fake is provided for some header, you can neutralize the typedef that causes the error with a -D define. For example, to resolve an error caused by __builtin_va_list, you can try faking it as follows:

gcc -nostdinc -E -D'__builtin_va_list=int' -I/home/rg/pycparser-master/utils/fake_libc_include test.c > testPP.c
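The preprocessing and parsing steps can also be combined in Python, since `pycparser.parse_file` can run the preprocessor itself through its `use_cpp`, `cpp_path` and `cpp_args` parameters. The following is a minimal sketch, not the answer author's script: the helper names `build_cpp_args` and `parse_with_fake_headers` are hypothetical, and the object-like define `__builtin_va_list=int` in the usage note is an assumption of this sketch.

```python
def build_cpp_args(fake_include_dir, defines=()):
    """Build the gcc argument list used for preprocessing.

    -E         stop after preprocessing
    -nostdinc  do not search the standard system header directories
    -I<dir>    search the fake headers directory
    -D<macro>  one entry per extra define, e.g. "__builtin_va_list=int"
    """
    args = ["-E", "-nostdinc", f"-I{fake_include_dir}"]
    args += [f"-D{define}" for define in defines]
    return args


def parse_with_fake_headers(c_file, fake_include_dir, defines=()):
    """Preprocess `c_file` with gcc and return the pycparser AST."""
    # Imported here so build_cpp_args stays usable without pycparser installed.
    from pycparser import parse_file

    return parse_file(c_file, use_cpp=True, cpp_path="gcc",
                      cpp_args=build_cpp_args(fake_include_dir, defines))
```

For example, `parse_with_fake_headers("test.c", "utils/fake_libc_include", ["__builtin_va_list=int"])` would run gcc with the same kind of flags as the command line above, assuming gcc is on the PATH and the fake headers directory exists at that relative path.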