I want the preprocessor to read in the includes of local headers, but ignore the includes of system headers. To put it another way, how do I get the preprocessor to skip over preprocessing directives of the form:
#include <h-char-sequence> new-line
but still process directives of the form:
#include "q-char-sequence" new-line
As a code example, observe the following file:
#include <iostream> //system
#include "class_a.hpp" //local
#include <string> //system
#include "class_b.hpp" //local
int main() {}
How can I get the output of the preprocessor to be:
#include <iostream>
class A{};
#include <string>
class B{};
int main() {}
Local include files may include other local include files, and the preprocessor would recursively bring them all in; much like it normally does. It would still print all of the system file headers, but it would not bring in their contents.
On gcc, my call looks like this so far: g++ -E -P main.cpp, where -E stops after preprocessing, and -P excludes the generation of line markers.
I can't seem to find a flag that excludes the processing of system headers.
How much effort are you willing to go to? There's an obnoxiously obscure way to do it but it requires you to set up a dummy directory to hold surrogates for the system headers. OTOH, it doesn't require any changes in any of your source code. The same technique works equally well for C code.
Setup
Files:
./class_a.hpp
./class_b.hpp
./example.cpp
./system-headers/iostream
./system-headers/string
The 'system headers' such as ./system-headers/iostream contain a single line (there is no # on that line!):
include <iostream>
The class headers each contain a single line like:
class A{};
The contents of example.cpp are what you show in the question:
#include <iostream> //system
#include "class_a.hpp" //local
#include <string> //system
#include "class_b.hpp" //local
int main() {}
Running the C preprocessor
Running the C preprocessor like this produces the output shown:
$ cpp -Dinclude=#include -I. -Isystem-headers example.cpp
# 1 "example.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "example.cpp"
# 1 "system-headers/iostream" 1
#include <iostream>
# 2 "example.cpp" 2
# 1 "class_a.hpp" 1
class A{};
# 3 "example.cpp" 2
# 1 "system-headers/string" 1
#include <string>
# 4 "example.cpp" 2
# 1 "class_b.hpp" 1
class B{};
# 5 "example.cpp" 2
int main() {}
$
If you eliminate the # n lines, that output is:
$ cpp -Dinclude=#include -I. -Isystem-headers example.cpp | grep -v '^# [0-9]'
#include <iostream>
class A{};
#include <string>
class B{};
int main() {}
$
which, give or take the space at the beginning of the lines containing #include, is what you wanted.
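If the leading space bothers you, a small filter can drop the line markers and trim that space in one pass. This is only a sketch, not part of the run shown above: the function name strip_markers is made up for illustration, and it assumes no ordinary source line starts with "# " followed by a digit.

```shell
#!/bin/sh
# strip_markers (hypothetical helper): filter preprocessor output on stdin.
#  - deletes line markers of the form '# 1 "file" ...'
#  - removes white space in front of a re-emitted #include
# Possible use:
#   cpp -Dinclude=#include -I. -Isystem-headers example.cpp | strip_markers
strip_markers() {
    sed -e '/^# [0-9]/d' \
        -e 's/^[[:space:]]*#include/#include/'
}
```

The -P option mentioned in the question should make the first sed expression unnecessary, since it suppresses the line markers at the source.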
Analysis
The -Dinclude=#include argument is equivalent to #define include #include. When the preprocessor generates output from a macro, even if it looks like a directive (such as #include), it is not a preprocessor directive. Quoting the C++11 standard ISO/IEC 14882:2011 (not that this has changed between versions AFAIK; it is also, verbatim, what the C11 standard, ISO/IEC 9899:2011, says in §6.10.3):
§16.3 Macro replacement
¶8 If a # preprocessing token, followed by an identifier, occurs lexically at the point at which a preprocessing directive could begin, the identifier is not subject to macro replacement.
§16.3.4 Rescanning and further replacement
¶2 If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. …
¶3 The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it resembles one, …
When the preprocessor encounters #include <iostream>, it looks in the current directory and finds no file, then looks in ./system-headers and finds the file iostream so it processes that into the output. It contains a single line, include <iostream>. Since include is a macro, it is expanded (to #include) but further expansion is prevented, and the # is not processed as a directive because of §16.3.4 ¶3. Thus, the output contains #include <iostream>.
When the preprocessor encounters #include "class_a.hpp", it looks in the current directory and finds the file and includes its contents in the output.
Rinse and repeat for the other headers. If class_a.hpp contained #include <iostream>, then that ends up expanding to #include <iostream> again (with the leading space). If your system-headers directory is missing any header, then the preprocessor will search in the normal locations and find and include that. If you use the compiler rather than cpp directly, you can prohibit it from looking in the system directories with -nostdinc, so the preprocessor will generate an error if system-headers is missing a (surrogate for a) system header.
$ g++ -E -nostdinc -Dinclude=#include -I. -Isystem-headers example.cpp | grep -v '^# [0-9]'
#include <iostream>
class A{};
#include <string>
class B{};
int main() {}
$
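With -nostdinc in play, it can be handy to find out which surrogates are missing before the preprocessor complains. A rough sketch of such a check follows; the function name missing_surrogates and its argument order are my own invention, not anything GCC provides.

```shell
#!/bin/sh
# missing_surrogates DIR HEADER... (hypothetical helper)
# Prints each HEADER that has no surrogate file under DIR.
# Returns 0 if every surrogate exists, 1 if at least one is missing.
missing_surrogates() {
    sysdir=$1
    shift
    status=0
    for header in "$@"
    do
        if [ ! -f "$sysdir/$header" ]
        then
            echo "$header"
            status=1
        fi
    done
    return $status
}
```

For example, missing_surrogates system-headers iostream string sys/wait.h would list whichever of those three names still needs a surrogate.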
Note that it is very easy to generate the surrogate system headers:
for header in algorithm chrono iostream string …
do echo "include <$header>" > system-headers/$header
done
JFTR, testing was done on Mac OS X 10.11.5 with GCC 6.1.0. If you're using GCC (the GNU Compiler Collection, with leading example compilers gcc and g++), your mileage shouldn't vary very much with any plausible alternative version.
If you're uncomfortable using the macro name include, you can change it to anything else that suits you (syzygy, apoplexy, nadir, reinclude, …), change the surrogate headers to use that name, and define that name on the preprocessor (compiler) command line. One advantage of include is that it's improbable that you have anything using that as a macro name.
Automatically generating surrogate headers
osgx asks:
How can we automate the generation of mock system headers?
There are a variety of options. One is to analyze your code (with grep for example) to find the names that are, or might be, referenced and generate the appropriate surrogate headers. It doesn't matter if you generate a few unused headers; they won't affect the process. Note that if you use #include <sys/wait.h>, the surrogate must be ./system-headers/sys/wait.h; that slightly complicates the shell code shown, but not by very much. Another way would look at the headers in the system header directories (/usr/include, /usr/local/include, etc) and generate surrogates for the headers you find there.
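For the second approach, listing every file under a system header directory by its path relative to that directory gives the names the surrogates need. A rough sketch, assuming a POSIX find and sed; the function name list_dir_headers is made up for illustration, and it only picks up regular files, so headers that are symlinks are skipped.

```shell
#!/bin/sh
# list_dir_headers DIR (hypothetical helper)
# Prints the path of every regular file under DIR, relative to DIR,
# one per line, e.g. "stdio.h" or "sys/wait.h".
list_dir_headers() {
    ( cd "$1" && find . -type f ) |
    sed 's|^\./||' |
    LC_ALL=C sort
}
```

The output can be passed as the list of header names to whatever generates the surrogates, subject to the usual caveat about file names containing spaces.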
For example, mksurrogates.sh might be:
#!/bin/sh
sysdir="./system-headers"
for header in "$@"
do
mkdir -p "$sysdir/$(dirname "$header")"
echo "include <$header>" > "$sysdir/$header"
done
And we can write listsyshdrs.sh to find the system headers referenced in source code under a named directory:
#!/bin/sh
grep -h -e '^[[:space:]]*#[[:space:]]*include[[:space:]]*<[^>]*>' -r "${@:-.}" |
sed 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*<\([^>]*\)>.*/\1/' |
sort -u
With a bit of formatting added, that generated a list of headers like this when I scanned the source tree with my answers to SO questions:
algorithm arpa/inet.h assert.h cassert
chrono cmath cstddef cstdint
cstdlib cstring ctime ctype.h
dirent.h errno.h fcntl.h float.h
getopt.h inttypes.h iomanip iostream
limits.h locale.h map math.h
memory.h netdb.h netinet/in.h pthread.h
semaphore.h signal.h sstream stdarg.h
stdbool.h stddef.h stdint.h stdio.h
stdlib.h string string.h sys/ipc.h
sys/mman.h sys/param.h sys/ptrace.h sys/select.h
sys/sem.h sys/shm.h sys/socket.h sys/stat.h
sys/time.h sys/timeb.h sys/times.h sys/types.h
sys/wait.h termios.h time.h unistd.h
utility vector wchar.h
So, to generate the surrogates for the source tree under the current directory:
$ sh mksurrogates.sh $(sh listsyshdrs.sh)
$ ls -lR system-headers
total 344
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 algorithm
drwxr-xr-x 3 jleffler staff 102 Jul 2 17:27 arpa
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 assert.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cassert
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 chrono
-rw-r--r-- 1 jleffler staff 16 Jul 2 17:27 cmath
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstddef
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstdint
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstdlib
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstring
-rw-r--r-- 1 jleffler staff 16 Jul 2 17:27 ctime
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 ctype.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 dirent.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 errno.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 fcntl.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 float.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 getopt.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 inttypes.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 iomanip
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 iostream
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 limits.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 locale.h
-rw-r--r-- 1 jleffler staff 14 Jul 2 17:27 map
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 math.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 memory.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 netdb.h
drwxr-xr-x 3 jleffler staff 102 Jul 2 17:27 netinet
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 pthread.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 semaphore.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 signal.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 sstream
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stdarg.h
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 stdbool.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stddef.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stdint.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 stdio.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stdlib.h
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 string
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 string.h
drwxr-xr-x 16 jleffler staff 544 Jul 2 17:27 sys
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 termios.h
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 time.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 unistd.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 utility
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 vector
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 wchar.h
system-headers/arpa:
total 8
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 inet.h
system-headers/netinet:
total 8
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 in.h
system-headers/sys:
total 112
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 ipc.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 mman.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 param.h
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 ptrace.h
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 select.h
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 sem.h
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 shm.h
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 socket.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 stat.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 time.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 timeb.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 times.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 types.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 wait.h
$
This assumes that header file names contain no spaces, which is not unreasonable — it would be a brave programmer who created header file names with spaces or other tricky characters.
A full production-ready version of mksurrogates.sh would accept an argument specifying the surrogate header directory.
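As a step in that direction, here is a sketch of the same loop wrapped in a function that takes the surrogate directory as its first argument. The name make_surrogates and the argument order are assumptions of mine, and it is still far from production-ready (no option parsing, no usage message).

```shell
#!/bin/sh
# make_surrogates DIR HEADER... (hypothetical helper)
# Creates DIR/HEADER for each HEADER, containing the single line
#   include <HEADER>
# and creates intermediate directories (such as DIR/sys) as needed.
make_surrogates() {
    sysdir=$1
    shift
    for header in "$@"
    do
        mkdir -p "$(dirname "$sysdir/$header")" || return 1
        printf 'include <%s>\n' "$header" > "$sysdir/$header" || return 1
    done
}
```

Calling make_surrogates system-headers iostream sys/wait.h would then produce the same files as the script shown earlier, but under whichever directory you name.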
With clang you can do e.g.:
clang -Imyinclude -P -E -nostdinc -nobuiltininc main.cpp
There does not seem to be a way to preserve the system #include lines it cannot find though.
This doesn't work for gcc, as its preprocessor will stop when using -nostdinc and it can't find an #included header file.
You could put a #define SYSTEM_HEADERS 0 in a configuration header and do it like this:
#include "config.h" // the configuration header
#include "class_a.hpp"
#include "class_b.hpp"
#if SYSTEM_HEADERS // which is #if 0
#include <iostream>
#include <string>
#endif
and when you want the system headers you could change it to #define SYSTEM_HEADERS 1, which will include them.