I come from a scripting background and the preprocessor in C has always seemed ugly to me. None the less I have embraced it as I learn to write small C programs. I am only really using the preprocessor for including the standard libraries and header files I have written for my own functions.
My question is why don't C programmers just skip all the includes and simply concatenate their C source files and then compile it? If you put all of your includes in one place you would only have to define what you need once, rather than in all your source files.
Here's an example of what I'm describing. Here I have three files:
// includes.c
#include <stdio.h>
// main.c
int main() {
foo();
printf("world\n");
return 0;
}
// foo.c
void foo() {
printf("Hello ");
}
By doing something like cat *.c > to_compile.c && gcc -o myprogram to_compile.c
in my Makefile I can reduce the amount of code I write.
This means that I don't have to write a header file for each function I create (because they're already in the main source file) and it also means I don't have to include the standard libraries in each file I create. This seems like a great idea to me!
However I realise that C is a very mature programming language and I'm imagining that someone else a lot smarter than me has already had this idea and decided not to use it. Why not?
Header files should define interfaces - that's a desirable convention to follow. They aren't meant to declare everything that's in a corresponding
.c
file, or a group of.c
files. Instead, they declare all functionality in the.c
file(s) that is available to their users. A well designed.h
file comprises a basic document of the interface exposed by the code in the.c
file even if there isn't a single comment in it. One way to approach the design of a C module is to write the header file first, and then implement it in one or more.c
files.Corollary: functions and data structures internal to the implementation of a
.c
file don't normally belong in the header file. You might need forward declarations, but those should be local and all variables and functions thus declared and defined should bestatic
: if they are not a part of the interface, the linker shouldn't see them.Your approach of concatenating .c files is completely broken:
Even though the command
cat *.c > to_compile.c
will put all functions into a single file, order matters: You must have each function declared before its first use.That is, you have dependencies between your .c files which force a certain order. If your concatenation command fails to honor this order, you won't be able to compile the result.
Also, if you have two functions that recursively use each other, there is absolutely no way around writing a forward declaration for at least one of the two. You may as well put those forward declarations into a header file where people expect to find them.
When you concatenate everything into a single file, you force a full rebuild whenever a single line in your project changes.
With the classic .c/.h split compilation approach, a change in the implementation of a function necessitates recompilation of exactly one file, while a change in a header necessitates recompilation of the files that actually include this header. This can easily speed up the rebuild after a small change by a factor of 100 or more (depending on the count of .c files).
You loose all the ability for parallel compilation when you concatenate everything into a single file.
Have a big fat 12 core processor with hyper-threading enabled? Pity, your concatenated source file is compiled by a single thread. You just lost a speedup of a factor greater than 20... Ok, this is an extreme example, but I have build software with
make -j16
already, and I tell you, it can make a huge difference.Compilation times are generally not linear.
Usually compilers contain at least some algorithms that have a quadratic runtime behavior. Consequently, there is usually some threshold from which on aggregated compilation is actually slower than compilation of the independent parts.
Obviously, the precise location of this threshold depends on the compiler and the optimization flags you pass to it, but I have seen a compiler take over half an hour on a single huge source file. You don't want to have such an obstacle in your change-compile-test loop.
Make no mistake: Even though it comes with all these problems, there are people who use .c file concatenation in practice, and some C++ programmers get pretty much to the same point by moving everything into templates (so that the implementation is found in the .hpp file and there is no associated .cpp file), letting the preprocessor do the concatenation. I fail to see how they can ignore these problems, but they do.
Also note, that many of these problems only become apparent with larger project sizes. If your project is less than 5000 lines of code, it's still relatively irrelevant how you compile it. But when you have more than 50000 lines of code, you definitely want a build system that supports incremental and parallel builds. Otherwise, you are wasting your working time.
You could do that, but we like to separate C programs into separate translation units, chiefly because:
It speeds up builds. You only need to rebuild the files that have changed, and those can be linked with other compiled files to form the final program.
The C standard library consists of pre-compiled components. Would you really want to have to recompile all that?
It's easier to collaborate with other programmers if the code base is split up into different files.
That's the purpose of
.h
files, so you can define what you need once and include it everywhere. Some projects even have aneverything.h
header that includes every individual.h
file. So, your pro can be achieved with separate.c
files as well.You're not supposed to write one header file for every function anyway. You're supposed to have one header file for a set of related functions. So your con is not valid either.