I have heard that it was suboptimal for C to automatically collect garbage -- is there any truth to this?
Was there a specific reason garbage collection was not implemented for C?
I have heard that it was suboptimal for C to automatically collect garbage -- is there any truth to this?
Was there a specific reason garbage collection was not implemented for C?
Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that cannot be overcome which make it incompatible with C.
The biggest problem is that accurate garbage collection requires the ability to scan memory and identify any pointers encountered. Some higher level languages limit integers not to use all the bits available, so that high bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this because bytes, larger integers, pointers, and everything else can be stored together in structures, unions, or as part of chunks returned by malloc
.
What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program has the same bit pattern as these objects' addresses? Now suppose your program receives data from the outside world (network/files/etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointers and emulate them in the strings I feed your program. This gets a lot easier if you apply De Bruijn Sequences.
Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change the reality. The performance issues of GC can be broken down into 3 main categories:
The people who will claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs which allocate and free thousands or millions of objects per second. Yes, these will also be slow, but at least predictably slow in a way you can measure and fix if necessary. A well-written C program will spend so little time in malloc
/free
that the overhead is not even measurable.
Garbage collection has been implemented for C (e.g., the Boehm-Demers-Weiser collector). C wasn't specified to include GC when it was new for a number of reasons -- largely because for the hardware they were targeting and system they were building, it just didn't make much sense.
Edit (to answer a few allegations raised elsethread):
To make conservative GC well-defined, you basically only have to make one change to the language: say that anything that makes a pointer temporarily "invisible" leads to undefined behavior. For example, in current C you can write a pointer out to a file, overwrite the pointer in memory, later read it back in, and (assuming it was previously valid) still access the data it points at. A GC wouldn't necessarily "realize" that pointer existed, so it could see the memory as no longer being accessible, and therefore open to collection, so the later dereference wouldn't "work".
As far as garbage collection being non-deterministic: there are real-time collectors that are absolutely deterministic and can be used in hard real-time systems. There are also deterministic heap managers for manual management, but most manual managers are not deterministic.
As far as garbage collection being slow and/or thrashing the cache: technically, this is sort of true, but it's purely a technicality. While designs (e.g., generational scavenging) that (at least mostly) avoid these problems are well known, it's open to argument that they're not exactly garbage collection (even though they do pretty much the same things for the programmer).
As for the GC running at unknown or unexpected times: this isn't necessarily any more or less true than with manually managed memory. You can have a GC run in a separate thread that runs (at least somewhat) unpredictably. The same is true of coalescing free blocks with manual memory management. A particular attempt at allocating memory can trigger a collection cycle, leading to some allocations being much slower than others; the same is true with a manual manager that uses lazy coalescing of free blocks.
Oddly, GC is much less compatible with C++ than with C. Most C++ depends on destructors being invoked deterministically, but with garbage collection that's no longer the case. This breaks lots of code -- and the better written the code, the bigger of a problem it generally causes.
Likewise, C++ requires that std::less<T>
provide meaningful (and, more importantly, consistent) results for pointers, even when they point to entirely independent objects. It would require some extra work to meet this requirement with a copying collector/scavenger (but I'm pretty sure it is possible). It's more difficult still to deal with (for example) somebody hashing an address and expecting consistent results. This is generally a poor idea, but it's still possible, and should produce consistent results.
C was invented back in the early 1970s for writing operating systems and other low level stuff. Garbage collectors were around (eg. early versions of Smalltalk) but I doubt they were up to the task of running in such a lightweight environment, and there would be all the complications of working with very low level buffers and pointers.
I'm sorry, but saying that C doesn't have GC because it's old or because it was designed for different hardware is a bit ridiculous.
Asking why does C has no GC but other languages do is a bit like asking "why does a hammer have a blunt end while an axe has a sharp end"? Well, languages are tools, and different tools were meant to build different programs with different requirements. Some programs, like device drivers or operating system kernels, have requirements that make it nearly impossible to implement garbage collection while still satisfying them, so a language like C was needed, so C was created.
Just like every other tool, it is the way it is because the problems it was meant to solve forced it to be that way. People who say it was because it was created in the 70s or because of old hardware make it seem like it was completely circumstantial. If this was true, C would have gone away with the 70s, but it's still a popular language.
Besides, gc is very hard to implement cleanly without some sort of runtime system, but C is meant to run on bare hardware.
C is a very old language and lacks many of the bells and whistle's of modern languages. To add garbage collection now would require a major respec of the language. Generally anyone willing to make that many changes to C would just rather create their own language.
Adding automatic garbage collection onto a language generally will either reduce performance, or will cause the garbage collection to happen at un forseen times. To add garbage collection to C would cause it to lose one of it's comparative advantages, in that it can be used for system level programming that requires real time or near real time response times.
As a language, C was intentionally designed to be really small. It has very few intrinsic operations or features, and most of those reflect the basic instructions found in CPUs. It's often been called a "portable assembly language".
That makes it great for writing portable programs to switch telephone calls around, that run on dinky little computers with very little memory, which is part of what Bell Labs was after in the early 70s when they were first working on C and Unix.
That also makes C great for writing OS kernels, device drivers, and so on. Such executables can't count on having rich, sophisticated runtime environments like garbage collection - or even just dynamic memory allocation like malloc(). That's because they actually form the foundation for those environments, so you run into the "pulling yourself up by your bootstraps" problem.
People have written garbage-collected memory allocation libraries for C. But libraries aren't part of the C language itself, and it's very unlikely something that complex would ever be accepted as part of the standard C libraries. C programmers are a very conservative lot around changes to the standard.
C is a very low level language. It's the kind of language you might choose to write a higher-level language with things such as garbage collection in. It's small and simple, and does exactly what you ask.
It took C++ to build on C and add more sophisticated/automatic memory management (things like calling destructors when objects went out of scope). You may then wonder why C++ doesn't have garbage collection, in which case see what Stroustrup has to say: briefly, people want to do things in a more direct manner, and people who really want it can use a library (which can also be used in C).
I like what JWZ has to say about C. :-)
C is "the PDP-11 assembler that thinks it's a language".
So, if you look at how you would create a portable assembler, then C looks totally right without garbage collection, for the simple reason CPUs do not have a "garbage collect" instruction.*
(While most other CPU instructions are captured just fine with C to this day, some are not though, like Cs' lack of expressiveness for SIMD operations etc.)
* Yes, I know some of you will find an example to prove me wrong. But in the general case...
Objective_C is ANSI C with something like 12 keywords added and a type called id. It has garbage collection. It uses reference counting.
Garbage collection could be implemented for C, though with normal CPU hardware it would be somewhat difficult and inefficient. To really make garbage collection work well, one would need an architecture which used base+offset addressing for everything; the base part of every pointer would be required to always point at the start of an allocated region. Note that if the base were a "selector" rather than a linear address, a 32-bit selector plus a 32-bit offset could access 4 billion objects, each of which could be up to 4 gigs. I would think such a design could be more efficient than 64-bit linear addressing for many applications, since object references and offsets would use half as much cache space as with 64-bit pointers. Using 64 bits just seems like a waste.
BTW, an architecture which used base+offset address for everything would have another advantage over a linear-addressing system: garbage collection could be run concurrently with other operations. The only time garbage collection would have to disturb a running thread would be if the thread tried to access a data object while that particular object was being relocated. Under that scenario, either the relocation attempt would have to be aborted (assuming it was a non-overlapping copy), or the thread would have to wait for that particular object to be copied. A much better situation for real-time code than exists when all threads have to be suspended for GC.
The strength (and weakness) of C is that it is small, simple, light weight, and well suited to embedded and systems applications. GC would add overhead, and it would no longer be small and simple. GC for C is simply not appropriate. It's no more complicated than that.
1972.
C was designed in 1972.
Worse, c was designed on obsolete hardware. In 1972.
Don't get me wrong. They had garbage collection in 1972, but all the issues that people complain about to this day were real, very valid concerns at the time.
Garbage Collection is not useful if you want to make hardcore optimisations, because if you decide the moment when it's best to free allocated space, then you can make a much better job in your project than the garbage collector does. In the 1970s I don't know the exact cause why the developers decided to not include a garbage collector to C, probably they wanted to have absolute control of deallocation, however I'm sure later they didn't even consider to add a garbage collector, because even if there is an integration of a garbage collector to the language which is very good, there were built a lot of C projects and the new versions of the language must have backwards compatibility.
EDIT: "There are also compilers, libraries and operating system level mechanisms for performing array bounds checking, buffer overflow detection, serialization and automatic garbage collection, that are not a standard part of C." This may be interesting for you, you can read more here.