I have heard that it was suboptimal for C to automatically collect garbage -- is there any truth to this?
Was there a specific reason garbage collection was not implemented for C?
Garbage collection has been implemented for C (e.g., the Boehm-Demers-Weiser collector). C wasn't specified to include GC when it was new for a number of reasons -- largely because for the hardware they were targeting and system they were building, it just didn't make much sense.
Edit (to answer a few allegations raised elsethread):
To make conservative GC well-defined, you basically only have to make one change to the language: say that anything that makes a pointer temporarily "invisible" leads to undefined behavior. For example, in current C you can write a pointer out to a file, overwrite the pointer in memory, later read it back in, and (assuming it was previously valid) still access the data it points at. A GC wouldn't necessarily "realize" that pointer existed, so it could see the memory as no longer being accessible, and therefore open to collection, so the later dereference wouldn't "work".
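The "invisible pointer" case described above can be sketched in a few lines of plain C. This is only an illustration, not anything taken from a real collector: the function name `hide_and_recover` and the value 42 are made up for the example, and it assumes `tmpfile()` is usable on the system.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical demo: hide a pointer in a file, then recover it.
 * Returns the value read through the recovered pointer, or -1 on error. */
int hide_and_recover(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 42;

    FILE *f = tmpfile();
    if (f == NULL) {
        free(p);
        return -1;
    }

    /* Write the pointer's bytes out to the file. */
    if (fwrite(&p, sizeof p, 1, f) != 1) {
        fclose(f);
        free(p);
        return -1;
    }

    /* Overwrite the only in-memory copy. A collector scanning memory at
     * this point would find no reference to the block and could reclaim it. */
    p = NULL;

    /* Read the pointer back in. */
    rewind(f);
    int *q = NULL;
    if (fread(&q, sizeof q, 1, f) != 1) {
        fclose(f);
        return -1; /* the block is leaked here: its address is lost */
    }
    fclose(f);
    assert(q != NULL);

    /* Fine with manual management; a dangling access if a GC had
     * collected the block while the pointer lived only in the file. */
    int value = *q;
    free(q);
    return value;
}
```

With ordinary `malloc`/`free` this returns 42. Under a conservative collector, the window between `p = NULL` and the `fread` is exactly where the block would look unreachable.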
As far as garbage collection being non-deterministic: there are real-time collectors that are absolutely deterministic and can be used in hard real-time systems. There are also deterministic heap managers for manual management, but most manual managers are not deterministic.
As far as garbage collection being slow and/or thrashing the cache: technically, this is sort of true, but it's purely a technicality. While designs (e.g., generational scavenging) that (at least mostly) avoid these problems are well known, it's open to argument that they're not exactly garbage collection (even though they do pretty much the same things for the programmer).
As for the GC running at unknown or unexpected times: this isn't necessarily any more or less true than with manually managed memory. You can have a GC run in a separate thread that runs (at least somewhat) unpredictably. The same is true of coalescing free blocks with manual memory management. A particular attempt at allocating memory can trigger a collection cycle, leading to some allocations being much slower than others; the same is true with a manual manager that uses lazy coalescing of free blocks.
Oddly, GC is much less compatible with C++ than with C. Most C++ depends on destructors being invoked deterministically, but with garbage collection that's no longer the case. This breaks lots of code -- and the better written the code, the bigger the problem it generally causes.
Likewise, C++ requires that std::less<T> provide meaningful (and, more importantly, consistent) results for pointers, even when they point to entirely independent objects. It would require some extra work to meet this requirement with a copying collector/scavenger (but I'm pretty sure it is possible). It's more difficult still to deal with (for example) somebody hashing an address and expecting consistent results. This is generally a poor idea, but it's still possible, and should produce consistent results.
C was invented back in the early 1970s for writing operating systems and other low-level stuff. Garbage collectors were around (e.g., early versions of Smalltalk) but I doubt they were up to the task of running in such a lightweight environment, and there would be all the complications of working with very low-level buffers and pointers.
C is a very low level language. It's the kind of language you might choose to write a higher-level language with things such as garbage collection in. It's small and simple, and does exactly what you ask.
It took C++ to build on C and add more sophisticated/automatic memory management (things like calling destructors when objects went out of scope). You may then wonder why C++ doesn't have garbage collection, in which case see what Stroustrup has to say: briefly, people want to do things in a more direct manner, and people who really want it can use a library (which can also be used in C).
1972.
C was designed in 1972.
Worse, C was designed on obsolete hardware. In 1972.
Don't get me wrong. They had garbage collection in 1972, but all the issues that people complain about to this day were real, very valid concerns at the time.
C is a very old language and lacks many of the bells and whistles of modern languages. To add garbage collection now would require a major respec of the language. Generally, anyone willing to make that many changes to C would rather just create their own language.
Adding automatic garbage collection to a language will generally either reduce performance or cause collection to happen at unforeseen times. Adding garbage collection to C would cost it one of its comparative advantages: it can be used for system-level programming that requires real-time or near-real-time response.
Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that cannot be overcome which make it incompatible with C.
The biggest problem is that accurate garbage collection requires the ability to scan memory and identify any pointers encountered. Some higher-level languages limit integers not to use all the bits available, so that high bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this because bytes, larger integers, pointers, and everything else can be stored together in structures, unions, or as part of chunks returned by malloc.
What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program has the same bit pattern as these objects' addresses? Now suppose your program receives data from the outside world (network/files/etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointers and emulate them in the strings I feed your program. This gets a lot easier if you apply De Bruijn sequences.
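To make the false-retention problem concrete, here is a rough sketch of how a conservative scan might decide that a block is still referenced. It is a toy, not how any particular collector works: `struct block`, `looks_like_pointer`, `conservatively_reachable`, and `integer_pins_block` are hypothetical names, and the "heap" is a single tracked allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy "heap": one tracked allocation. */
struct block {
    void *start;
    size_t size;
};

/* Conservative test: does this word's bit pattern fall inside the block? */
static int looks_like_pointer(uintptr_t word, const struct block *b)
{
    uintptr_t lo = (uintptr_t)b->start;
    return word >= lo && word - lo < b->size;
}

/* Scan a region word by word; return 1 if any word appears to point into b.
 * The scanner cannot tell integers from pointers, so it must assume the worst. */
int conservatively_reachable(const void *region, size_t len,
                             const struct block *b)
{
    assert(region != NULL && b != NULL);
    const unsigned char *bytes = region;
    for (size_t i = 0; i + sizeof(uintptr_t) <= len; i += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, bytes + i, sizeof word);
        if (looks_like_pointer(word, b))
            return 1;
    }
    return 0;
}

/* Demo: returns 1 if a plain integer in "input data" pins the block,
 * -1 if allocation failed. */
int integer_pins_block(void)
{
    struct block b;
    b.size = 64;
    b.start = malloc(b.size);
    if (b.start == NULL)
        return -1;

    /* Data from the outside world: integers only, no real pointers. */
    uintptr_t input[4] = { 1, 2, 0, 3 };
    int before = conservatively_reachable(input, sizeof input, &b);

    /* An attacker guesses the block's address and sends it as data. */
    input[2] = (uintptr_t)b.start;
    int after = conservatively_reachable(input, sizeof input, &b);

    free(b.start);
    return before == 0 && after == 1;
}
```

Nothing in `input` is a pointer as far as the program is concerned, yet after the guessed value arrives the scan has to treat the block as live.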
Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change the reality. The performance issues of GC can be broken down into 3 main categories:
The people who will claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs which allocate and free thousands or millions of objects per second. Yes, these will also be slow, but at least predictably slow in a way you can measure and fix if necessary. A well-written C program will spend so little time in malloc/free that the overhead is not even measurable.