"The average man does not want to be free. He simply wants to be safe." - H. L. Mencken
I am attempting to write very secure C. Below I list some of the techniques I use and ask whether they are as secure as I think they are. Please don't hesitate to tear my code/preconceptions to shreds. Any answer that finds even the most trivial vulnerability or teaches me a new idea will be highly valued.
Reading from a stream:
According to the GNU C Programming Tutorial's section on getline:
"The getline function will automatically enlarge the block of memory as needed, via the realloc function, so there is never a shortage of space -- one reason why getline is so safe. [..] Notice that getline can safely handle your line of input, no matter how long it is."
I assume that getline should, under all inputs, prevent a buffer overflow from occurring when reading from a stream.
- Is my assumption correct? Are there inputs and/or allocation schemes under which this could lead to an exploit? For instance, what if the first character from the stream is some bizarre control character, maybe 0x08 BACKSPACE (Ctrl-H)?
- Has any work been done to mathematically prove that getline is secure?
Malloc Returns Null on Failure:
If malloc fails, it returns a NULL pointer. This presents a security risk, since one can still apply pointer arithmetic to a NULL (0x0) pointer, so Wikipedia recommends checking the result:
/* Allocate space for an array with ten elements of type int. */
int *ptr = (int*)malloc(10 * sizeof (int));
if (ptr == NULL) {
    /* Memory could not be allocated, the program should handle
       the error here as appropriate. */
}
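A minimal sketch of what handling the failure might look like. The helper name alloc_int_array is hypothetical (it does not come from Wikipedia or any library), and the extra check that count * sizeof(int) cannot wrap around is an addition beyond the snippet above:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): allocate an array of
   `count` ints. Returns NULL on failure, never a block that is too small. */
int *alloc_int_array(size_t count)
{
    /* Reject empty requests and counts whose byte size would wrap around. */
    if (count == 0 || count > SIZE_MAX / sizeof(int))
        return NULL;

    /* malloc itself returns NULL when the memory cannot be allocated,
       so the caller has a single failure value to check. */
    return malloc(count * sizeof(int));
}
```

A caller would then write int *ptr = alloc_int_array(10); and bail out if ptr is NULL, releasing the block with free(ptr) when done.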
Secure sscanf:
When using sscanf I've gotten into the habit of making each destination buffer as large as the input string, hoping to rule out any possibility of an overrun. For example:
const char *inputStr = "a01234b4567c";
const char *formatStr = "a%[0-9]b%[0-9]c";
char str1[strlen(inputStr)];
char str2[strlen(inputStr)];
sscanf(inputStr, formatStr, str1, str2);
Because str1 and str2 are the size of inputStr, and no more than strlen(inputStr) characters can be read from inputStr, it seems impossible for any value of inputStr to cause a buffer overflow.
- Am I correct? Are there strange corner cases I haven't thought of?
- Are there better ways to write this? Libraries that have already solved it?
General Questions:
While I've posted a large number of questions, I don't expect anyone to answer all of them. The questions are more of a guideline to the sorts of answers I am looking for. I really want to learn the secure C mindset.
- What other secure C idioms are out there?
- What corner cases do I need to always check?
- How can I write unit tests to enforce these rules?
- How can I enforce constraints in a testable or provably correct way?
- Any recommended static/dynamic analysis techniques or tools for C?
- What secure C practices do you follow and how do you justify them to yourself and others?
Resources:
Many of the resources were borrowed from the answers.
- Secure Programming for Linux and Unix HOWTO by David Wheeler
- Secure C programming - SUN Microsystems
- Insecure Programming by Example
- Add More NOPS - blog covering these issues
- CERT Secure Coding Initiative
- flawfinder - static analysis tool
- Using Thm Provers to prove safety by Yannick Moy
- libsafe
- Reading from a stream
The fact that getline() "will automatically enlarge the block of memory as needed" means that this could be used as a denial-of-service attack, as it would be trivial to generate an input that was so long it would exhaust the available memory for the process (or worse, the system!). Once an out-of-memory condition occurs, other vulnerabilities may also come into play. The behaviour of code in low/no memory is rarely nice, and very hard to predict. IMHO it is safer to set reasonable upper bounds on everything, especially in security-sensitive applications.
Furthermore (as you anticipate by mentioning special characters), getline() only gives you a buffer; it does not make any guarantees about the contents of the buffer (as the safety is entirely application-dependent). So sanitising the input is still an essential part of processing and validating user data.
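As a rough sketch of that kind of check, the function below rejects a line that exceeds a length cap or contains anything other than printable ASCII. The name is_clean_line and the policy itself are assumptions for illustration; what counts as "clean" depends on the application. Note that it only checks a line after it has been read, so it does not by itself stop getline() from allocating a huge buffer.

```c
#include <stddef.h>

/* Hypothetical policy check: accept only printable ASCII, up to max_len
   bytes. `len` is the byte count reported by the read call (for example
   the return value of getline), so embedded NUL bytes are seen too.
   Returns 1 if the line is acceptable, 0 otherwise. */
int is_clean_line(const char *buf, size_t len, size_t max_len)
{
    if (len > max_len)
        return 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        /* Rejects control characters (such as 0x08), NUL, and non-ASCII. */
        if (c < 0x20 || c > 0x7e)
            return 0;
    }
    return 1;
}
```

Any trailing newline would need to be stripped before the check, since '\n' is itself a control character under this policy.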
- sscanf
I would tend to prefer to use a regular expression library, and have very narrowly defined regexps for user data, rather than use sscanf. This way you can perform a good deal of validation at the time of input.
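A minimal sketch of the idea, assuming POSIX <regex.h> as the regular expression library (no particular library is named here) and using the a<digits>b<digits>c format from the question. The function name is_valid_record and the 8-digit field limit are hypothetical choices:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if `input` matches a<digits>b<digits>c in
   full, 0 if it does not, and -1 if the pattern fails to compile. */
int is_valid_record(const char *input)
{
    regex_t re;
    int rc;

    /* Anchored with ^ and $ so leading or trailing junk is rejected; the
       {1,8} bounds are an illustrative limit on each field's length. */
    if (regcomp(&re, "^a[0-9]{1,8}b[0-9]{1,8}c$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, input, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Because the pattern bounds each field, a buffer sized for 8 digits plus the terminator is known to be sufficient once the input has passed this check.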
General comments
- Fuzzing tools are available which generate random input (both valid and invalid) that can be used to test your input handling
- Buffer management is critical: buffer overflows, underflows, out-of-memory
- Race conditions can be exploited in otherwise secure code
- Binary files could be manipulated to inject invalid values or oversized values into headers, so file format code must be rock-solid and not assume binary data is valid
- Temporary files can often be a source of security issues, and must be carefully managed
- Code injection can be used to replace system or runtime library calls with malicious versions
- Plugins provide a huge vector for attack
- As a general principle, I would suggest having clearly defined interfaces that are the only way for user data to enter the application, and where user data (or any data from outside the application) is assumed invalid and hostile until it has been processed, sanitised and validated
I think your sscanf example is wrong. It can still overflow when used that way.
Try this, which specifies the maximum number of bytes to read:
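#include <stdio.h>

int main(int argc, char **argv)
{
    char buf[256];
    /* %255s stores at most 255 characters, leaving room for the NUL. */
    if (argc > 1 && sscanf(argv[1], "%255s", buf) == 1)
        puts(buf);
    return 0;
}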
Take a look at this IBM dev article about protecting against buffer overflows.
In terms of testing, I would write a program that generates random strings of random length and feed them to your program, and make sure they are handled appropriately.
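Something along these lines, as a rough sketch. The function under test (parse_field), its 15-digit limit, and the character set used to build the random strings are all made up for illustration; you would substitute your own parsing code and invariants:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_MAX 15   /* illustrative limit, not from the question */

/* Hypothetical function under test: extract up to FIELD_MAX digits that
   follow a leading 'a'. The width 15 in the format must match FIELD_MAX. */
int parse_field(const char *input, char out[FIELD_MAX + 1])
{
    return sscanf(input, "a%15[0-9]", out) == 1;
}

/* Feed `iterations` random strings of random length to parse_field and
   return how many runs violated the expected invariants. */
int fuzz_parse_field(unsigned seed, int iterations)
{
    static const char alphabet[] = "a0123456789bc\b ";
    int failures = 0;

    srand(seed);   /* fixed seed keeps a failing run reproducible */

    for (int i = 0; i < iterations; i++) {
        char input[256];
        char out[FIELD_MAX + 1];
        size_t len = (size_t)rand() % sizeof input;   /* 0..255 */

        for (size_t j = 0; j < len; j++)
            input[j] = alphabet[(size_t)rand() % (sizeof alphabet - 1)];
        input[len] = '\0';

        if (parse_field(input, out)) {
            size_t n = strlen(out);
            /* On success the field must be 1..FIELD_MAX digits. */
            if (n == 0 || n > FIELD_MAX || strspn(out, "0123456789") != n)
                failures++;
        }
    }
    return failures;
}
```

Checks like these only catch misbehaviour that is visible in the output, so it is worth running the harness under a memory checker as well to catch overflows that happen to leave the result looking valid.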
G'day,
A good place to start looking at this is David Wheeler's excellent secure coding site.
His free online book "Secure Programming for Linux and Unix HOWTO" is an excellent resource that is regularly updated.
You might also like to look at his excellent static analyser FlawFinder to get some further hints. But remember, no automated tool is a replacement for a good pair of experienced eyes, or as David so colourfully puts it:
"Any static analysis tool, such as Flawfinder, is merely a tool. No tool can substitute for human thought! In short, "a fool with a tool is still a fool". It's a mistake to think that analysis tools (like flawfinder) are a substitute for security training and knowledge."
I have personally used David's resources for several years now and find them to be excellent.
HTH
cheers,
Insecure Programming by Example
blog with a few of the answers
Yannick Moy developed a Hoare/Floyd weakest precondition system for C during his PhD and applied it to the CERT managed strings library. He found a number of bugs (see page 197 of his memoir). The good news is that the library is safer now for his work.
You could also look at Les Hatton's web site here and at his book Safer C which you can get from Amazon.
Don't use gets() for input, use fgets(). To use fgets(), if your buffer is automatically allocated (i.e., "on the stack"), then use this idiom:
char buf[N];
...
if (fgets(buf, sizeof buf, fp) != NULL)
This will keep working if you decide to change the size of buf. I prefer this form to:
#define N whatever
char buf[N];
if (fgets(buf, N, fp) != NULL)
because the first form uses buf to determine the second argument, and is clearer.
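One thing the idiom leaves open is what to do when the line is longer than the buffer. Here is a minimal sketch of one way to deal with that; the helper read_line and its return codes are hypothetical. Since it receives a pointer rather than the array itself, the caller has to pass sizeof buf explicitly (sizeof on the parameter would give the pointer's size):

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets and strip the newline.
   Returns 1 on success, 0 at end of input or on a read error, and -1 if
   the line did not fit in buf (the rest of the line is then discarded). */
int read_line(FILE *fp, char *buf, size_t size)
{
    if (size == 0 || size > INT_MAX)
        return 0;               /* fgets takes an int, so refuse odd sizes */
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';        /* complete line: drop the newline */
        return 1;
    }

    /* No newline was stored: either the input ended, the line filled the
       buffer exactly, or it was too long. Drain up to the next newline. */
    int c, extra = 0;
    while ((c = getc(fp)) != EOF && c != '\n')
        extra = 1;
    return extra ? -1 : 1;
}
```

With char buf[N]; the call is read_line(fp, buf, sizeof buf), which keeps the property that changing N needs no other edits.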
Check the return value of fclose().
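The reason is that output is usually buffered, so a write error (a full disk, say) may only show up when fclose() flushes the stream. A minimal sketch, with a hypothetical helper name save_text:

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to `path`. Returns 0 only if every
   step, including the final flush performed by fclose, succeeded. */
int save_text(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = (fputs(text, fp) != EOF);

    /* Always call fclose, even after a failed write, and always check it:
       buffered data is flushed here, so errors can surface this late. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```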