I want to know the disadvantages of `scanf()`.

On many sites, I have read that using `scanf` might cause buffer overflows. What is the reason for this? Are there any other drawbacks with `scanf`?
Most of the answers so far seem to focus on the string buffer overflow issue. In reality, the format specifiers that can be used with `scanf` functions support an explicit field width, which limits the maximum size of the input and prevents buffer overflow. This renders the popular accusations of string-buffer overflow dangers in `scanf` virtually baseless. Claiming that `scanf` is somehow analogous to `gets` in this respect is completely incorrect. There's a major qualitative difference between `scanf` and `gets`: `scanf` does provide the user with string-buffer-overflow-preventing features, while `gets` doesn't.

One can argue that these `scanf` features are difficult to use, since the field width has to be embedded into the format string (there's no way to pass it through a variadic argument, as can be done in `printf`). That is actually true. `scanf` is indeed rather poorly designed in that regard. But nevertheless any claims that `scanf` is somehow hopelessly broken with regard to string-buffer-overflow safety are completely bogus and usually made by lazy programmers.

The real problem with `scanf` has a completely different nature, even though it is also about overflow. When a `scanf` function is used for converting decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If overflow happens, `scanf` produces undefined behavior. For this reason, the only proper way to perform the conversion in the C standard library is to use the functions from the `strto...` family.

So, to summarize the above, the problem with `scanf` is that it is difficult (albeit possible) to use properly and safely with string buffers. And it is impossible to use safely for arithmetic input. The latter is the real problem. The former is just an inconvenience.

P.S. The above is intended to be about the entire family of `scanf` functions (including `fscanf` and `sscanf`). With `scanf` specifically, the obvious issue is that the very idea of using a strictly-formatted function for reading potentially interactive input is rather questionable.

From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?
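To make the two points about field widths and the `strto...` family concrete, here is a minimal sketch. The names `STR`, `WORD_MAX`, `read_word` and `parse_int` are made up for illustration, and `sscanf` is used instead of `scanf` only so the functions can be tried on fixed strings. The macro trick is one common way to get a buffer size into the format string; it is not the only one.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Two-step stringification, so that a numeric macro can be spliced
   into a scanf format string as the field width. */
#define STR_(x) #x
#define STR(x)  STR_(x)

#define WORD_MAX 15   /* hypothetical limit: characters, not counting '\0' */

/* Reads one whitespace-delimited word of at most WORD_MAX characters
   from text into out, which must hold WORD_MAX + 1 bytes.
   Returns 1 if a word was stored, 0 otherwise. */
int read_word(const char *text, char out[WORD_MAX + 1])
{
    assert(text != NULL && out != NULL);
    /* The format expands to "%15s": the width caps what is written. */
    return sscanf(text, "%" STR(WORD_MAX) "s", out) == 1;
}

/* Converts a decimal string to int with strtol, reporting failure
   instead of running into overflow. Anything after the number is
   ignored. Returns 1 on success, 0 on failure. */
int parse_int(const char *text, int *out)
{
    char *end;
    long v;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text)
        return 0;                    /* no digits found */
    if (errno == ERANGE)
        return 0;                    /* does not fit in a long */
    if (v < INT_MIN || v > INT_MAX)
        return 0;                    /* fits in a long but not in an int */
    *out = (int)v;
    return 1;
}
```

Here `read_word` stores at most 15 characters however long the input word is, and `parse_int` rejects an out-of-range value such as `"3000000000"` on a platform with 32-bit `int` rather than leaving the result undefined.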
There is one big problem with `scanf`-like functions: the lack of any type safety. That is, you can code this:

Hell, even this is "fine":

It's worse than `printf`-like functions, because `scanf` expects a pointer, so crashes are more likely.

Sure, there are some format-specifier checkers out there, but those are not perfect and, well, they are not part of the language or the standard library.
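The snippets this answer points to are not reproduced above. As a sketch of the kind of mismatch it describes (not the answer's own code), the hypothetical function `parse_record` below pairs each conversion with the right pointer type, and the comment lists calls that the language accepts even though the argument types are wrong. `sscanf` is used so the example can be run on a fixed string.

```c
#include <assert.h>
#include <stdio.h>

/* Parses "<int> <double> <word>" from text.
   Correct pairings: %d with int *, %lf with double *, and %9s with a
   char array of at least 10 bytes.
   Returns the number of fields converted (3 on full success). */
int parse_record(const char *text, int *id, double *score, char name[10])
{
    assert(text != NULL && id != NULL && score != NULL && name != NULL);

    /* The variadic arguments are not checked against the format by the
       language itself, so each of these would be accepted and would
       have undefined behavior:
           sscanf(text, "%d", score);    (%d given a double *)
           sscanf(text, "%9s", id);      (%s given an int *)
           sscanf(text, "%d", *id);      (an int where int * is needed)
    */
    return sscanf(text, "%d %lf %9s", id, score, name);
}
```

Compilers such as GCC and Clang can warn about these mismatches when the format is a string literal, but as the answer says, that is a tool feature rather than a guarantee of the language.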
It is very hard to get `scanf` to do the thing you want. Sure, you can, but things like `scanf("%s", buf);` are as dangerous as `gets(buf);`, as everyone has said.

As an example, what paxdiablo is doing in his function to read can be done with something like:
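The snippet introduced here is missing from this copy of the thread. A sketch consistent with the description that follows might look like the code below. The function name `read_line10`, the use of `fscanf` on an arbitrary stream (so it can be exercised without a terminal) and the handling of empty lines are assumptions, not the answer's own code.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one line from in. Stores up to 10 non-newline characters in
   buf (which must hold 11 bytes), then discards the rest of the line
   and its newline.
   Returns 1 if a line was read (possibly empty), 0 at end of input. */
int read_line10(FILE *in, char buf[11])
{
    assert(in != NULL && buf != NULL);

    buf[0] = '\0';   /* %[ stores nothing when the line is empty */
    if (fscanf(in, "%10[^\n]", buf) == EOF)
        return 0;    /* end of input before any character was read */

    /* Drop any excess characters; this conversion fails harmlessly
       when the next character is already the newline. */
    if (fscanf(in, "%*[^\n]") != EOF)
        (void)getc(in);   /* consume the newline itself, if present */

    return 1;
}
```

Splitting the discard into a separate call and a `getc` matters: a single format with a trailing `%*c` would stop at the failed `%*[^\n]` on a short line and leave the newline unread.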
The above will read a line, store the first 10 non-newline characters in `buf`, and then discard everything till (and including) a newline. So, paxdiablo's function could be written using `scanf` the following way:

One of the other problems with `scanf` is its behavior in case of overflow. For example, when reading an `int`:

the above cannot be used safely in case of an overflow. Even for the first case, reading a string is much simpler to do with `fgets` rather than with `scanf`.

The advantage of `scanf` is that once you learn how to use the tool, as you should always do in C, it has immensely useful use cases. You can learn how to use `scanf` and friends by reading and understanding the manual. If you can't get through that manual without serious comprehension issues, this would probably indicate that you don't know C very well.

`scanf` and friends suffered from unfortunate design choices that rendered them difficult (and occasionally impossible) to use correctly without reading the documentation, as other answers have shown. This occurs throughout C, unfortunately, so if I were to advise against using `scanf` then I would probably advise against using C.

One of the biggest disadvantages seems to be purely the reputation it's earned amongst the uninitiated; as with many useful features of C, we should be well informed before we use it. The key is to realise that, as with the rest of C, it seems succinct and idiomatic, but that can be subtly misleading. This is pervasive in C; it's easy for beginners to write code that they think makes sense and might even work for them initially, but doesn't make sense and can fail catastrophically.
For example, the uninitiated commonly expect that the `%s` delegate would cause a line to be read, and while that might seem intuitive it isn't necessarily true. It's more appropriate to describe the field read as a word. Reading the manual is strongly advised for every function.

What would any response to this question be without mentioning its lack of safety and risk of buffer overflows? As we've already covered, C isn't a safe language, and will allow us to cut corners, possibly to apply an optimisation at the expense of correctness or more likely because we're lazy programmers. Thus, when we know the system will never receive a string larger than a fixed number of bytes, we're given the ability to declare an array that size and forgo bounds checking. I don't really see this as a downfall; it's an option. Again, reading the manual is strongly advised and would reveal this option to us.
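To illustrate the point that `%s` reads a word rather than a line, here is a small sketch. The function name `first_word` and the 32-byte buffer are arbitrary choices for the example. It keeps a field width in the format, so it does not take the "forgo bounds checking" option discussed above.

```c
#include <assert.h>
#include <stdio.h>

/* Copies the first whitespace-delimited word of line into word, which
   must hold 32 bytes (hence the width of 31).
   Returns the sscanf result: 1 if a word was stored, EOF if the line
   contained nothing but whitespace. */
int first_word(const char *line, char word[32])
{
    assert(line != NULL && word != NULL);
    /* %s skips leading whitespace and stops at the next whitespace
       character, so "hello world\n" yields only "hello". */
    return sscanf(line, "%31s", word);
}
```

Given the line `"hello world\n"`, only `hello` ends up in `word`; the rest of the line is simply not consumed by that conversion.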
Lazy programmers aren't the only ones stung by `scanf`. It's not uncommon to see people trying to read `float` or `double` values using `%d`, for example. They're usually mistaken in believing that the implementation will perform some kind of conversion behind the scenes, which would make sense because similar conversions happen throughout the rest of the language, but that's not the case here. As I said earlier, `scanf` and friends (and indeed the rest of C) are deceptive; they seem succinct and idiomatic but they aren't.

Inexperienced programmers aren't forced to consider the success of the operation. Suppose the user enters something entirely non-numeric when we've told `scanf` to read and convert a sequence of decimal digits using `%d`. The only way we can intercept such erroneous data is to check the return value, and how often do we bother checking the return value?

Much like `fgets`, when `scanf` and friends fail to read what they're told to read, the stream will be left in an unusual state:

- In the case of `fgets`, if there isn't sufficient space to store a complete line, then the remainder of the line left unread might be erroneously treated as though it's a new line when it isn't.
- In the case of `scanf` and friends, when a conversion fails as documented above, the erroneous data is left unread on the stream and might be erroneously treated as though it's part of a different field.

It's no easier to use `scanf` and friends than to use `fgets`. If we check for success by looking for a `'\n'` when we're using `fgets` or by inspecting the return value when we use `scanf` and friends, and we find that we've read an incomplete line using `fgets` or failed to read a field using `scanf`, then we're faced with the same reality: we're likely to discard input (usually up until and including the next newline)! Yuuuuuuck!

Unfortunately, `scanf` simultaneously makes it hard (non-intuitive) and easy (fewest keystrokes) to discard input in this way. Faced with this reality of discarding user input, some have tried `scanf("%*[^\n]%*c");`, not realising that the `%*[^\n]` delegate will fail when it encounters nothing but a newline, and hence the newline will still be left on the stream.

With a slight adaptation, separating the two format delegates, we see some success here: `scanf("%*[^\n]"); getchar();`. Try doing that with so few keystrokes using some other tool ;)

Many answers here discuss the potential overflow issues of using `scanf("%s", buf)`, but the latest POSIX specification more-or-less resolves this issue by providing an `m` assignment-allocation character that can be used in format specifiers for `c`, `s`, and `[` formats. This will allow `scanf` to allocate as much memory as necessary with `malloc` (so it must be freed later with `free`).

An example of its use:
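The original example is not included in this copy, so the following is a stand-in sketch rather than the answer's own code. It assumes a C library that implements the POSIX.1-2008 `m` modifier (glibc does); the function name `dup_first_word` is made up, and `sscanf` is used so it can be run on a fixed string.

```c
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd copy of the first whitespace-delimited word in
   text, or NULL if there is none. The caller must free() the result. */
char *dup_first_word(const char *text)
{
    char *word = NULL;

    assert(text != NULL);
    /* With %ms the argument is a char **: the library allocates a
       buffer large enough for the word and stores its address there. */
    if (sscanf(text, "%ms", &word) != 1)
        return NULL;
    return word;
}
```

A caller would use it as `char *w = dup_first_word(line);`, check for `NULL`, and later call `free(w)`. No buffer size has to be chosen in advance, which is the point of the `m` modifier.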
The disadvantage of this approach is that it is a relatively recent addition to the POSIX specification and is not specified in the C standard at all, so it remains rather unportable for now.