I had a lovely conversation with someone about the downfalls of std::stoi. To put it bluntly, it uses std::strtol internally, and throws if that reports an error. According to them, though, std::strtol shouldn't report an error for an input of "abcxyz", causing stoi not to throw std::invalid_argument.
First of all, I tested these cases on GCC with two programs, one using strtol and one using stoi. Both of them show success on "123" and failure on "abc".
I looked in the standard to pull more info:
§ 21.5
Throws: invalid_argument if strtol, strtoul, strtoll, or strtoull reports that
no conversion could be performed. Throws out_of_range if the converted value is
outside the range of representable values for the return type.
That sums up how stoi relies on strtol. Now what about strtol itself? I found this in the C11 draft:
§7.22.1.4
If the subject sequence is empty or does not have the expected form, no
conversion is performed; the value of nptr is stored in the object
pointed to by endptr, provided that endptr is not a null pointer.
Given the input "abc", the C standard dictates that nptr, which points to the beginning of the string, is stored in the object pointed to by endptr, the pointer passed in. This seems consistent with the test. Also, 0 should be returned, as stated here:
§7.22.1.4
If no conversion could be performed, zero is returned.
The previous reference said that no conversion would be performed, so it must return 0. These conditions match the C++11 standard's requirement for stoi to throw std::invalid_argument.
The result of this matters to me because I don't want to go around recommending stoi as a better alternative to other methods of string-to-int conversion, or use it myself as if it worked the way you'd expect, if it doesn't catch text as an invalid conversion.
So after all of this, did I go wrong somewhere? It seems to me that I have good proof of this exception being thrown. Is my proof valid, or is std::stoi not guaranteed to throw that exception when given "abc"?
Does std::stoi throw an error on the input "abcxyz"?
Yes.
I think your confusion may come from the fact that strtol never reports an error except on overflow. It can report that no conversion was performed, but this is never referred to as an error condition in the C standard.
strtol is defined similarly by all three C standards, and I will spare you the boring details, but it basically defines a "subject sequence" that is the substring of the input string corresponding to the actual number. The following four conditions are equivalent:
- the subject sequence has the expected form (in plain English: it is a number)
- the subject sequence is non-empty
- a conversion has occurred
- *endptr != nptr (this only makes sense when endptr is non-null)
When there is an overflow, the conversion is still said to have occurred.
Now, it is quite clear that because "abcxyz" does not start with a number, the subject sequence of the string "abcxyz" must be empty, so no conversion can be performed. The following C90/C99/C11 program will confirm it experimentally:
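#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *nptr = "abcxyz";
    char *endptr;

    strtol(nptr, &endptr, 0);  /* base 0: decimal, octal or 0x-prefixed hex */
    /* strtol stores nptr in *endptr when no conversion is performed */
    if (endptr == nptr)
        printf("No conversion could be performed.\n");
    return 0;
}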
This implies that any conformant implementation of std::stoi must throw invalid_argument when given the input "abcxyz" without the optional base argument (that is, with the default base of 10).
Does this mean that std::stoi has satisfactory error checking?
No. The person you were talking to is correct when she says that std::stoi is more lenient than performing the full check errno == 0 && end != start && *end == '\0' after std::strtol, because std::stoi silently ignores every character from the first one that cannot be part of the number onwards.
In fact, off the top of my head, the only language whose native conversion behaves somewhat like std::stoi is JavaScript, and even then you have to force base 10 with parseInt(n, 10) to avoid the special case of hexadecimal numbers:
input      | std::atoi    std::stoi    JavaScript   full check
===========+==================================================
hello      | 0            error        error(NaN)   error
0xygen     | 0            0            error(NaN)   error
0x42       | 0            0            66           error
42x0       | 42           42           42           error
42         | 42           42           42           42
-----------+--------------------------------------------------
languages  | Perl, Ruby,  JavaScript   JavaScript   C#, Java,
           | PHP, C...    (base 10)                 Python...
Note: there are also differences among languages in the handling of whitespace and redundant + signs.
OK, so I want full error checking. What should I use?
I'm not aware of any built-in function that does this, but boost::lexical_cast<int> will do what you want. It is particularly strict, since it even rejects surrounding whitespace, unlike Python's int() function. Note that invalid characters and overflows result in the same exception, boost::bad_lexical_cast.
#include <iostream>
#include <string>
#include <boost/lexical_cast.hpp>

int main() {
    std::string s = "42";
    try {
        int n = boost::lexical_cast<int>(s);
        std::cout << "n = " << n << std::endl;
    } catch (const boost::bad_lexical_cast&) {
        // thrown both for invalid characters and for overflow
        std::cout << "conversion failed" << std::endl;
    }
}