As of C++14, thanks to n3781 (which in itself does not answer this question) we may write code like the following:
const int x = 1'234; // one thousand two hundred and thirty four
The aim is to improve on code like this:
const int y = 100000000;
and make it more readable.
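As a quick illustration of what the separator does and does not do, here is a minimal sketch (the variable names are made up for this example). It assumes a C++14 compiler; the apostrophes are simply ignored when the value of the literal is computed, and the grouping is not checked.

```cpp
// Digit separators do not change the value of a literal.
constexpr int plain       = 100000000;
constexpr int separated   = 100'000'000;   // same value, easier to read
constexpr int odd_groups  = 1'0000'0000;   // grouping is not enforced

// They also work in other bases and in floating-point literals.
constexpr unsigned mask     = 0b1111'0000u;
constexpr unsigned hex_word = 0xFF'FFu;
constexpr double   tiny     = 0.000'015'3;

static_assert(plain == separated, "separators do not affect the value");
static_assert(plain == odd_groups, "group size is up to the author");
```

This compiles as-is with -std=c++14 or later; the static_asserts do the checking at compile time.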
The underscore (_) character was already taken in C++11 by user-defined literals, and the comma (,) has localisation problems (many European countries bafflingly† use it as the decimal separator) and conflicts with the comma operator, though I do wonder what real-world code could possibly have been broken by allowing e.g. 1,234,567.
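For reference, this is what the comma spelling means today. It is a small sketch with made-up names, not taken from any real code base: the built-in comma operator evaluates and discards its left operand, so the whole expression has the value of the right-most literal. Compilers will typically warn about the unused operands, which is rather the point.

```cpp
// `1, 234, 567` is three literals joined by the comma operator;
// the result is the right-most operand.
constexpr int comma_value = (1, 234, 567);

// The same thing inside a subscript. The index is parenthesised here because
// an unparenthesised comma inside [] was deprecated in C++20.
constexpr int table[] = {10, 20, 30};
constexpr int picked  = table[(0, 2)];   // same as table[2]

static_assert(comma_value == 567, "value of the last operand");
static_assert(picked == 30, "indexes with the last operand");
```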
Anyway, a better solution would seem to be the space character:
const int z = 1 000 000;
These adjacent numeric literal tokens could be concatenated during translation, just as adjacent string literals are:
const char x[5] = "a" "bc" "d";
Instead, we get the apostrophe ('), not used by any writing system I'm aware of as a digit separator.
Is there a reason that the apostrophe was chosen instead of a simple space?
† It's baffling because all of those languages, within text, maintain the notion of a comma "breaking apart" an otherwise atomic sentence, with a period functioning to "terminate" the sentence — to me, at least, this is quite analogous to a comma "breaking apart" the integral part of a number and a period "terminating" it ready for the fractional input.
From wiki, we have a nice example:
Here, we have the . operator, and if another operator were to follow, my eyes would expect something visible, like a comma, not whitespace. So an apostrophe does much better here than whitespace would.
With whitespaces it would be
which doesn't feel as right as the case with the apostrophes.
In the same spirit as Albert Renshaw's answer, I think that the apostrophe is clearer than the space that Lightness Races in Orbit proposes.
The space is used for many things, like the string concatenation the OP mentions, unlike the apostrophe, which in this case makes it clear to the reader that it is being used to separate digits.
When the lines of code become many, I think that this will improve readability, but I doubt that is the reason they chose it.
About the spaces, it might be worth taking a look at this C question, which says:
The language doesn't allow
int i = 10 000;
(an integer literal is one token, the intervening whitespace splits it into two tokens) but there's typically little to no expense incurred by expressing the initializer as an expression that is a calculation of literals:
int i = 10 * 1000; /* ten thousand */
It is true I see no practical meaning to:
so digits might be merged without real ambiguity, but what about a hexadecimal number?
There would be no way to distinguish such a literal from a typo (where normally we should see an error).
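To make the hexadecimal problem concrete, here is a hedged sketch. The macro BEEF is purely hypothetical and exists only for this example. Hex digits are also valid identifier characters, so under today's rules the second half of a space-separated hex number is lexed as an identifier; if that identifier happens to be a macro, the code is already legal and means something else entirely.

```cpp
// Hypothetical macro, made up for this example.
#define BEEF + 1

// Under current rules this is the literal 0xDEAD followed by the identifier
// BEEF, which expands to `+ 1`. Merging space-separated digits would silently
// turn it into a different number.
constexpr unsigned spaced = 0xDEAD BEEF;      // 0xDEAD + 1

// With the apostrophe there is no identifier in sight.
constexpr unsigned separated_hex = 0xDEAD'BEEF;

static_assert(spaced == 0xDEAE, "macro was expanded");
static_assert(separated_hex == 0xDEADBEEFu, "one literal");
```

Without the macro, the first declaration would simply fail to compile, which is the error one would want to keep for a typo.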
Commenting does not hurt:
Binary strings can be hard to parse:
A macro for consideration:
The obvious reason for not using white space is that a new line is also white space, and that C++ treats all white space identically. And off hand, I don't know of any language which accepts arbitrary white space as a separator.
Presumably, Unicode 0xA0 (non-breaking space) could be used—it is the most widely used solution when typesetting. I see two problems with that, however: first, it's not in the basic character set, and second, it's not visually distinctive; you can't see that it isn't a space by just looking at the text in a normal editor.
Beyond that, there aren't many choices. You can't use the comma, since that is already a legal token (something like 1,234 is currently legal C++, with the meaning 234), and it can occur in legal code, e.g. a[1,234]. While I can't quite imagine any real code actually using this, there is a basic rule that no legal program, regardless how absurd, should silently change semantics.

Similar considerations mean that _ can't be used either; if there is a #define _234 * 2, then a[1_234] would silently change the meaning of the code.

I can't say that I'm particularly pleased with the choice of ', but it does have the advantage of being used in continental Europe, at least in some types of texts. (I seem to remember having seen it in German, for example, although in typical running text, German, like most other languages, will use a point or a non-breaking space. But maybe it was Swiss German.) The problem with ' is parsing; the sequence '1' is already legal, as is '123'. So something like 1'234 could be a 1, followed by the start of a character constant; I'm not sure how far you have to look ahead to make the decision. There is no sequence of legal C++ in which an integral constant can be followed by a character constant, so there's no problem with breaking legal code, but it means that lexical scanning suddenly becomes very context dependent.

(With regards to your comment: there is no logic in the choice of a decimal or a thousands separator. A decimal separator, for example, is certainly not a full stop. They are just arbitrary conventions.)
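On the look-ahead question, here is a rough sketch of how a scanner might handle it. It is a simplification for decimal integers only, and is_digit and scan_decimal are hypothetical helpers, not anything from a real compiler. The assumption is that an apostrophe counts as a separator only when a digit follows it immediately, in which case one character of look-ahead is enough for this case.

```cpp
#include <cstddef>

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Returns how many characters at the start of the NUL-terminated string `s`
// form a decimal integer literal with optional digit separators,
// or 0 if `s` does not start with a digit.
constexpr std::size_t scan_decimal(const char* s) {
    std::size_t i = 0;
    while (is_digit(s[i])) {
        ++i;
        // An apostrophe belongs to the number only if a digit follows it;
        // otherwise it is left for whatever token comes next.
        if (s[i] == '\'' && is_digit(s[i + 1])) {
            ++i;
        }
    }
    return i;
}

static_assert(scan_decimal("1'234;") == 5, "separator is consumed");
static_assert(scan_decimal("1'a'") == 1, "apostrophe not followed by a digit");
```

As far as I can tell the real grammar is more permissive at the preprocessing stage, so take this as an illustration of the idea rather than a description of what any compiler does.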
There is a previous paper, n3499, which tells us that, although Bjarne himself suggested spaces as separators:
I guess the following example is the main problem noted:
though in my opinion this rationale is fairly weak. I still can't think of a real-world example to break it.
The "editing tools" rationale is even worse, since 1'234 breaks basically every syntax highlighter known to mankind (e.g. that used by Markdown in the above question itself!) and makes updated versions of said highlighters much harder to implement.

Still, for better or worse, this is the rationale that led to the adoption of apostrophes instead.
I would assume it's because, while writing code, if you reach the end of a "line" (the width of your screen) an automatic line-break (or "word wrap") occurs. With spaces, this would cause your int to get split in half: one half of it would be on the first line, the second half on the second. With the apostrophe, it all stays together in the event of a word-wrap.