I was solving some problem on codeforces. Normally I first check if the character is upper or lower English letter then subtract or add 32
to convert it to the corresponding letter. But I found someone do ^= 32
to do the same thing. Here it is:
char foo = 'a';
foo ^= 32;
char bar = 'A';
bar ^= 32;
cout << foo << ' ' << bar << '\n'; // foo is A, and bar is a
I have searched for an explanation for this and didn't find out. So why this works?
Let's take a look at ASCII code table in binary.
And 32 is
0100000
which is the only difference between lowercase and uppercase letters. So toggling that bit toggles the case of a letter.See the second table at http://www.catb.org/esr/faqs/things-every-hacker-once-knew/#_ascii, and following notes, reproduced below:
ASCII was designed such that the shift and ctrl keyboard keys could be implemented without much (or perhaps any for ctrl) logic - shift probably required only a few gates. It probably made at least as much sense to store the wire protocol as any other character encoding (no software conversion required).
The linked article also explains many strange hacker conventions such as
And control H does a single character and is an old^H^H^H^H^H classic joke.
(found here).It works because, as it happens, the difference between 'a' and A' in ASCII and derived encodings is 32, and 32 is also the value of the sixth bit. Flipping the 6th bit with an exclusive OR thus converts between upper and lower.
Most likely your implementation of the character set will be ASCII. If we look at the table:
We see that there's a difference of exactly
32
between the value of a lowercase and uppercase number. Therefore, if we do^= 32
(which equates to toggling the 6th least significant bit), it changes between a lowercase and uppercase character.Note that it works with all the symbols, not just the letters. It toggles a character with the respective character where the 6th bit is different, resulting in a pair of characters that is toggled back and forth between. For the letters, the respective upper/lowercase characters form such a pair. A
NUL
will change intoSpace
and the other way around, and the@
toggles with the backtick. Basically any character in the first column on this chart toggles with the character one column over, and the same applies to the third and fourth columns.I wouldn't use this hack though, as there's not guarantee that it's going to work on any system. Just use toupper and tolower instead, and queries such as isupper.
This uses the fact than ASCII values have been chosen by really smart people.
This flips the 6th lowest bit1 of
foo
(the uppercase flag of ASCII sort of), transforming an ASCII upper case to a lower case and vice-versa.Example
And by property of XOR,
'a' ^ 32 == 'A'
.Notice
C++ is not required to use ASCII to represent characters. Another variant is EBCDIC. This trick only works on ASCII platforms. A more portable solution would be to use
std::tolower
andstd::toupper
, with the offered bonus to be locale-aware (it does not automagically solve all your problems though, see comments):1) As 32 is
1 << 5
(2 to the power 5), it flips the 6th bit (counting from 1).Plenty of good answers here that describe how this works, but why it works this way is to improve performance. Bitwise operations are faster than most other operations within a processor. You can quickly do a case insensitive comparison by simply not looking at the bit that determines case or change case to upper/lower simply by flipping the bit (those guys that designed the ASCII table were pretty smart).
Obviously, this isn't nearly as big of a deal today as it was back in 1960 (when work first began on ASCII) due to faster processors and Unicode, but there are still some low-cost processors that this could make a significant difference as long as you can guarantee only ASCII characters.
https://en.wikipedia.org/wiki/Bitwise_operation
NOTE: I would recommend using standard libraries for working with strings for a number of reasons (readability, correctness, portability, etc). Only use bit flipping if you have measured performance and this is your bottleneck.