I want to decode encoded URLs. For example, the letter ö is encoded as "%C3%B6", corresponding to its hexadecimal UTF-8 encoding 0xc3b6 (50102).
I now need to know how to print this value as ö on the console or into a string buffer.
Simply casting to char, wchar_t, char16_t or char32_t and printing to cout or wcout didn't work.
The closest I have got was by using its UTF-16 representation 0x00f6. The following code snippet prints ö:
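#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main() {
    // Converts between UTF-8 (char) and UCS-2 (char16_t)
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> convert;
    // to_bytes encodes the single code unit U+00F6 as the UTF-8 bytes C3 B6
    std::cout << convert.to_bytes(0x00f6) << '\n';
}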
Now I need either a way to calculate 0x00f6 from 0xc3b6, or another approach to decode the URL.
Thanks for all the help. Here is what I have come up with; maybe it will help someone else.
There is still room for optimizations, but it works :)
On POSIX systems you can print a UTF-8 string directly:
On Windows, you have to convert to UTF-16. Use wchar_t instead of char16_t, even though char16_t is supposed to be the right one; both are 2 bytes per character on Windows. You want convert.from_bytes to convert from UTF-8, instead of convert.to_bytes, which converts to UTF-8.
Printing Unicode in the Windows console is another headache; see the relevant topics.
Note that std::wstring_convert is deprecated (since C++17) and has no replacement as of now.
Encoding/Decoding URL
"URL safe characters" don't need encoding. All other characters, including non-ASCII characters, should be encoded. Example: