Decoding encoded URLs in C++

Posted 2019-08-29 12:20

I want to decode percent-encoded URLs. For example, the letter ö is encoded as "%C3%B6", corresponding to its UTF-8 encoding, the two bytes 0xC3 0xB6 (0xc3b6, or 50102 in decimal).

I now need to know how to print this value as ö on the console or write it into a string buffer.

Simply casting to char, wchar_t, char16_t or char32_t and printing to cout or wcout didn't work.

The closest I have got is by using its UTF-16 representation, 0x00f6. The following code snippet prints ö:

#include <codecvt>
#include <iostream>
#include <locale>

int main() {
  std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> convert;
  std::cout << convert.to_bytes(0x00f6) << '\n';
}

I now need either a way to compute 0x00f6 from 0xc3b6, or another approach to decoding the URL.

2 Answers
兄弟一词,经得起流年.
#2 · 2019-08-29 12:57

Thanks for all the help. Here is what I have come up with. Maybe it will help someone else.

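#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

std::string encode_url(const std::string& s) {
  std::ostringstream oss;
  oss << std::uppercase << std::hex << std::setfill('0');
  // Iterate as unsigned char so bytes >= 0x80 are not sign-extended.
  for (unsigned char c : s) {
    if (c > 0 && c < 128) {
      // ASCII is copied unchanged (this includes reserved characters such as '%').
      oss << static_cast<char>(c);
    }
    else {
      // Always emit two hex digits, so a NUL byte becomes "%00" rather than "%0".
      oss << '%' << std::setw(2) << static_cast<int>(c);
    }
  }
  return oss.str();
}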

int parse_hex(const std::string& s) {
  std::istringstream iss(s);
  int n = 0;  // stays 0 if s does not start with a hex digit
  iss >> std::hex >> n;
  return n;
}

std::string decode_url(const std::string& s) {
  std::string result;
  result.reserve(s.size());
  for (std::size_t i = 0; i < s.size();) {
    if (s[i] != '%') {
      result.push_back(s[i]);
      ++i;
    }
    else {
      result.push_back(static_cast<char>(parse_hex(s.substr(i + 1, 2))));
      i += 3;
    }
  }
  return result;
}

There is still room for optimization, but it works :)
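
One possible optimization, sketched here rather than measured: build the result directly in a std::string and look the hex digits up in a table instead of going through std::ostringstream. The name encode_url_fast is hypothetical, and the function assumes the same pass-through rule as encode_url above (ASCII bytes 1..127 are copied unchanged).

```cpp
#include <string>

// Hypothetical stream-free variant of encode_url (same pass-through rule).
std::string encode_url_fast(const std::string& s) {
  static const char hex_digits[] = "0123456789ABCDEF";
  std::string result;
  result.reserve(s.size() * 3);  // worst case: every byte becomes "%XX"
  for (unsigned char c : s) {
    if (c > 0 && c < 128) {
      result.push_back(static_cast<char>(c));  // ASCII is copied unchanged
    } else {
      result.push_back('%');
      result.push_back(hex_digits[c >> 4]);    // high nibble
      result.push_back(hex_digits[c & 0x0F]);  // low nibble
    }
  }
  return result;
}
```

With this sketch, the UTF-8 bytes of ö ("\xc3\xb6") come out as "%C3%B6", and plain ASCII input is returned as-is.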

Emotional °昔
#3 · 2019-08-29 13:06

On POSIX systems you can print a UTF-8 string directly:

std::string utf8 = "\xc3\xb6"; // or u8"ö" (before C++20, where u8 literals became char8_t)
printf("%s\n", utf8.c_str());

On Windows, you have to convert to UTF-16. Use wchar_t rather than char16_t, even though char16_t is nominally the right type. Both are 2 bytes per code unit on Windows, and the Windows API expects wchar_t.

You want convert.from_bytes, which converts from UTF-8, rather than convert.to_bytes, which converts to UTF-8.

Printing Unicode to the Windows console is another headache; see the related questions on that topic.

Note that std::wstring_convert is deprecated since C++17 and has no standard replacement as of now.

#include <iostream>
#include <string>
#include <codecvt>
#include <windows.h>

int main() 
{
    std::string utf8 = "\xc3\xb6";

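    // codecvt_utf8_utf16 produces UTF-16 (with surrogate pairs);
    // codecvt_utf8<wchar_t> would produce UCS-2 when wchar_t is 16 bits.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    std::wstring utf16 = convert.from_bytes(utf8);

    // Call the wide (W) variants explicitly so the code does not depend on the UNICODE macro.
    MessageBoxW(0, utf16.c_str(), 0, 0);
    DWORD count;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), utf16.c_str(), static_cast<DWORD>(utf16.size()), &count, 0);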

    return 0;
}
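
Since std::wstring_convert is deprecated, one alternative is to decode UTF-8 by hand. The sketch below also shows how the question's 0x00f6 follows from the bytes 0xC3 0xB6: ((0xC3 & 0x1F) << 6) | (0xB6 & 0x3F) = 0xC0 | 0x36 = 0xF6. The name next_code_point is hypothetical, and the function is deliberately simplified: it assumes i < s.size() on entry and does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the UTF-8 sequence starting at s[i], returns
// its code point, and advances i past the bytes it consumed.
// Returns U+FFFD for malformed or truncated input.
char32_t next_code_point(const std::string& s, std::size_t& i) {
  const unsigned char lead = static_cast<unsigned char>(s[i++]);
  int extra;     // number of continuation bytes expected
  char32_t cp;   // payload bits collected so far
  if (lead < 0x80)                { cp = lead;        extra = 0; }  // 0xxxxxxx
  else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
  else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
  else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }  // 11110xxx
  else return 0xFFFD;  // stray continuation byte or invalid lead byte
  for (; extra > 0; --extra) {
    if (i >= s.size()) return 0xFFFD;                      // truncated
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if ((c & 0xC0) != 0x80) return 0xFFFD;                 // not 10xxxxxx
    cp = (cp << 6) | (c & 0x3F);                           // append 6 payload bits
    ++i;
  }
  return cp;
}
```

For code points up to U+FFFF (outside the surrogate range) the returned value is also the single UTF-16 code unit, which is why ö maps to 0x00f6.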

Encoding/Decoding URL

"URL safe characters" don't need encoding. All other characters, including non-ASCII characters, should be encoded. Example:

#include <iomanip>
#include <sstream>
#include <string>

std::string encode_url(const std::string& s)
{
    const std::string safe_characters = 
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    std::ostringstream oss;
    for(auto c : s) {
        if (safe_characters.find(c) != std::string::npos)
            oss << c;
        else
            oss << '%' << std::setfill('0') << std::setw(2) << 
                std::uppercase << std::hex << (0xff & c);
    }
    return oss.str();
}

std::string decode_url(const std::string& s) 
{
    std::string result;
    for(std::size_t i = 0; i < s.size(); i++) {
        if(s[i] == '%') {
            try { 
                auto v = std::stoi(s.substr(i + 1, 2), nullptr, 16);
                result.push_back(0xff & v);
            } catch(...) { } //handle error
            i += 2;
        }
        else {
            result.push_back(s[i]);
        }

    }
    return result;
}
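
The decode_url above swallows errors in its catch block, and std::stoi is fairly lenient (it accepts leading whitespace and a sign, for instance). As one way to fill in the "handle error" part, here is a minimal sketch of a strict variant that reports malformed escapes to the caller. The names hex_value and try_decode_url are hypothetical, and the bool-plus-output-parameter interface is just one possible design.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical strict decoder: returns false on a malformed escape instead of
// skipping it. `out` is only modified when decoding succeeds.
bool try_decode_url(const std::string& s, std::string& out)
{
    std::string result;
    result.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); i++) {
        if (s[i] != '%') {
            result.push_back(s[i]);
            continue;
        }
        if (i + 2 >= s.size()) return false;   // '%' not followed by two characters
        const int hi = hex_value(s[i + 1]);
        const int lo = hex_value(s[i + 2]);
        if (hi < 0 || lo < 0) return false;    // not a hex digit
        result.push_back(static_cast<char>((hi << 4) | lo));
        i += 2;                                // skip the two hex digits
    }
    out = std::move(result);
    return true;
}
```

A caller would check the return value before using the output; for example, "%C3%B6" decodes to the two UTF-8 bytes of ö, while "%4" or "%zz" is rejected.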