How to get Big5 URL encoding in Node.js?

Posted 2019-05-30 10:02

Question:

I want to use Node.js to encode the character '十' (\u5341) as the Big5 URL-encoded string '%A4Q', but I don't know how to do it. I need help.

In more detail, below is an HTML file named test.html:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=big5">
    <title>test</title>
</head>
<body>
    <form>
        <input name="a"/>
        <input type="submit">
    </form>
</body>
</html>

Open this file in Chrome, type '十', and click 'Submit'. The URL in the address bar becomes 'http://localhost/test.html?a=%A4Q'.

I just want Node.js to encode the URL the same way Chrome (and other browsers) do. I tried iconv-lite and node-iconv, but could not convert '十' to '%A4Q'.

iconv-lite and node-iconv also give different results. The code is:

var iconv = require('iconv-lite');
var Iconv = require('iconv').Iconv;
var iconv2 = new Iconv('utf8', 'BIG5');

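// Percent-encode every byte of a buffer as lowercase hex.
function format(buf) {
  var rtn = "";
  for (var i = 0; i < buf.length; i++) {
    // Zero-pad so bytes below 0x10 still produce two hex digits.
    rtn += "%" + ("0" + buf[i].toString(16)).slice(-2);
  }
  return rtn;
}

var chr = '十';
console.log(format(iconv.encode(chr, 'big5')));  // iconv-lite
console.log(format(iconv2.convert(chr)));        // node-iconv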

The result is:

%a2%cc
%a4%51

Even in Java, System.out.println(URLEncoder.encode("十", "Big5")); also gives '%A4%51'.

Here is a related question: URL Decode Difference between C# and Java

Answer 1:

%51 is the percent-encoded form of the byte 0x51, which is the ASCII character 'Q'. So '%A4Q' and '%A4%51' represent the same two bytes, and a URL decoder parses both to the same result.

What's more, the hex digits in '%A4' are case-insensitive, while the 'Q' is not, because 'Q' and 'q' are different bytes (%51 and %71).
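
To check this equivalence without any encoding library, a minimal sketch like the following turns each form into raw bytes and compares them. The helper name toBytes is made up for illustration, and it leans on the legacy global unescape(), which maps each %XX escape to the code unit 0xXX; it assumes the input holds only ASCII characters and %XX escapes.

```javascript
// Hypothetical helper: percent-decode an ASCII string into an array of byte values.
// unescape() turns "%A4" into the single code unit 0xA4, and the 'latin1'
// encoding then maps each code unit below 0x100 to one byte.
function toBytes(encoded) {
  return Array.from(Buffer.from(unescape(encoded), 'latin1'));
}

console.log(toBytes('%A4Q'));   // bytes 0xA4, 0x51
console.log(toBytes('%A4%51')); // the same two bytes
console.log(toBytes('%a4Q'));   // hex digits are case-insensitive, so still the same
console.log(toBytes('%A4q'));   // 'q' is byte 0x71, so this one differs
```

The first three calls yield identical arrays, while the last one ends in 0x71 instead of 0x51.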



Answer 2:

Based on @user1783292's answer above, I wrote the code below.

var Iconv = require('iconv').Iconv;
var iconv = new Iconv('utf8', 'BIG5');

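// Percent-encode a string as Big5, leaving ASCII letters literal
// (this is what makes the output match the URL Chrome produces).
function big5_encode(chr) {
    var rtn = "";
    var buf = iconv.convert(chr);
    // Walk byte by byte so single-byte (ASCII) characters work too.
    // Big5 lead bytes are >= 0x81, so only trail or ASCII bytes can be letters.
    for (var i = 0; i < buf.length; i++) {
        var b = buf[i];
        var isLetter = (b >= 65 && b <= 90) || (b >= 97 && b <= 122);
        rtn += isLetter
            ? String.fromCharCode(b)
            // Zero-pad so bytes below 0x10 still produce two hex digits.
            : '%' + ('0' + b.toString(16)).slice(-2).toUpperCase();
    }
    return rtn;
}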

var chr = '十尢我';
console.log(big5_encode(chr));

The output is %A4Q%A4q%A7%DA, the same as Chrome.

There may be a standard rule for Big5 URL encoding, but I have not found it. Java's URLDecoder may also ignore such a rule (in which case it would not be correct).
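
One candidate for such a rule is the application/x-www-form-urlencoded serializer in the WHATWG URL Standard. As far as I know, it leaves ASCII letters, digits, and the bytes for *, -, . and _ unescaped, writes a space as +, and percent-encodes every other byte. The sketch below applies that rule to bytes that have already been encoded as Big5. The function name formUrlencodeBytes is made up for illustration, and the assumption that Chrome follows this rule when submitting the form is mine, not something confirmed in this thread.

```javascript
// Hypothetical helper: serialize already-encoded bytes following the
// form-urlencoded rule described above (an assumption, not a verified spec port).
function formUrlencodeBytes(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    var isAlnum = (b >= 0x30 && b <= 0x39) ||   // 0-9
                  (b >= 0x41 && b <= 0x5A) ||   // A-Z
                  (b >= 0x61 && b <= 0x7A);     // a-z
    var isSafePunct = b === 0x2A || b === 0x2D || b === 0x2E || b === 0x5F; // * - . _
    if (b === 0x20) {
      out += '+';                               // space becomes a plus sign
    } else if (isAlnum || isSafePunct) {
      out += String.fromCharCode(b);            // safe bytes stay literal
    } else {
      // Everything else becomes %XX with two uppercase hex digits.
      out += '%' + ('0' + b.toString(16)).slice(-2).toUpperCase();
    }
  }
  return out;
}

// Big5 bytes for '十尢我' as reported in this thread: A4 51, A4 71, A7 DA.
console.log(formUrlencodeBytes([0xA4, 0x51, 0xA4, 0x71, 0xA7, 0xDA])); // %A4Q%A4q%A7%DA
```

Under this rule a trail byte such as 0x5F would also stay literal as _, whereas the letter-only check in the code above would write it as %5F. Both forms decode to the same bytes.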



Answer 3:

I believe someone might need a decode function, lol.

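// Convert a Big5 percent-encoded string into an array of byte values.
// The result is raw bytes, not text; decode it as Big5 to get a string.
function big5_urldecode(str){
  var chars = [];
  for (var i = 0; i < str.length; i++) {
    var hex = str.slice(i + 1, i + 3);
    if (str[i] === "%" && /^[0-9a-fA-F]{2}$/.test(hex)) {
      chars.push(parseInt(hex, 16));   // %XX escape: one byte
      i += 2;
    } else {
      chars.push(str.charCodeAt(i));   // literal ASCII character: its own byte
    }
  }
  return chars;
}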
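
The function above returns byte values rather than text. As a sketch of the final step, assuming a Node.js build with full ICU (where TextDecoder accepts the 'big5' label), the bytes can be turned back into a string like this. The name big5BytesToString is hypothetical; passing the bytes to iconv-lite's decode would be another option.

```javascript
// Hypothetical helper: decode an array of Big5 byte values into a JavaScript string.
// Assumes the runtime's TextDecoder supports the 'big5' label (needs full ICU in Node.js).
function big5BytesToString(bytes) {
  return new TextDecoder('big5').decode(Uint8Array.from(bytes));
}

// 0xA4 0x51 are the bytes behind '%A4Q' from the question.
console.log(big5BytesToString([0xA4, 0x51]));
```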