Automatic UTF-8 encoding in Node.js HTTP client

Posted 2019-05-16 10:59

I am trying to load XML content from a remote host using Node.js.

The problem is that German umlauts such as "ä" come out broken. In the browser this is usually a simple encoding problem, but since the XML content on the remote host is encoded in "iso-8859-2", I have had no success getting the letters to display correctly.

The functionality is very simple: I use the default HTTP client built into Node.js to connect to a remote host with a simple GET request.

Some environment facts:

  • The remote system uses "iso-8859-2" encoding.
  • The encoding is currently set in the response header.
  • The characters are unrecoverably broken in the data (chunk) received by response.onData(chunk).

Node.js version 0.2 is running on a default Debian server.

The code is based on the default httpClient as described in the Node.js documentation.

I tried the following:

response.defaultAsciiEncoding true/false
response.encoding = UTF-8/ascii

I used a UTF-8 encoder/decoder to encode/decode the chunk. After this failed I tried to encode/decode the whole response body.

I am not very familiar with using buffers, and I guess the problem lies in that direction. My second guess is that Node.js (or the httpClient) simply can't handle other encodings by default. In that case I think I would need to write my own HTTP client using the net lib. I just want to make sure I am not heading in the wrong direction :)

3 answers
仙女界的扛把子
#2 · 2019-05-16 11:14

I had a quick poke around the Node.js source and it seems like svick is right: Node.js doesn't support the ISO encoding. You can, however, get at the response as a binary stream and then either return it to the browser with your own encoding or use node-iconv (again as svick suggested).

Here's a little example: http://gist.github.com/576884
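For readers on a current Node.js release, the "binary stream, then convert" idea can be sketched without any extra module. This is only a sketch under stated assumptions: it relies on the global TextDecoder accepting the "iso-8859-2" label, which needs a Node.js build with full ICU data, so it does not apply to the 0.2 version from the question. The names charsetFromContentType, decodeBody and fetchText are made up for this illustration.

```javascript
const http = require('http');

// Pull the charset out of a Content-Type header such as
// "text/xml; charset=iso-8859-2". Falls back to UTF-8 when absent.
function charsetFromContentType(header) {
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(header || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Join the raw chunks first and decode once, so the bytes are never
// interpreted as UTF-8 along the way.
// Assumption: this Node.js build supports the requested charset label.
function decodeBody(chunks, charset) {
  return new TextDecoder(charset).decode(Buffer.concat(chunks));
}

// Hypothetical helper: fetch a URL and resolve with the decoded text.
function fetchText(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (response) => {
      const chunks = [];
      // No response.setEncoding() call, so every chunk stays a Buffer.
      response.on('data', (chunk) => chunks.push(chunk));
      response.on('end', () => {
        try {
          const charset = charsetFromContentType(response.headers['content-type']);
          resolve(decodeBody(chunks, charset));
        } catch (err) {
          reject(err); // e.g. the charset label is not supported
        }
      });
      response.on('error', reject);
    }).on('error', reject);
  });
}
```

The important part is that the chunks are kept as Buffers until the whole body has arrived. Once the bytes have been turned into a string with the wrong encoding, the original characters cannot be recovered.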

等我变得足够好
#3 · 2019-05-16 11:16

Try setting the encoding parameter in the XML declaration:

<?xml version="1.0" encoding="iso-8859-2" ?>
<xml>
  <!-- whatever -->
</xml>

XML files default to UTF-8 unless you explicitly declare their encoding.
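On the receiving side, the same rule can be used to find out which encoding a document claims for itself. The following is a rough sketch, not a full implementation of XML encoding detection: it ignores byte order marks and UTF-16, and the function name xmlDeclaredEncoding is made up for this illustration.

```javascript
// Read the encoding named in an XML declaration, if there is one.
// Returns "utf-8" when no declaration or no encoding is found.
function xmlDeclaredEncoding(buffer) {
  // The declaration itself is plain ASCII in both UTF-8 and the
  // ISO-8859 family, so inspecting the first bytes as latin1 is safe.
  const head = buffer.subarray(0, 200).toString('latin1');
  const match = /^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/.exec(head);
  return match ? match[1].toLowerCase() : 'utf-8';
}

const sample = Buffer.from('<?xml version="1.0" encoding="iso-8859-2" ?><xml/>', 'latin1');
console.log(xmlDeclaredEncoding(sample));
```

The returned label still has to be handed to a decoder that supports it. The declaration only names the encoding; it does not convert anything.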

\"骚年 ilove
#4 · 2019-05-16 11:29

It seems to me that Node.js can't work with encodings other than UTF-8. Something like node-iconv might work.
