Automatic UTF-8 encoding in Node.js HTTP client

Posted 2019-05-16 11:28

Question:

I am trying to load XML content from a remote host using Node.js.

The problem is that German umlauts like "ä" come out broken. As in the browser, this is usually a simple encoding problem, but because the XML content on the remote host is encoded in "iso-8859-2", I have had no success getting the letters to display correctly.

The functionality is very simple: I use the HTTP client built into Node.js to send a plain GET request to the remote host.
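
The following is a minimal sketch of that kind of request, written for a current Node.js release rather than the 0.2 version mentioned below (an assumption, since the old httpClient API no longer exists). What it illustrates is that the body is collected as raw Buffer chunks and joined with Buffer.concat, with no encoding set on the response, so no byte gets decoded prematurely. The helper names collectBytes and getRawBody are hypothetical, not part of Node.js.

```javascript
const http = require('http');

// Hypothetical helper (not a Node.js API): gather every chunk of a readable
// stream as raw Buffers and join them once the stream ends. No setEncoding()
// call is made, so the bytes stay exactly as the server sent them.
function collectBytes(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('end', () => resolve(Buffer.concat(chunks)));
    stream.on('error', reject);
  });
}

// Hypothetical helper: send a plain GET request and resolve with the response
// headers plus the still-undecoded body.
function getRawBody(url) {
  return new Promise((resolve, reject) => {
    const request = http.get(url, (response) => {
      collectBytes(response)
        .then((body) => resolve({ headers: response.headers, body }))
        .catch(reject);
    });
    request.on('error', reject);
  });
}
```

Usage would look like getRawBody('http://example.com/data.xml').then(({ headers, body }) => ...), where the URL is a placeholder for the real remote host. Decoding is deliberately left for a later step, once the whole body is available.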

Some environment facts:

  • The remote system uses "iso-8859-2" encoding.
  • The encoding is currently set in the response header.
  • The characters are unrecoverably broken in the data (chunk) received by response.onData(chunk).
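
Because the charset is announced in the response header, it can be read from the Content-Type value before any decoding happens. A minimal sketch, assuming a header of the usual form text/xml; charset=iso-8859-2; the function name charsetFromContentType is hypothetical. Node.js exposes the header as response.headers['content-type'], since header keys are lower-cased.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value,
// e.g. "text/xml; charset=ISO-8859-2" -> "iso-8859-2".
// Returns null when the header is missing or has no charset parameter.
function charsetFromContentType(contentType) {
  if (typeof contentType !== 'string') return null;
  // Accept optional whitespace and an optionally quoted value.
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}
```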

Node.js version 0.2 is running on a default Debian server.

The code is based on the default httpClient as described in the Node.js documentation.

I tried the following:

  • response.defaultAsciiEncoding = true/false
  • response.encoding = UTF-8/ascii
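
Why the UTF-8 setting cannot help is easiest to see with a tiny example on current Node.js, where the equivalent call is response.setEncoding('utf8'). A sketch, assuming the server sends the word "Bär" in ISO-8859-2: the byte 0xE4 ("ä") is not valid UTF-8 in that position, so UTF-8 decoding replaces it with U+FFFD, and re-encoding the string cannot bring it back. This is consistent with the characters already being unrecoverable inside the received chunk.

```javascript
// Assumed example input: the word "Bär" as ISO-8859-2 bytes
// ('B' = 0x42, 'ä' = 0xE4, 'r' = 0x72).
const iso88592Bytes = Buffer.from([0x42, 0xe4, 0x72]);

// Decoding as UTF-8 is what setEncoding('utf8') does to each chunk. 0xE4 is a
// UTF-8 lead byte that must be followed by two continuation bytes; 0x72 is not
// one, so the decoder emits U+FFFD REPLACEMENT CHARACTER in its place.
const asUtf8 = iso88592Bytes.toString('utf8');

// Encoding the damaged string again yields EF BF BD for U+FFFD: the original
// 0xE4 byte is gone for good.
const reEncoded = Buffer.from(asUtf8, 'utf8');

console.log(asUtf8 === 'B\uFFFDr');      // true
console.log(reEncoded.toString('hex'));  // 42efbfbd72
```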

I also used a UTF-8 encoder/decoder on each chunk. When that failed, I tried encoding/decoding the whole response body instead.

I am not very familiar with buffers, and I suspect the problem lies in that direction. My second guess is that Node.js (or the httpClient) simply cannot handle other encodings by default, in which case I think I would need to write my own HTTP client using the net module. I just want to make sure I am not heading in the wrong direction :)

Answer 1:

I had a quick poke around the Node.js source, and it seems svick is right: Node.js doesn't support the ISO-8859-2 encoding. You can, however, read the response as a binary stream and then either return it to the browser with your own encoding or convert it with node-iconv (again, as svick suggested).
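
Current Node.js releases offer a built-in alternative to node-iconv for the conversion step: the WHATWG TextDecoder, which accepts the label 'iso-8859-2' in builds with full ICU data (the default for official binaries since Node.js 13). This swaps in a different technique from the one in the answer and is only a sketch; decodeBody is a hypothetical name. It decodes the whole body at once, after all raw chunks have been joined.

```javascript
// Hypothetical helper: decode a complete response body with the WHATWG
// TextDecoder (a global since Node.js 11). Labels other than UTF-8/UTF-16,
// such as 'iso-8859-2', require a Node.js build with full ICU data;
// otherwise the constructor throws an error.
function decodeBody(bytes, charset = 'utf-8') {
  return new TextDecoder(charset).decode(bytes);
}

// Assumed sample bytes: "Bär ą" in ISO-8859-2. 0xE4 is 'ä' and 0xB1 is 'ą'.
const body = Buffer.from([0x42, 0xe4, 0x72, 0x20, 0xb1]);
console.log(decodeBody(body, 'iso-8859-2')); // Bär ą

// Buffer's built-in 'latin1' decoding (ISO-8859-1) agrees on 'ä' but maps
// 0xB1 to '±', so it is not a substitute for ISO-8859-2.
console.log(body.toString('latin1'));        // Bär ±
```

For German umlauts alone, ISO-8859-1 and ISO-8859-2 happen to use the same byte values, which is why a 'latin1' shortcut can look correct in testing and still fail on other Central European characters.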

Here's a little example: http://gist.github.com/576884



Answer 2:

Try setting the encoding parameter in the XML declaration:

<?xml version="1.0" encoding="iso-8859-2" ?>
<xml>
  <!-- whatever -->
</xml>

XML files default to UTF-8 unless you explicitly declare their encoding.
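
When no charset is available from a header, a client can read it from the declaration itself before decoding. A minimal sketch, assuming the body has already been collected into a Buffer; xmlDeclaredEncoding is a hypothetical helper, and a real XML parser handles more cases (for example UTF-16 byte order marks) than this regular expression does.

```javascript
// Hypothetical helper: read the encoding pseudo-attribute from an XML
// declaration such as <?xml version="1.0" encoding="iso-8859-2" ?>.
// The declaration is plain ASCII, so the first bytes can be inspected as
// 'latin1' before the real charset is known. Returns null when there is no
// declaration or no encoding attribute (a leading byte order mark also makes
// it return null), leaving the caller to fall back to UTF-8.
function xmlDeclaredEncoding(bytes) {
  const head = bytes.subarray(0, 200).toString('latin1');
  const match = /^<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/.exec(head);
  return match ? match[1].toLowerCase() : null;
}
```

A caller could then pick the charset with something like xmlDeclaredEncoding(body) ?? 'utf-8', matching the UTF-8 default described above.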



Answer 3:

It seems to me that Node.js can't work with encodings other than UTF-8. Something like node-iconv might work.