I use curl to get some URL response; it's a JSON response and it contains unicode-escaped national characters like \u0144 (ń) and \u00f3 (ó). How can I convert them to UTF-8 or any other encoding to save into a file?
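For illustration, the kind of response I mean (the URL and field names here are made up):

    $ curl -s "http://example.com/api"
    {"city": "Toru\u0144", "street": "G\u00f3rna"}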
Works on Windows, should work on *nix too. Uses python 2.
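A minimal sketch of such a cross-platform Python 2 pipeline (the URL is a placeholder, and decode('unicode-escape') is a quick hack that interprets every backslash escape in the raw text, not just JSON's):

    curl -s "http://example.com/api" \
      | python -c "import sys; print sys.stdin.read().decode('unicode-escape').encode('utf-8')" \
      > response.txt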
Don't rely on regexes: JSON has some strange corner-cases with \u escapes and non-BMP code points (specifically, JSON will encode one code point using two \u escapes). If you assume one escape sequence translates to one code point, you're doomed on such text.

Using a full JSON parser from the language of your choice is considerably more robust:
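For instance, a minimal Python 2 sketch of such a pipeline (the sample JSON is just for illustration):

    echo '{"name": "Toru\u0144 \ud83d\udca3"}' \
      | python -c 'import sys, json; print json.dumps(json.load(sys.stdin), ensure_ascii=False).encode("utf-8")'

This prints {"name": "Toruń 💣"}.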
That's really just feeding the data to this short python script:
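A sketch of what that script could be, in Python 2 (equivalent to the one-liner above):

    import json
    import sys

    # Parse the JSON on stdin, then re-serialize it with ensure_ascii=False
    # so non-ASCII characters are written out literally instead of as \uXXXX.
    data = json.load(sys.stdin)
    out = json.dumps(data, ensure_ascii=False)
    if isinstance(out, unicode):  # dumps returns unicode once non-ASCII text is present
        out = out.encode('utf-8')
    sys.stdout.write(out)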
You can save it as foo.py and call it as curl ... | foo.py
An example that will break most of the other attempts in this question is "\ud83d\udca3":
I found native2ascii from the JDK to be the best way to do it:
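A sketch of the invocation, using -reverse to turn the \uXXXX escapes back into real characters (the file names are placeholders):

    native2ascii -reverse -encoding UTF-8 response.txt response-utf8.txt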
Detailed description is here: http://docs.oracle.com/javase/1.5.0/docs/tooldocs/windows/native2ascii.html
Update: No longer available since JDK9: https://bugs.openjdk.java.net/browse/JDK-8074431
I don't know which distribution you are using, but uni2ascii should be included. It only depends on libc6, so it's a lightweight solution (uni2ascii i386 4.18-2 is 55.0 kB on Ubuntu)! Then to use it:
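The package's ascii2uni tool handles this direction (escape sequences back to UTF-8); a sketch, where -a U selects the \uXXXX input format and the URL is a placeholder:

    curl -s "http://example.com/api" | ascii2uni -a U -q > response.txt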
Might be a bit ugly, but echo -e should do it:
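A minimal sketch, with a placeholder URL and output file:

    # Capture the response, then let the bash builtin echo expand the \uXXXX escapes.
    response=$(curl -s "http://example.com/api")
    echo -en "$response" > response.txt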
-e interprets escapes, -n suppresses the newline echo would normally add.

Note: The \u escape works in the bash builtin echo, but not in /usr/bin/echo.

As pointed out in the comments, this is bash 4.2+, and 4.2.x has a bug handling 0x00ff/17 values (0x80-0xff).