I am using R's RJSONIO to read JSON from a file. The JSON contains escaped Unicode characters, which get read incorrectly.
The code works when the JSON is passed as a string, as shown by the author of the R package in the Stack Overflow question "How to correctly deal with escaped Unicode Characters in R e.g. the em dash (—)".
However, when the JSON is read from a file, it does not produce the correct Unicode representation, as seen below:
fromJSON(content="~/MTS/temp")
$query
$query$categorymembers
$query$categorymembers[[1]]
$query$categorymembers[[1]]$ns
[1] 0
$query$categorymembers[[1]]$title
[1] "Banach\023Tarski paradox"
Where ~/MTS/temp contains:
{"query":{"categorymembers":[{"ns":0,"title":"Banach\u2013Tarski paradox"}]}}
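A small base-R check may help read the symptom. The `\023` in the printed title is an octal escape for byte 0x13, which happens to be the low byte of the code point U+2013. That is consistent with the code point being truncated to a single byte somewhere along the way, though this is only an inference from the output above, not something confirmed from RJSONIO's source.

```r
# Code point from the JSON escape \u2013
code_point <- 0x2013L

# Keep only the lowest 8 bits of the code point
low_byte <- bitwAnd(code_point, 0xFFL)

# Octal form of that byte, which is how R prints control characters
octal_form <- sprintf("%o", low_byte)

# The single-byte string that R would display as "\023"
truncated_char <- rawToChar(as.raw(low_byte))

print(low_byte)
print(octal_form)
print(identical(truncated_char, "\023"))
```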
An alternative package called jsonlite works the way you would expect on my system (OS X), but I did verify that RJSONIO does not. This is after I saved your JSON snippet to a file called utext.txt:
file.show("utext.txt")
## {"query":{"categorymembers":[{"ns":0,"title":"Banach\u2013Tarski paradox"}]}}
jsonlite::fromJSON("~/temp/utext.txt")
## $query
## $query$categorymembers
##   ns                 title
## 1  0 Banach–Tarski paradox
Here is another solution that is a bit more platform-dependent: convert the Unicode-escaped file to real UTF-8 text before reading it. (Whether or not your platform has this utility, I do not know, but even for Windows you can probably find it.)
My system locale encoding is UTF-8 (the OS X standard), so when I run the command-line utility native2ascii (a tool that has shipped with the JDK) with the -reverse option, the \uXXXX escapes are converted to UTF-8 characters. I can then read the result into R, where my locale is set to en_GB.UTF-8.
From a Terminal/shell:
native2ascii -reverse ~/temp/utext.txt ~/temp/utextUTF8.txt
Then in R:
RJSONIO::fromJSON("~/temp/utextUTF8.txt")
## $query
## $query$categorymembers
## $query$categorymembers[[1]]
## $query$categorymembers[[1]]$ns
## [1] 0
##
## $query$categorymembers[[1]]$title
## [1] "Banach–Tarski paradox"
Voilà, problem solved.
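If native2ascii is not available, the same pre-processing step can be approximated in base R. The following is only a minimal sketch, not what native2ascii itself does internally: the helper names `decode_unicode_escapes` and `convert_file` are hypothetical, and it assumes the escapes are all four-digit `\uXXXX` sequences in the Basic Multilingual Plane (no surrogate pairs) and that none of them decode to characters that must stay escaped in JSON, such as a double quote or a backslash.

```r
# Hypothetical helper: replace each \uXXXX escape in a character vector
# with the corresponding UTF-8 character. Surrogate pairs and escaped
# backslashes in front of "u" are not handled in this sketch.
decode_unicode_escapes <- function(x) {
  matches <- gregexpr("\\\\u[0-9a-fA-F]{4}", x)
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(escapes) {
    vapply(
      escapes,
      function(e) intToUtf8(strtoi(substring(e, 3), base = 16L)),
      character(1),
      USE.NAMES = FALSE
    )
  })
  x
}

# Hypothetical helper: read a file, decode its escapes, and write the
# result out as UTF-8 regardless of the current locale.
convert_file <- function(infile, outfile) {
  lines <- readLines(infile, warn = FALSE, encoding = "UTF-8")
  writeLines(enc2utf8(decode_unicode_escapes(lines)), outfile, useBytes = TRUE)
  invisible(outfile)
}

# Example on the JSON snippet from the question
decoded <- decode_unicode_escapes(
  '{"query":{"categorymembers":[{"ns":0,"title":"Banach\\u2013Tarski paradox"}]}}'
)
print(decoded)
```

Calling `convert_file("~/temp/utext.txt", "~/temp/utextUTF8.txt")` would then play the role of the native2ascii step above, after which the converted file can be passed to `RJSONIO::fromJSON()` as before.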