lexical error: invalid bytes in UTF8 string

Posted 2020-05-01 08:45

Question:

I am trying to use the code shown below to extract data from a JSON file. However, it returns the following error:

Error: lexical error: invalid bytes in UTF8 string.
          fr":"Ces données sont publiées avec un délai de cinq jours
                     (right here) ------^

Inspecting the JSON file in my browser shows the data as follows:

"fr":"Ces donn\u00e9es sont publi�es avec un d\u00e9lai de cinq jours."

Is there a way to read the data while ignoring any bytes that are not valid UTF-8 and cause this error?

library(jsonlite)

URL <- paste0("https://www.energy-charts.de/power_unit/month_lignite_unit_2017_12.json")

data <- fromJSON(getURL(URL))

Answer 1:

The problem is that the URL returns data in latin1 encoding, while your system defaults to reading it as UTF-8. You can read it correctly using

library(jsonlite)
library(RCurl)  

URL <- "https://www.energy-charts.de/power_unit/month_lignite_unit_2017_12.json"

data <- fromJSON(getURL(URL, encoding = "latin1"))

I've also corrected some minor errors in your code: you forgot to load RCurl, and paste0 was not needed.
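
To see why declaring the encoding helps, here is a minimal offline sketch using base R and jsonlite. It does not contact the server: the sample string is made up for illustration, the variable names (json_latin1, json_utf8, json_stripped) are hypothetical, and it assumes the offending byte is a latin1 "é" (0xE9). The second option addresses the literal question of ignoring bad bytes, at the cost of losing those characters.

```r
library(jsonlite)

# Hypothetical sample: a tiny JSON string whose "é" is the single latin1
# byte 0xE9, similar in spirit to the text shown in the question.
json_latin1 <- rawToChar(c(charToRaw('{"fr":"publi'),
                           as.raw(0xE9),
                           charToRaw('es"}')))

# A lone 0xE9 byte is not a valid UTF-8 sequence, which is what trips the parser.
validUTF8(json_latin1)  # FALSE

# Option 1: state the real source encoding and convert to UTF-8 before parsing.
json_utf8 <- iconv(json_latin1, from = "latin1", to = "UTF-8")
parsed <- fromJSON(json_utf8)

# Option 2: treat the input as UTF-8 and drop bytes that cannot be converted.
# The affected characters are lost, so this is best kept as a last resort.
json_stripped <- iconv(json_latin1, from = "UTF-8", to = "UTF-8", sub = "")
parsed_lossy <- fromJSON(json_stripped)
```

With the first option the accented character survives in parsed$fr; with the second it is silently removed, so converting from the correct source encoding is usually preferable when that encoding is known.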



Tags: r jsonlite