Can't use jsonlite in R to read json format file

Posted 2019-08-18 23:42

Question:

I can't read the .json file in R, although I can open it in a web browser.

The data is at the following URL:

https://data.kcg.gov.tw/dataset/7999ac19-e7dc-496a-9b7d-bd8daec107bd/resource/19d06299-a80c-42c2-a9b8-63d4466161a0/download/priceshistory_20160101-20161231.json

Here is my code.

library(jsonlite)
link <- "https://data.kcg.gov.tw/dataset/7999ac19-e7dc-496a-9b7d-bd8daec107bd/resource/19d06299-a80c-42c2-a9b8-63d4466161a0/download/priceshistory_20160101-20161231.json"
kh <- fromJSON(link)

Error in open.connection(con, "rb") : Couldn't connect to server

Any help would be appreciated.

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Answer 1:

Your main error is very likely the firewall issue others have pointed out. You may be able to use httr to triage better:

library(httr)
library(jsonlite)

link <- "https://data.kcg.gov.tw/dataset/7999ac19-e7dc-496a-9b7d-bd8daec107bd/resource/19d06299-a80c-42c2-a9b8-63d4466161a0/download/priceshistory_20160101-20161231.json"

The connection, here, worked for me but the data has some issues (which is the main reason I posted this answer):

kh <- jsonlite::fromJSON(link)
## Error in parse_con(txt, bigint_as_char) : 
##   lexical error: invalid char in json text.
##                                        [   {     "result":{       "
##                      (right here) ------^
## In addition: Warning message:
## JSON string contains (illegal) UTF8 byte-order-mark! 

That error means the BOM wasn't removed (we'll have to do that, then).

Here's a way you can triage the connection a bit using httr::GET():

httr::GET(
  link, 
  progress(), # it's a 13MB file on a slow connection for North America, so this helps
  verbose()   # this lets you see the connection info to make sure nothing is wrong
) -> res

This had no errors so I'm not pasting the verbose output, but you should look at the verbose output and see what HTTP errors show up. That may help diagnose any proxy/firewall issues. Using the latest curl and httr packages may also help get through this as they play nicer with Windows OS now.
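If reading the verbose output is inconvenient, the same triage can be roughly automated. The following is only a minimal sketch: `check_url` and `timeout_sec` are hypothetical names introduced here, and it assumes the httr package is installed. It separates "could not connect at all" (the kind of failure a firewall or proxy tends to cause) from "connected, but the server returned an HTTP error".

```r
library(httr)

# Hypothetical helper: classify the outcome of a GET request.
# Returns a short description instead of stopping on error.
check_url <- function(url, timeout_sec = 30) {
  res <- tryCatch(
    httr::GET(url, httr::timeout(timeout_sec)),
    error = function(e) e   # capture connection-level failures
  )
  if (inherits(res, "error")) {
    # No HTTP response at all: DNS, firewall, proxy, or timeout problems
    return(paste("connection failed:", conditionMessage(res)))
  }
  if (httr::http_error(res)) {
    # The server answered, but with a 4xx/5xx status
    return(paste("HTTP error, status", httr::status_code(res)))
  }
  "ok"
}
```

Calling `check_url(link)` with the `link` defined above would then report which of the three cases applies on your machine.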

Back to the BOM issue, which is still likely going to be an issue for you:

hk_raw <- httr::content(res, as="raw")

hk_raw[1:10]
## [1] ef bb bf ef bb bf 5b 0a 20 20

I'm not sure why the UTF-8 BOM sequence (ef bb bf) appears twice, but that's easy to deal with (and it will need to be dealt with):

hk <- jsonlite::fromJSON(rawToChar(hk_raw[-(1:6)]))

That should give you the data structure fully read in.
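Dropping exactly six bytes only works while the file starts with exactly two BOMs. As a slightly more defensive variant, here is a minimal sketch that removes however many leading UTF-8 BOMs are present; `strip_utf8_bom` is a hypothetical helper name, not part of jsonlite or httr, and it uses base R only.

```r
# Hypothetical helper: remove every leading UTF-8 byte-order mark
# (bytes ef bb bf) from a raw vector and return the remaining bytes.
strip_utf8_bom <- function(x) {
  stopifnot(is.raw(x))
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  while (length(x) >= 3 && identical(x[1:3], bom)) {
    x <- x[-(1:3)]   # drop one BOM and check again
  }
  x
}

# Offline demonstration: two BOMs followed by a small JSON document
demo_raw <- c(
  as.raw(c(0xef, 0xbb, 0xbf, 0xef, 0xbb, 0xbf)),
  charToRaw('[{"a": 1}]')
)
clean_txt <- rawToChar(strip_utf8_bom(demo_raw))

# With the downloaded bytes from above, the equivalent call would be:
# hk <- jsonlite::fromJSON(rawToChar(strip_utf8_bom(hk_raw)))
```

If the publisher later fixes the file so it has one BOM or none, this version should keep working without changing the index.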



Tags: r jsonlite