Is this an encoding error? How do I solve it?

Posted 2020-04-16 01:32

I am doing web scraping.

Below is the code I used.

I added a few comments in the code.

library(httr)
library(rvest)
library(stringr)


# Bulletin board url
List.of.questions.url<- 'http://kin.naver.com/qna/list.nhn?m=noanswer&dirId=70108'

# Vector to store title and body
answers <- c()

# Get the posts from pages 1 and 2.
for(i in 1:2){
  url <- modify_url(List.of.questions.url, query=list(page=i))  
  list <- read_html(url, encoding = 'utf-8') # I specified the encoding here, but I still get an error.


  # Gets the url of the post.
  # TLS = title.links, CLS = content.links 
  TLS <- html_nodes(list, '.basic1 dt a') 
  CLS <- html_attr(TLS, 'href')
  CLS <- paste0("http://kin.naver.com",CLS) 

  #Gets the required properties.
  for(link in CLS){
    h <- read_html(link)  

    # answer    
    answer <- html_text(html_nodes(h, '#contents_layer_1'))
    answer <- str_trim(repair_encoding(answer)) # I tried to repair the encoding here, but I still get an error.
    answers<-c(answers,answer)

    print(link)

  }
}

However, the following error occurs while scraping.

It may be related to encoding.

(But as I wrote in the comments, I think I handled the encoding properly.)

[1] "http://kin.naver.com/qna/detail.nhn?d1id=7&dirId=70111&docId=280474910"
Error: No guess has more than 50% confidence
In addition: There were 43 warnings (use warnings() to see them)  
> warnings()

1: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
2: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
3: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
4: In stringi::stri_conv(x, from = from) :
  the Unicode codepoint \U000000a0 cannot be converted to destination encoding
5: In stringi::stri_conv(x, from = from) :  
# The remaining warnings are identical, so they are omitted

How do I fix it?

Thank you for your advice.
