I am scraping a web site with R.
Below is the code I use.
I added a few comments in the code.
library(httr)
library(rvest)
library(stringr)
# Bulletin board URL
List.of.questions.url <- 'http://kin.naver.com/qna/list.nhn?m=noanswer&dirId=70108'
# Vector to store the scraped answer text
answers <- c()
# Get the posts on pages 1 and 2.
for (i in 1:2) {
    url <- modify_url(List.of.questions.url, query = list(page = i))
    list <- read_html(url, encoding = 'utf-8') # I think I set the encoding here, but I still get an error.
    # Get the URL of each post.
    # TLS = title links, CLS = content links
    TLS <- html_nodes(list, '.basic1 dt a')
    CLS <- html_attr(TLS, 'href')
    CLS <- paste0("http://kin.naver.com", CLS)
    # Visit each post and extract the required text.
    for (link in CLS) {
        h <- read_html(link)
        # answer
        answer <- html_text(html_nodes(h, '#contents_layer_1'))
        answer <- str_trim(repair_encoding(answer)) # I think I handled the encoding here too, but I still get an error.
        answers <- c(answers, answer)
        print(link)
    }
}
However, the following error occurs partway through scraping.
I suspect it is related to encoding.
(But as noted in the comments, I believe I handled the encoding correctly.)
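One way to check the encoding suspicion without hitting the site is to parse a small literal HTML string the same way the loop does. This is a minimal sketch, not a confirmed fix: the markup below is a hypothetical stand-in for a post page, assuming the answer sits in `#contents_layer_1` as in the code above. It suggests that `read_html()` already returns decoded UTF-8 text, so the extra `repair_encoding()` step has nothing to repair. As far as I know, `repair_encoding()` only guesses an encoding (via `guess_encoding()`) and stops with "No guess has more than 50% confidence" when its best guess is weaker than 50%, which matches the error above.

```r
library(rvest)

# Hypothetical stand-in for one post page. "\uc548\ub155" is Korean text
# written as escapes, and &nbsp; becomes U+00A0 after parsing.
page <- read_html(
  '<html><body><div id="contents_layer_1">\uc548\ub155&nbsp;test</div></body></html>',
  encoding = "UTF-8"
)

# Same extraction as in the loop above, minus repair_encoding().
answer <- html_text(html_nodes(page, "#contents_layer_1"))

# read_html() has already decoded the document, so the text is valid UTF-8.
print(validUTF8(answer))
```

If the real pages behave the same way, the text coming out of `html_text()` may not need any re-encoding at all.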
[1] "http://kin.naver.com/qna/detail.nhn?d1id=7&dirId=70111&docId=280474910"
Error: No guess has more than 50% confidence
In addition: There were 43 warnings (use warnings() to see them)
> warnings()
1: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
2: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
3: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
4: In stringi::stri_conv(x, from = from) :
the Unicode codepoint \U000000a0 cannot be converted to destination encoding
5: In stringi::stri_conv(x, from = from) :
# (the remaining warnings are identical, so they are omitted here)
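The warnings point at `\U000000a0`, which is U+00A0, the no-break space that `&nbsp;` becomes after parsing. `stringi::stri_conv()` (called by `repair_encoding()`) converts to the default (native) encoding when no target is given, and that encoding apparently cannot represent U+00A0. A minimal sketch, assuming the text is already valid UTF-8 and that ordinary spaces are an acceptable replacement, normalizes the no-break spaces with stringr instead of converting encodings; the sample string is hypothetical.

```r
library(stringr)

# Hypothetical scraped answer: leading/trailing spaces plus two U+00A0
# (no-break space) characters, written as escapes so the file stays ASCII.
answer <- "  \uc548\ub155\u00a0\ud558\uc138\uc694\u00a0 "

# Replace every no-break space with a plain space (fixed match, no regex),
# then trim whitespace from both ends. The text stays UTF-8 throughout,
# so no conversion to the native encoding is attempted.
clean <- str_trim(str_replace_all(answer, fixed("\u00a0"), " "))

print(clean)
```

In the loop above, this would take the place of `str_trim(repair_encoding(answer))`.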
How can I fix this?
Thank you for your advice.