I'm using rvest to extract the table in the following page:
https://en.wikipedia.org/wiki/List_of_United_States_presidential_elections_by_popular_vote_margin
The following code works:
library(rvest)
URL <- 'https://en.wikipedia.org/wiki/List_of_United_States_presidential_elections_by_popular_vote_margin'
table <- URL %>%
  read_html() %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(trim = TRUE)
but the margin and president-name columns contain some strange values. The reason is that the page source has the following:
<td><span style="display:none">00.001</span>−10.44%</td>
so instead of getting -10.44% I get 00.001−10.44%
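The effect can be reproduced offline with just the cell quoted above. This is a minimal sketch that assumes rvest is installed. The one-row table is a made-up stand-in for the Wikipedia page, and it uses the &minus; entity so the example does not depend on the file's encoding. html_table() takes each cell's full text content and ignores the display:none style, so the hidden sort key ends up glued in front of the visible value:

```r
library(rvest)

# One-cell stand-in for the Wikipedia table, using the markup quoted above;
# &minus; is the HTML entity for the Unicode minus sign (U+2212)
page <- read_html('<table>
  <tr><th>Margin</th></tr>
  <tr><td><span style="display:none">00.001</span>&minus;10.44%</td></tr>
</table>')

tbl <- html_table(html_nodes(page, "table")[[1]])

# The hidden sort key is concatenated with the visible text
print(tbl$Margin)
```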
How could I fix this?
One option is to target and replace the problem columns individually. The margin columns can be targeted with xpath.
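A sketch of that idea, under stated assumptions: the table below is a made-up stand-in, and margin_col is a hypothetical column position that you would need to look up on the real page (read_html(URL) instead of the literal string). The xpath selects the hidden spans only inside that column and xml2::xml_remove() deletes them before html_table() runs. The Unicode minus is then swapped with gsub() rather than iconv(), because iconv()'s ASCII//TRANSLIT result depends on the platform:

```r
library(rvest)
library(xml2)

# Made-up stand-in for the Wikipedia table; for the real page use
# read_html(URL) and check the actual position of each margin column
page <- read_html('<table>
  <tr><th>Label</th><th>Margin</th></tr>
  <tr><td>a</td><td><span style="display:none">00.001</span>&minus;10.44%</td></tr>
</table>')

margin_col <- 2  # hypothetical position of one margin column

# Select the hidden sort-key spans only inside that column and delete them
xp <- sprintf("//tr/td[%d]/span[contains(@style, 'display:none')]", margin_col)
xml_remove(html_nodes(page, xpath = xp))

tbl <- html_table(html_nodes(page, "table")[[1]])

# Swap the Unicode minus sign (U+2212) for an ASCII hyphen-minus,
# then drop the percent sign and convert to numeric
margin <- gsub("\u2212", "-", tbl$Margin, fixed = TRUE)
margin <- as.numeric(sub("%", "", margin, fixed = TRUE))
print(margin)  # -10.44
```

Note that td[n] counts only td siblings, so row-header th cells or colspan/rowspan on the real page can shift the position you need.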
Do the same for the other margin column. I used iconv to convert the − (the Unicode minus sign, U+2212) to an ASCII -, since the page uses that character rather than a plain hyphen, but you could use a substitution-based solution instead (e.g. using sub).
To target the column with president names, you can use xpath again:
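A minimal sketch of that step, again on a made-up stand-in table. It assumes the name cell carries a hidden sort key in front of a link, which is a guess about the real markup, and name_col is a hypothetical column position. Only the name column's hidden spans are removed, so the margin cell is left untouched:

```r
library(rvest)
library(xml2)

# Made-up stand-in: a hidden sort key in front of a linked name
# (an assumption about the real page's markup)
page <- read_html('<table>
  <tr><th>Winner</th><th>Margin</th></tr>
  <tr><td><span style="display:none">Doe, Jane</span><a href="#">Jane Doe</a></td>
      <td><span style="display:none">00.001</span>&minus;10.44%</td></tr>
</table>')

name_col <- 1  # hypothetical position of the president-name column

# Same xpath approach as for the margins, restricted to the name column
xp <- sprintf("//tr/td[%d]/span[contains(@style, 'display:none')]", name_col)
xml_remove(html_nodes(page, xpath = xp))

tbl <- html_table(html_nodes(page, "table")[[1]])
print(tbl$Winner)  # "Jane Doe"
```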