I want to replace empty or whitespace-only cells with NA. A simple way could be `df[df == ""] <- NA`, and that works for most of the cells of my data frame... but not for all of them!
I have the following code:
```r
library(rvest)
library(dplyr)
library(tidyr)

# Read website
htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")

# Extract table (html_table() returns a list of data frames)
df <- htmlpage %>% html_nodes("table") %>% html_table()
df <- as.data.frame(df)

# Set empty strings to NA
df[df == ""] <- NA
```
I figured out that some of these cells actually contain a little space between the quotation marks:
```r
df[11,1]
[1] " "
```
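One way to see what that little space really is: look at its code point. This is a minimal sketch; the vector `cells` is a made-up stand-in for values from `df`, and the idea that the scraped cell holds a non-breaking space (U+00A0) rather than an ordinary space is an assumption, not something the output above confirms. An ordinary space is code point 32, a non-breaking space is 160; both print as `" "`, but they are not equal.

```r
# Made-up stand-in values: an ordinary space, a non-breaking space (assumed
# to be what the scraped cell contains), and an empty string
cells <- c(" ", "\u00a0", "")

# Code points of each value: 32 = ordinary space, 160 = non-breaking space
lapply(cells, utf8ToInt)

# The non-breaking space does not compare equal to an ordinary space
cells == " "
```

On the real data, `utf8ToInt(df[11, 1])` performs the same check.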
So my solution was to also run `df[df == " "] <- NA`.
However, the problem is still there, and the cell still holds that little space! I thought trimming the columns would work, but it didn't:
```r
# Trim
df[,c(1:10)] <- sapply(df[,c(1:10)], trimws)
```
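If the leftover character is a non-breaking space (U+00A0), which is an assumption about the scraped page, that would explain why `trimws` has no effect: by default it strips only `[ \t\r\n]`. Since R 3.6.0, `trimws()` accepts a `whitespace` argument, and the PCRE class `[\\h\\v]` also covers non-breaking spaces. A minimal sketch on made-up data, using `lapply` so the result stays a data frame:

```r
# Made-up data: column a has an ordinary space, column b a non-breaking space
df <- data.frame(a = c("x", " "), b = c("\u00a0", "y"), stringsAsFactors = FALSE)

# Default trimws() leaves the non-breaking space in place (prints TRUE)
trimws(df$b[1]) == "\u00a0"

# With whitespace = "[\\h\\v]" (R >= 3.6.0) both kinds of space are trimmed;
# lapply() returns a list, so assigning into df[] keeps df a data.frame
df[] <- lapply(df, trimws, whitespace = "[\\h\\v]")

# Cells that are now "" become NA
df[df == ""] <- NA
```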
However, the problem doesn't go away.
Any ideas?
I just spent some time trying to determine a method usable in a pipe.
Here is my method:
Hope this helps the next searcher.
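The answerer's own code is not reproduced here. As one possible pipe-friendly approach (a sketch, not necessarily the method described above), a small helper that takes and returns a data frame can be dropped into a pipe. `blank_to_na()` is a hypothetical name, the native pipe `|>` needs R >= 4.1 (`%>%` works the same way), and the `[\\h\\v]` class is an assumed choice to also catch non-breaking spaces.

```r
# Hypothetical helper: turns cells that are empty or contain only whitespace
# (including non-breaking spaces) into NA; non-character columns are untouched
blank_to_na <- function(d) {
  d[] <- lapply(d, function(x) {
    if (is.character(x)) {
      x[grepl("^[\\h\\v]*$", x, perl = TRUE)] <- NA
    }
    x
  })
  d
}

# Made-up data with an empty cell, a space and a non-breaking space
df <- data.frame(a = c("x", "", " "), b = c("\u00a0", "y", "z"),
                 stringsAsFactors = FALSE)

df <- df |> blank_to_na()
```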
We need to use `lapply` instead of `sapply`, as `sapply` returns a `matrix` instead of a `list`, and this can create problems. Another option, if we have spaces like `" "`, is to use `gsub` to replace those spaces with `""` and then change the `""` to `NA`.
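A minimal sketch of that two-step approach on made-up data, using `lapply` (not `sapply`) so the result can be assigned back into `df[]` as a data frame. The pattern `^\\s+$` is an assumed choice that blanks only cells made entirely of whitespace; it may not match a non-breaking space, and `gsub` returns character, so any numeric column would come back as character.

```r
# Made-up data with empty and space-only cells
df <- data.frame(a = c("x", "", " "), b = c("  ", "y", "z"),
                 stringsAsFactors = FALSE)

# Step 1: turn whitespace-only cells into "" (lapply returns a list of columns)
df[] <- lapply(df, function(x) gsub("^\\s+$", "", x))

# Step 2: turn the "" cells into NA
df[df == ""] <- NA
```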
Or, instead of doing the two replacements, we can do this in one go and change the `class` with `type.convert`.
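A sketch of the one-step version on made-up data: `replace()` sets whitespace-only cells to `NA`, and `type.convert()` then picks a class for each column, so a column of numbers comes back as integer. `as.is = TRUE` keeps text columns as character rather than factor; the `^\\s*$` pattern is an assumed choice.

```r
# Made-up data: a numeric column stored as text, and a text column
df <- data.frame(a = c("1", " ", "3"), b = c("x", "", "z"),
                 stringsAsFactors = FALSE)

# Blank or whitespace-only cells become NA, then each column's class is re-guessed
df[] <- lapply(df, function(x) {
  type.convert(replace(x, grepl("^\\s*$", x), NA), as.is = TRUE)
})

str(df)
```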
NOTE: We don't have to specify the column indices when all the columns are looped over.