I'm trying to build a data frame from information scraped from a website. This is what I have so far:
library(rvest)
library(tidyverse)

url <- 'https://uws-community.symplicity.com/index.php?s=student_group'
page <- html_session(url)

# Group names, plus a row number to join on
name_nodes <- html_nodes(page, ".grpl-name a")
name_text <- html_text(name_nodes)
df <- data.frame(matrix(unlist(name_text)), stringsAsFactors = FALSE)
df <- df %>% mutate(id = row_number())

# Descriptions, joined by row number
desc_nodes <- html_nodes(page, ".grpl-purpose")
desc_text <- html_text(desc_nodes)
df <- left_join(df,
                data.frame(matrix(unlist(desc_text)), stringsAsFactors = FALSE) %>%
                  mutate(id = row_number()),
                by = "id")

# Emails, joined by row number (this is the step that misaligns)
email_nodes <- html_nodes(page, ".grpl-contact a")
email_text <- html_text(email_nodes)
df <- left_join(df,
                data.frame(matrix(unlist(email_text)), stringsAsFactors = FALSE) %>%
                  mutate(id = row_number()),
                by = "id")
This worked until I got to the emails. A few of the entries do not have an email. In the resulting data frame, instead of NA appearing in the rows for those entries, the last three rows show NA.
How do I make the appropriate rows show NA instead of just the last 3 rows?
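One direction that might work, sketched on made-up HTML because I'm not sure what the real page's container markup looks like: select one container per group first, then call `html_node()` (singular) inside each container, which returns one result per container and NA where nothing matches. The `.grpl-item` class, the sample HTML, and the `toy`, `page_wide`, `items` and `toy_df` names are all assumptions for illustration, not taken from the actual site.

```r
library(rvest)

# Made-up HTML standing in for the real page; ".grpl-item" is a hypothetical
# container class, and the second group deliberately has no email link.
toy <- read_html('
  <div class="grpl-item">
    <div class="grpl-name"><a>Chess Club</a></div>
    <div class="grpl-contact"><a>chess@example.com</a></div>
  </div>
  <div class="grpl-item">
    <div class="grpl-name"><a>Hiking Club</a></div>
  </div>
  <div class="grpl-item">
    <div class="grpl-name"><a>Robotics Club</a></div>
    <div class="grpl-contact"><a>robots@example.com</a></div>
  </div>')

# Page-wide selection skips the missing email, so fewer values come back
# than there are groups, and a join on row number leaves the NA at the end.
page_wide <- html_text(html_nodes(toy, ".grpl-contact a"))

# Selecting one container per group, then using html_node() (singular)
# inside each, keeps one result per container and gives NA when absent.
items <- html_nodes(toy, ".grpl-item")
toy_df <- data.frame(
  name  = html_text(html_node(items, ".grpl-name a")),
  email = html_text(html_node(items, ".grpl-contact a")),
  stringsAsFactors = FALSE
)
```

If the real page wraps each group in a common parent element, swapping that element's selector in for `.grpl-item` should keep names, descriptions and emails aligned without any join on row number.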