geom_raster to visualize missing values with addit

Posted 2020-04-20 08:50

This question is a follow-up to my previous question: Adding color code (fill) to vis_miss plot

I would like to visualize the missing values in a data frame using geom_raster from ggplot2 in R, while also highlighting some additional structure in the data with color coding.

Solution attempt:

library(tidyverse)
x11()  # open a separate graphics window (optional)
airquality %>%
  mutate(id = row_number()) %>%                           # row index for the y axis
  gather(-c(id, Month), key = "key", value = "val") %>%   # long format: one row per cell
  mutate(isna = is.na(val)) %>%                           # flag missing cells
  mutate(Month = as.factor(ifelse(isna, NA, Month))) %>%  # missing cells get an NA fill
  ggplot(aes(key, id, fill = Month)) +
  geom_raster() +
  labs(x = "Variable",
       y = "Row Number", title = "Missing values in rows") +
  coord_flip()


This is almost what I want, but it would be nicer to have separate legends for the months and for the missing values. Is that possible? (Note that my system does not support transparency, i.e. alpha.)
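Side note on the reshaping step: gather() still works but is superseded in tidyr 1.0.0 and later by pivot_longer(). A minimal sketch of the same preprocessing with pivot_longer(), assuming tidyr >= 1.0.0 (the name aq_long is just for illustration):

```r
library(dplyr)
library(tidyr)

# Same reshaping as the gather() pipeline, written with pivot_longer()
aq_long <- airquality %>%
  mutate(id = row_number()) %>%
  pivot_longer(-c(id, Month), names_to = "key", values_to = "val") %>%
  mutate(isna = is.na(val),
         # missing cells get NA, all other cells keep their month
         Month = as.factor(ifelse(isna, NA, Month)))
```

Each of the 153 rows becomes five cells (Ozone, Solar.R, Wind, Temp, Day), so aq_long has 765 rows and can be piped into the same ggplot() call. The row order differs from the gather() output, which does not affect the raster.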

1 answer
够拽才男人
Reply #2 · 2020-04-20 08:58

Here, I removed the legend for NA. If this doesn't serve your purpose, I can think of a hacky way to add a separate legend for data vs. missing values.

library(tidyverse)

airquality %>%
  mutate(id = row_number()) %>%
  gather(-c(id, Month), key = "key", value = "val") %>%
  mutate(isna = is.na(val)) %>%
  # NA wherever the value is missing, otherwise the month
  mutate(Month_Dummy = as.factor(ifelse(isna, NA, Month))) %>%
  mutate(Month = as.factor(Month)) %>%
  ggplot() +
  # bottom layer: every cell coloured by month
  geom_raster(aes(key, id, fill = Month)) +
  # top layer: same colours, but missing cells drawn in the NA colour
  geom_raster(aes(key, id, fill = Month_Dummy)) +
  labs(x = "Variable",
       y = "Row Number", title = "Missing values in rows") +
  coord_flip()
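If the only goal is to keep NA out of the legend, another option worth trying is a single geom_raster layer with explicit breaks on the fill scale: values not listed in breaks get no legend key, but NA cells are still drawn in the scale's na.value (grey50 by default). A sketch assuming a recent ggplot2; the names aq_long and p_single are only illustrative:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

aq_long <- airquality %>%
  mutate(id = row_number()) %>%
  gather(-c(id, Month), key = "key", value = "val") %>%
  # NA for missing cells, otherwise the month (levels "5".."9")
  mutate(Month = factor(ifelse(is.na(val), NA, Month)))

p_single <- ggplot(aq_long, aes(key, id, fill = Month)) +
  geom_raster() +
  # legend keys only for the month levels; NA cells are still painted grey50
  scale_fill_discrete(breaks = levels(aq_long$Month), na.value = "grey50") +
  labs(x = "Variable", y = "Row Number", title = "Missing values in rows") +
  coord_flip()
```

This gives one clean month legend with no NA entry; it does not add a separate "missing" legend.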

Update:

The hacky solution I can think of is to add a geom_point for just one of the missing values and use it to generate a legend for the missing data points. It's not the prettiest in terms of appearance, but it's the only solution I can think of.

library(tidyverse)

aqdf <- airquality %>%
  mutate(id = row_number()) %>%
  gather(-c(id, Month), key = "key", value = "val") %>%
  mutate(isna = is.na(val)) %>%
  mutate(Month_Dummy = as.factor(ifelse(isna, NA, Month))) %>%
  mutate(Month = as.factor(Month))

ggplot(data = aqdf, aes(key, id)) +
  geom_raster(aes(fill = Month)) +
  geom_raster(aes(fill = Month_Dummy)) +
  # one dummy point for a single missing cell; it only serves to
  # create a separate "Missing" colour legend
  geom_point(data = aqdf[aqdf$isna, ][1, ],
             aes(NA, id, colour = "NA"),
             inherit.aes = FALSE) +
  scale_color_manual(values = c("grey50")) +
  labs(x = "Variable", y = "Row Number",
       title = "Missing values in rows", color = "Missing") +
  coord_flip() +
  # grey legend-key background so the "Missing" entry matches the missing cells
  theme(legend.key = element_rect(fill = "grey50"))
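An alternative hack that avoids the dummy geom_point: draw the observed cells with geom_raster (mapped to fill) and the missing cells with a separate geom_tile layer whose colour aesthetic is mapped to a constant label. Because fill and colour are different scales, ggplot2 draws two independent legends, and no transparency is involved. This is a sketch only, not tested on the asker's system; the names aq_long and p_two are illustrative:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

aq_long <- airquality %>%
  mutate(id = row_number()) %>%
  gather(-c(id, Month), key = "key", value = "val") %>%
  mutate(isna = is.na(val), Month = as.factor(Month))

p_two <- ggplot(aq_long, aes(key, id)) +
  # observed cells, coloured by month (first legend: "Month")
  geom_raster(data = filter(aq_long, !isna), aes(fill = Month)) +
  # missing cells as grey tiles; mapping colour to a constant string
  # creates a second, independent legend ("Missing")
  geom_tile(data = filter(aq_long, isna), aes(colour = "NA"),
            fill = "grey50") +
  scale_colour_manual(name = "Missing", values = "grey50") +
  labs(x = "Variable", y = "Row Number", title = "Missing values in rows") +
  coord_flip()
```

The tile outlines use the same grey as the fill, so the missing cells look like the raster cells while still producing their own legend entry.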
