2d density plot for categories

Posted 2019-06-11 06:41

I'm trying to make a 2d density plot where the density is displayed for each category. For example, in the image below, we have a density plot for each day, and all the daily densities are combined into the coloured plots. These types of plots are common in the scientific literature on atmospheric sciences and aerosol pollution studies.

[image]

So far I've got this:

ggplot(dat, aes(y = `dN/dlogDp`, x = date)) +
  stat_density2d(geom="tile", aes(fill = after_stat(density)), contour = FALSE) +
  scale_fill_gradient(low="blue", high="red") +
  geom_point(alpha = 0.1)   +
  theme_bw()


But I want to facet it by day, and I'm not sure where to start.

Here are the example data:

structure(list(date = structure(c(1359244800, 1359245400, 1359246000, 
1359246600, 1359247200, 1359247800, 1359248400, 1359249000, 1359249600, 
1359250200, 1359250800, 1359251400, 1359252000, 1359252600, 1359253200, 
1359253800, 1359254400, 1359255000, 1359255600, 1359256200, 1359256800, 
1359257400, 1359258000, 1359258600, 1359259200, 1359259800, 1359260400, 
1359261000, 1359261600, 1359262200, 1359262800, 1359263400, 1359264000, 
1359264600, 1359265200, 1359265800, 1359266400, 1359267000, 1359267600, 
1359268200, 1359268800, 1359269400, 1359270000, 1359270600, 1359271200, 
1359271800, 1359272400, 1359273000, 1359273600, 1359274200, 1359274800, 
1359275400, 1359276000, 1359276600, 1359277200, 1359277800, 1359278400, 
1359279000, 1359279600, 1359280200, 1359280800, 1359281400, 1359282000, 
1359282600, 1359283200, 1359283800, 1359284400, 1359285000, 1359285600, 
1359286200, 1359286800, 1359287400, 1359288000, 1359288600, 1359289200, 
1359289800, 1359290400, 1359291000, 1359291600, 1359292200, 1359292800, 
1359293400, 1359294000, 1359294600, 1359295200, 1359295800, 1359296400, 
1359297000, 1359297600, 1359298200, 1359298800, 1359299400, 1359300000, 
1359300600, 1359301200, 1359301800, 1359302400, 1359303000, 1359303600, 
1359304200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `dN/dlogDp` = c(49.8, 49.275, 47.4, 47.975, 48.625, 51.725, 
    50.7, 47.55, 45.975, 45.35, 45.4, 47.75, 49.625, 48.225, 
    47.65, 47.3, 48.75, 50.075, 34.725, 42.025, 48.825, 52.25, 
    54.05, 49.15, 34.6, 34.375, 42.85, 30.325, 43.15, 36.875, 
    32.85, 36.85, 35.725, 39.8, 38.65, 40.1, 42.675, 38.5, 37.2, 
    34.425, 25.2, 14.725, 22.675, 14.875, 37.45, 46.025, 49.275, 
    35.425, 30, 38.9, 28.6, 41.675, 46.05, 48.6, 62.425, 62.65, 
    61.7, 49.5, 70.05, 71.875, 59.4, 38.525, 36.85, 25.625, 14.675, 
    14.7, 14.6, 14.725, 15.6, 15, 14.6, 14.75, 15.05, 14.975, 
    15.425, 15.1, 15.95, 14.95, 15, 14.6, 14.725, 14.85, 15.175, 
    28.95, 14.975, 14.725, 16.6, 18.925, 53.225, 60.2, 56.425, 
    54.55, 41.4, 19.025, 19.825, 31.875, 14.85, 16.375, 16.65, 
    34.325), Diameter = c(14.6, 15.1, 15.7, 16.3, 16.8, 17.5, 
    18.1, 18.8, 19.5, 20.2, 20.9, 21.7, 22.5, 23.3, 24.1, 25, 
    25.9, 26.9, 27.9, 28.9, 30, 31.1, 32.2, 33.4, 34.6, 35.9, 
    37.2, 38.5, 40, 41.4, 42.9, 44.5, 46.1, 47.8, 49.6, 51.4, 
    53.3, 55.2, 57.3, 59.4, 61.5, 63.8, 66.1, 68.5, 71, 73.7, 
    76.4, 79.1, 82, 85.1, 88.2, 91.4, 94.7, 98.2, 101.8, 105.5, 
    109.4, 113.4, 117.6, 121.9, 126.3, 131, 135.8, 140.7, 145.9, 
    151.2, 156.8, 162.5, 168.5, 174.7, 181.1, 187.7, 194.6, 201.7, 
    209.1, 216.7, 224.7, 232.9, 241.4, 250.3, 259.5, 269, 278.8, 
    289, 299.6, 310.6, 322, 333.8, 346, 358.7, 371.8, 385.4, 
    399.5, 414.2, 429.4, 445.1, 461.4, 478.3, 495.8, 514)), .Names = c("date", 
"dN/dlogDp", "Diameter"), row.names = c(NA, 100L), class = c("tbl_df", 
"tbl", "data.frame"))

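Before faceting by day, it may help to check how many days the sample actually covers. The sketch below rebuilds only the `date` column from the values shown above (100 timestamps, 600 seconds apart, starting at 1359244800); the names `dates`, `days` and `hours` are introduced here for illustration. It should show that the sample falls on a single UTC day and spans hours 0 to 16, so a by-day facet of this sample alone would have just one panel.

```r
# Rebuild only the `date` column: 100 timestamps, 10 minutes (600 s) apart.
dates <- as.POSIXct(seq(1359244800, 1359304200, by = 600),
                    origin = "1970-01-01", tz = "UTC")

days  <- as.Date(dates, tz = "UTC")        # calendar day (UTC)
hours <- as.numeric(format(dates, "%H"))   # hour of day, 0-23

length(dates)   # number of rows
unique(days)    # distinct days in the sample
table(hours)    # number of readings in each hour
```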
UPDATE: This question is misguided; I now think that using categories isn't relevant to recreating this plot. Other questions are more closely related to that task.

Since asking this question, I have kept an up-to-date gist of R code that combines details from the answers to those questions and successfully replicates these plots (example output is included in the gist). The gist is here: https://gist.github.com/benmarwick/9a54cbd325149a8ff405

1 answer
Lonely孤独者°
#2 · 2019-06-11 07:09

The key steps are to strip away much of the decoration in the panels, and use scale_*_continuous(expand = c(0,0)) to make the density plot fill the entire panel. Here's an example of how to put it together:

# get the day and hour to use as facet panels
dat$day <- as.Date(dat$date)
dat$hour <- as.numeric(format(dat$date, "%H"))

library(ggplot2)
library(viridis)

# theme to suppress many details
squeeze_grid_theme <- theme_bw() + theme(axis.title = element_blank(), 
                            axis.ticks = element_blank(), 
                            axis.text = element_blank(), 
                            strip.text = element_blank(), 
                            strip.background = element_blank(), 
                            panel.spacing.y = unit(0, "lines"), 
                            panel.spacing.x = unit(-1,"lines"),
                            panel.border = element_blank(), 
                            panel.grid = element_blank(), 
                            axis.text.x = element_text(margin=margin(0,0,0,0,"pt")),
                            axis.text.y = element_text(margin=margin(0,0,0,0,"pt")))

p <- ggplot(dat, aes(z = Diameter, y = `dN/dlogDp`, x = date)) +
  stat_density2d(geom="tile", aes(fill = after_stat(density)), contour = FALSE) +
  scale_fill_viridis() +
  geom_point(alpha = 0.1)   +
  facet_grid(~hour) +
  scale_y_continuous(expand = c(0,0)) + 
  scale_x_datetime(expand = c(0,0)) +
  squeeze_grid_theme

p


This gives a separate density plot for each hour, squeezed tightly together like the example plot in the question.
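The same approach can be pointed at the `day` column to get one panel per day, which is what the question originally asked for. The following is only a sketch under stated assumptions: `demo` is synthetic data invented here (three days of 10-minute readings with random values), `conc` stands in for `dN/dlogDp`, and `scales = "free_x"` is a choice made here so that each panel shows only its own day. It requires ggplot2 3.3.0 or later for `after_stat()`.

```r
library(ggplot2)

# Synthetic stand-in data (NOT the question's measurements):
# three days of 10-minute readings with made-up concentrations.
set.seed(1)
demo <- data.frame(
  date = seq(as.POSIXct("2013-01-27 00:00", tz = "UTC"),
             by = "10 min", length.out = 3 * 144)
)
demo$conc <- runif(nrow(demo), min = 15, max = 70)  # stands in for `dN/dlogDp`
demo$day  <- as.Date(demo$date, tz = "UTC")

p_day <- ggplot(demo, aes(x = date, y = conc)) +
  stat_density_2d(geom = "tile", aes(fill = after_stat(density)),
                  contour = FALSE) +
  facet_grid(~day, scales = "free_x") +    # one panel per day
  scale_x_datetime(expand = c(0, 0)) +     # no padding, so tiles fill the panel
  scale_y_continuous(expand = c(0, 0)) +
  theme_bw() +
  theme(panel.spacing.x = unit(0, "lines"),
        strip.text = element_blank(),
        strip.background = element_blank())

p_day
```

With real data, replace `demo` and `conc` with `dat` and `dN/dlogDp`; the density is estimated separately within each panel, so each day gets its own estimate.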
