-->

How to add value labels on the flows item of a All

Posted 2020-07-27 20:30

Question:

I'm looking to label the "flow" portion of an Alluvial / Sankey chart in R.

The strata (columns) can easily be labelled, but not the flows connecting them. Reading the documentation and experimenting got me nowhere.

In the sample below, I would like "freq" to appear as a label on each flow connecting the strata.

library(ggplot2)
library(ggalluvial)

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")

Answer 1:

You can use the raw numbers as labels for the flows by adding a second text layer with stat = "flow":

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  geom_text(stat = "flow", nudge_x = 0.2) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")
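
To work with the plot programmatically, it helps to store it in a variable and look at the data ggplot2 computed for the flow layer. The following is only a sketch: the names g and flow_dat are chosen here for illustration, and which columns show up in the result may depend on the installed ggalluvial version, so it is worth checking them before relying on any one of them.

```r
library(ggplot2)
library(ggalluvial)

data(vaccinations)

# Save the plot under a variable ('g' is just the name used here)
g <- ggplot(vaccinations,
            aes(x = survey, stratum = response, alluvium = subject,
                y = freq,
                fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3)

# Layer 1 is geom_flow(), since it is the first layer added to the plot
flow_dat <- layer_data(g, i = 1)

# Inspect the available columns and a few rows before computing on them
names(flow_dat)
head(flow_dat)
```

Changing i selects a different layer, for example i = 2 for the stratum layer in this plot.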

If you want more control over how these points are labelled, you can extract the layer data and do computations on it. For example, we can compute the fractions for only the starting positions as follows:

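# Assume 'g' is the previous plot object saved under a variable
# layer_data() returns the computed data of the first layer (geom_flow) by default
newdat <- layer_data(g)
# Keep only the rows at the starting side of each flow
newdat <- newdat[newdat$side == "start", ]
# Group the rows by stratum and axis position
groups <- split(newdat, interaction(newdat$stratum, newdat$x))
# Within each group, turn the raw counts into fractions of the group total
groups <- lapply(groups, function(dat) {
  dat$label <- dat$label / sum(dat$label)
  dat
})
newdat <- do.call(rbind, groups)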

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  geom_text(data = newdat, aes(x = xmin + 0.4, y = y, label = format(label, digits = 1)),
            inherit.aes = FALSE) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")
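
The split / lapply / rbind step used to compute the fractions can be tried on its own with a toy data frame. This is a minimal sketch in base R: the toy data and its values are made up for illustration and only mimic the stratum, x and label columns used above. It also shows ave() as a possible shorter alternative, which is a swapped-in technique and not part of the original answer.

```r
# Toy stand-in for the layer data: two groups with raw counts as labels
toy <- data.frame(
  stratum = c("A", "A", "B", "B", "B"),
  x       = c(1, 1, 1, 1, 1),
  label   = c(10, 30, 5, 5, 10)
)

# Same approach as in the answer: split by group, normalise, recombine
groups <- split(toy, interaction(toy$stratum, toy$x))
groups <- lapply(groups, function(dat) {
  dat$label <- dat$label / sum(dat$label)
  dat
})
via_split <- do.call(rbind, groups)

# Alternative: ave() applies the function per group and keeps row order
toy$fraction <- ave(toy$label, toy$stratum, toy$x,
                    FUN = function(v) v / sum(v))

print(via_split$label)
print(toy$fraction)
```

Both approaches give fractions that sum to 1 within each group. The ave() version avoids the rbind step and leaves the rows in their original order, which can be convenient when the result is passed straight back to geom_text().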

Where exactly to place the labels is still something of a judgement call. Placing them at the start of each flow is the easy way; if you want the labels to sit approximately in the middle and dodge one another, that would require some extra processing.