Plot line width (size) based on counts using ggplot

Posted 2019-05-21 07:39

Question:

I have a dataset in long format. Each ID "walks" 3 steps, and each step (variable name is step) can land on a different location (variable name is milestone). I want to draw all of the paths. Because some paths are traveled more often than others, I want the width (size) of each path to be proportional to its count. I imagined something like geom_line(aes(size=..count..)) in ggplot, but it doesn't work.

Below is my code, which includes the URL for the example dataset. My crude workaround for adding width was to dodge the lines, but the result is not proportional and it leaves cracks.

library(ggplot2)

ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  geom_line(position = position_dodge(width=0.05)) +
  scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
  scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))

In the plot from my current code you can see the cracks between the dodged lines, and the widths are not proportional to the counts.

I was hoping this could look like a Sankey diagram, with the width indicating counts.

Answer 1:

If you are looking for user-specific counts of paths, then this might help:

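library(dplyr)
library(ggplot2)

ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")

# Build one path signature per user, then count the users sharing it
ddnew <- ddnew %>%
  group_by(user_id) %>%
  mutate(step_id = paste(step, collapse = ","),
         milestone_id = paste(milestone, collapse = ",")) %>%
  group_by(step_id, milestone_id) %>%
  # n_distinct(user_id) counts users; n() would count rows (users x steps)
  mutate(width = n_distinct(user_id)) %>%
  ungroup()

# In ggplot2 >= 3.4.0, use linewidth instead of size for lines
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  geom_line(aes(size = width)) +
  scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
  scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))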

The idea is to count unique user-specific paths and assign these counts as width in the geom_line() aesthetic.
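
The counting step can be seen in isolation on a tiny made-up data set. The sketch below uses only base R and keys on the milestone sequence alone; the data frame toy and its values are invented for illustration and are not taken from data9.csv.

```r
# Toy long-format data (hypothetical values): 4 users, 3 steps each
toy <- data.frame(
  user_id   = rep(c("a", "b", "c", "d"), each = 3),
  step      = rep(1:3, times = 4),
  milestone = c(1, 10, 2,   # user a
                1, 10, 2,   # user b (same path as a)
                1, 10, 3,   # user c
                1, 10, 2)   # user d (same path as a)
)

# Rows must be in step order within each user before collapsing
toy <- toy[order(toy$user_id, toy$step), ]

# One string per user describing the whole path, e.g. "1,10,2"
path_of_user <- vapply(split(toy$milestone, toy$user_id),
                       paste, character(1), collapse = ",")

# How many users share each distinct path
path_counts <- table(path_of_user)

# Attach the count of the user's path to every row of that user
user_paths <- path_of_user[as.character(toy$user_id)]
toy$width  <- as.integer(path_counts[user_paths])

print(unique(toy[, c("user_id", "width")]))
```

Users a, b and d share a path, so their rows get the same width, while user c's path is unique. A column built this way could then be mapped to the line width as in the ggplot call above.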



Answer 2:

Does this help?

library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  # fun.y was renamed to fun in ggplot2 3.3.0; use fun.y on older versions
  stat_summary(geom="line", fun = "sum", aes(size=milestone), alpha=0.2, color="grey50") +
  scale_x_discrete(limits=factor(0:2)) +
  scale_y_discrete(limits=factor(0:10)) +
  theme(panel.background = element_blank(),
        legend.position = "none")


Answer 3:

One option is to use the riverplot package. First you'll need to summarize your data so that you can define the edges and nodes.

> library(riverplot)
> library(dplyr)
> library(tidyr)
> 
> paths <- spread(ddnew, step, milestone) %>%
+   count(`1`, `2`, `3`)
> paths
Source: local data frame [9 x 4]
Groups: 1, 2 [?]

    `1`   `2`   `3`     n
  <int> <int> <int> <int>
1     1     2     3     7
2     1     2    10     8
3     1     3     2     1
4     1     4     8     1
5     1    10     2   118
6     1    10     3    33
7     1    10     4     2
8     1    10     5     1
9     1    10    NA    46

Next define your nodes (i.e. each combination of step and milestone).

prefix <- function(p, n) {paste(p, n, sep = '-')}

nodes <- distinct(ddnew, step, milestone) %>%
  mutate(ID = prefix(step, milestone),
         y = dense_rank(milestone)) %>%
  select(ID, x = step, y)

Then define your edges:

e12 <- group_by(paths, N1 = `1`, N2 = `2`) %>%
  summarise(Value = sum(n)) %>%
  ungroup() %>%
  mutate(N1 = prefix(1, N1),
         N2 = prefix(2, N2))

e23 <- group_by(paths, N1 = `2`, N2 = `3`) %>%
  filter(!is.na(N2)) %>%
  summarise(Value = sum(n)) %>%
  ungroup() %>%
  mutate(N1 = prefix(2, N1),
         N2 = prefix(3, N2))

edges <- bind_rows(e12, e23) %>%
  as.data.frame()
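
The edge tables above boil down to counting how many users move from one milestone to the next between consecutive steps. Here is a minimal base-R sketch of that idea on invented data. The toy data frame and its values are hypothetical, and the N1/N2/Value column names simply mirror the ones used above.

```r
# Toy long-format data (hypothetical values): 4 users, 3 steps each
toy <- data.frame(
  user_id   = rep(c("a", "b", "c", "d"), each = 3),
  step      = rep(1:3, times = 4),
  milestone = c(1, 10, 2,
                1, 10, 2,
                1, 10, 3,
                1, 2, 3)
)
toy <- toy[order(toy$user_id, toy$step), ]

# Pair each row with the following row, keeping pairs within one user
n    <- nrow(toy)
from <- toy[-n, ]
to   <- toy[-1, ]
same_user <- from$user_id == to$user_id

# Label nodes as "step-milestone", like the prefix() helper does
trans <- data.frame(
  N1 = paste(from$step[same_user], from$milestone[same_user], sep = "-"),
  N2 = paste(to$step[same_user],   to$milestone[same_user],   sep = "-")
)

# Count identical transitions; Value is the width of each edge
trans$Value <- 1L
edges_toy <- aggregate(Value ~ N1 + N2, data = trans, FUN = sum)
print(edges_toy)
```

Each row of edges_toy is one edge between two nodes, and Value is the number of users who took it, which is the quantity the river plot draws as width.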

Finally, make the plot:

style <- default.style()
style$srt <- '0'  # display node labels horizontally

makeRiver(nodes, edges) %>% plot(default_style = style)