Question:
I have a dataset in long format in which each ID 'walks' 3 steps, and each step (variable name is step) can land on a different location (variable name is milestone). I want to draw all of the paths. Because some paths are traveled more than others, I want the width (size) of each path to be proportional to its count. I imagined something like geom_line(aes(size=..count..)) in ggplot, but it doesn't work.
Below is my code, which includes the URL for the example dataset. My naive way of adding width was to dodge the lines, but the result is not proportional to the counts and it leaves cracks between the lines.
library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  geom_line(position = position_dodge(width=0.05)) +
  scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
  scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))
The plot from my current code shows the cracks between the dodged lines, and the line widths are not proportional to the counts.
I was hoping it could look like a Sankey diagram, with the width indicating counts.
Answer 1:
If you are looking for user-specific counts of paths, then this might help:
library(dplyr)
library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
# Build one path signature per user, then count the users sharing each path.
ddnew <- ddnew %>%
  group_by(user_id) %>%
  mutate(step_id = paste(step, collapse = ","),
         milestone_id = paste(milestone, collapse = ",")) %>%
  group_by(step_id, milestone_id) %>%
  mutate(width = n_distinct(user_id)) %>%  # n() would count rows (users x steps)
  ungroup()
# Note: ggplot2 >= 3.4.0 prefers linewidth over size for lines.
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  geom_line(aes(size = width)) +
  scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
  scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))
The idea is to count how many users share each unique path and to map that count to the line width in the geom_line() aesthetic.
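To make the counting step concrete, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and only mimic the long format described in the question. It collapses each user's milestones into one path string and counts how many users share each string.

```r
library(dplyr)

# Hypothetical toy data: four users, three steps each (not the real dataset).
toy <- data.frame(
  user_id   = rep(c("a", "b", "c", "d"), each = 3),
  step      = rep(1:3, times = 4),
  milestone = c(1, 10, 2,   # user a
                1, 10, 2,   # user b
                1, 10, 3,   # user c
                1, 2, 3)    # user d
)

# One path string per user, then the number of users per distinct path.
path_counts <- toy %>%
  arrange(user_id, step) %>%
  group_by(user_id) %>%
  summarise(path = paste(milestone, collapse = ",")) %>%
  count(path, name = "width")

print(path_counts)
```

Joining path_counts back onto the long data by path would give each row the width of its user's path.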
Answer 2:
Does this help?
library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  # fun replaces the fun.y argument, which is deprecated since ggplot2 3.3.0
  stat_summary(geom="line", fun = "sum", aes(size=milestone), alpha=0.2, color="grey50") +
  scale_x_discrete(limits=factor(0:2)) +
  scale_y_discrete(limits=factor(0:10)) +
  theme(panel.background = element_blank(),
        legend.position = "none")
Answer 3:
One option is to use the riverplot package. First you'll need to summarize your data so that you can define the edges and nodes.
> library(riverplot)
> library(dplyr)
> library(tidyr)   # provides spread()
>
> paths <- spread(ddnew, step, milestone) %>%
+   count(`1`, `2`, `3`)
> paths
Source: local data frame [9 x 4]
Groups: 1, 2 [?]
    `1`   `2`   `3`     n
  <int> <int> <int> <int>
1     1     2     3     7
2     1     2    10     8
3     1     3     2     1
4     1     4     8     1
5     1    10     2   118
6     1    10     3    33
7     1    10     4     2
8     1    10     5     1
9     1    10    NA    46
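In recent tidyr releases spread() is superseded by pivot_wider(). The following is a minimal sketch of the same summary, assuming tidyr 1.0.0 or later. The toy data frame is hypothetical, made up for illustration and not taken from the dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same long format: one row per user and step.
toy <- data.frame(
  user_id   = rep(c("a", "b", "c", "d"), each = 3),
  step      = rep(1:3, times = 4),
  milestone = c(1L, 10L, 2L,
                1L, 10L, 2L,
                1L, 10L, 3L,
                1L, 2L, 3L)
)

# One row per user with one column per step, then count identical rows (paths).
paths_toy <- toy %>%
  pivot_wider(names_from = step, values_from = milestone) %>%
  count(`1`, `2`, `3`)

print(paths_toy)
```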
Next define your nodes (i.e. each combination of step and milestone).
prefix <- function(p, n) {paste(p, n, sep = '-')}
nodes <- distinct(ddnew, step, milestone) %>%
  mutate(ID = prefix(step, milestone),
         y = dense_rank(milestone)) %>%
  select(ID, x = step, y)
Then define your edges:
e12 <- group_by(paths, N1 = `1`, N2 = `2`) %>%
  summarise(Value = sum(n)) %>%
  ungroup() %>%
  mutate(N1 = prefix(1, N1),
         N2 = prefix(2, N2))
e23 <- group_by(paths, N1 = `2`, N2 = `3`) %>%
  filter(!is.na(N2)) %>%
  summarise(Value = sum(n)) %>%
  ungroup() %>%
  mutate(N1 = prefix(2, N1),
         N2 = prefix(3, N2))
edges <- bind_rows(e12, e23) %>%
  as.data.frame()
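With more than three steps, writing one block per pair of steps gets repetitive. As a rough sketch, and assuming steps are consecutive integers, the same kind of edge table can be derived straight from the long format with lead(). The toy data frame and the next_milestone column are hypothetical names introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy data: four users, three steps each (not the real dataset).
toy <- data.frame(
  user_id   = rep(c("a", "b", "c", "d"), each = 3),
  step      = rep(1:3, times = 4),
  milestone = c(1, 10, 2,
                1, 10, 2,
                1, 10, 3,
                1, 2, 3)
)

edges_toy <- toy %>%
  arrange(user_id, step) %>%
  group_by(user_id) %>%
  mutate(next_milestone = lead(milestone)) %>%  # milestone reached at the next step
  ungroup() %>%
  filter(!is.na(next_milestone)) %>%            # the last step has no outgoing edge
  count(step, milestone, next_milestone, name = "Value") %>%
  mutate(N1 = paste(step, milestone, sep = "-"),
         N2 = paste(step + 1, next_milestone, sep = "-")) %>%
  select(N1, N2, Value) %>%
  as.data.frame()

print(edges_toy)
```

The N1 and N2 strings follow the same step-milestone pattern that prefix() produces above, so the result has the columns that the edges data frame uses.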
Finally, make the plot:
style <- default.style()
style$srt <- '0' # display node labels horizontally
makeRiver(nodes, edges) %>% plot(default_style = style)
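For comparison, a rough ggplot2-only sketch of the same idea draws one segment per edge and maps the count to the line width. It assumes ggplot2 3.4.0 or later for the linewidth aesthetic and scale_linewidth(). The edge table edges_toy and its columns from and to are hypothetical, and the result is a plain weighted-line plot rather than a true Sankey layout.

```r
library(ggplot2)

# Hypothetical edge table: one row per (step, from, to) with a traffic count.
edges_toy <- data.frame(
  step  = c(1, 1, 2, 2, 2),
  from  = c(1, 1, 10, 10, 2),
  to    = c(10, 2, 2, 3, 3),
  Value = c(3, 1, 2, 1, 1)
)

p <- ggplot(edges_toy) +
  geom_segment(
    aes(x = step, xend = step + 1, y = from, yend = to, linewidth = Value),
    lineend = "round", alpha = 0.5
  ) +
  # Starting the limits at 0 keeps the drawn width proportional to the count.
  scale_linewidth(range = c(0, 8), limits = c(0, NA)) +
  labs(x = "step", y = "milestone", linewidth = "count")

print(p)
```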