How to make a googleVis multiple Sankey from a dat

Posted 2019-03-29 22:50

Aim

I am aiming to make a multiple Sankey in R using the googleVis package. The output should look similar to this:

[image: desired multi-level Sankey]

Data

I've created some dummy data in R:

set.seed(1)

source <- sample(c("North","South","East","West"),100,replace=TRUE)
mid <- sample(c("North ","South ","East ","West "),100,replace=TRUE) # one trailing space
destination <- sample(c("North  ","South  ","East  ","West  "),100,replace=TRUE) # two trailing spaces
# N.B. Each stage needs its own labels (here via trailing spaces); a label shared between stages would create a cycle
dummy <- rep(1,100) # For aggregation

dat <- data.frame(source,mid,destination,dummy)
aggdat <- aggregate(dummy~source+mid+destination,dat,sum)

What I've tried so far

I can build a two-level Sankey from just a source and a destination, but I cannot work out how to include a middle point:

aggdat <- aggregate(dummy~source+destination,dat,sum)

library(googleVis)

p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
plot(p)

The code produces this:

[image: two-level Sankey]

Question

How do I modify

p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")

to accept the mid variable as well?

1 answer
放荡不羁爱自由
Reply #2 · 2019-03-29 23:17

gvisSankey accepts intermediate levels without any extra arguments. The levels only have to be encoded in the underlying data as additional from/to rows, and each level needs its own node labels:

 source <- sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace=TRUE)
 mid <- sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace=TRUE)
 destination <- sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace=TRUE)
 dummy <- rep(1,100) # For aggregation
 dat <- data.frame(source, mid, destination, dummy) # rebuild dat with the new labels
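
If the data already exist with the same labels in every stage, retyping them is not necessary. The following is a minimal base R sketch of one way to tag each column with a stage suffix and then confirm that no label is shared between stages. The data frame `raw`, the `suffixes` vector and the suffix strings are hypothetical and only serve the illustration.

```r
# Hypothetical raw data in which all three stages share the same labels
raw <- data.frame(
  source      = c("North", "South", "East"),
  mid         = c("South", "North", "East"),
  destination = c("North", "East", "West"),
  stringsAsFactors = FALSE
)

# Append a stage-specific suffix so that no label occurs in more than one stage
suffixes <- c(source = "Src", mid = "Mid", destination = "Des")
tagged <- raw
for (col in names(suffixes)) {
  tagged[[col]] <- paste0(trimws(raw[[col]]), suffixes[[col]])
}

# Collect the unique labels of each stage; a duplicate means two stages share a label
stage_labels <- unlist(lapply(tagged, unique))
has_shared_label <- anyDuplicated(stage_labels) > 0
print(has_shared_label)
```

When `has_shared_label` is `FALSE`, every flow runs from one stage to the next and no node can link back to itself.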

Now we reshape the original data:

 library(dplyr)

 datSM <- dat %>%
  group_by(source, mid) %>%
  summarise(toMid = sum(dummy) ) %>%
  ungroup()

The data frame datSM summarises the number of units flowing from source to mid.

  datMD <- dat %>%
   group_by(mid, destination) %>%
   summarise(toDes = sum(dummy) ) %>%
   ungroup()

The data frame datMD summarises the number of units flowing from mid to destination. It will be appended to datSM to form the final data frame, so both data frames need to be ungrouped and to have the same column names.

  colnames(datSM) <- colnames(datMD) <- c("From", "To", "Dummy")

Once datMD is appended, gvisSankey recognises the middle step automatically, because the To labels in datSM are the same nodes as the From labels in datMD.

  datVis <- rbind(datSM, datMD)

  p <- gvisSankey(datVis, from="From", to="To", weight="Dummy") # "Dummy" matches the column name set above
  plot(p)

Here is the plot: [image: Multilevel Sankey]
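
The same reshaping can also be sketched without dplyr, using the `aggregate()` function from the question. The helper `make_edges()` below is a hypothetical name, not part of googleVis. It assumes one column per stage, listed in flow order, and builds one block of From/To/Weight rows for each consecutive pair of stages, so it would also cover more than one middle level.

```r
# Rebuild dummy data like the data above (labels are unique per stage)
set.seed(1)
n <- 100
dat <- data.frame(
  source      = sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), n, replace = TRUE),
  mid         = sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), n, replace = TRUE),
  destination = sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), n, replace = TRUE),
  dummy       = 1,
  stringsAsFactors = FALSE
)

# Hypothetical helper: aggregate the weight for each consecutive pair of stage columns
make_edges <- function(data, stages, weight) {
  pairs <- seq_len(length(stages) - 1)
  blocks <- lapply(pairs, function(i) {
    agg <- aggregate(
      data[[weight]],
      by = list(From = data[[stages[i]]], To = data[[stages[i + 1]]]),
      FUN = sum
    )
    names(agg)[3] <- "Weight"  # aggregate() names the summed column "x"
    agg
  })
  do.call(rbind, blocks)       # stack the per-pair blocks into one edge list
}

edges <- make_edges(dat, c("source", "mid", "destination"), "dummy")

# Draw the diagram only in an interactive session with googleVis installed
if (interactive() && requireNamespace("googleVis", quietly = TRUE)) {
  p <- googleVis::gvisSankey(edges, from = "From", to = "To", weight = "Weight")
  plot(p)
}
```

Each observation contributes one unit to the source-to-mid block and one to the mid-to-destination block, so the weights in `edges` sum to twice the number of rows in `dat`.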
