Aim
I am aiming to make a multi-level Sankey diagram in R using the googleVis
package. The output should look similar to this:
Data
I've created some dummy data in R:
set.seed(1)
source <- sample(c("North","South","East","West"),100,replace=TRUE)
mid <- sample(c("North ","South ","East ","West "),100,replace=TRUE) # N.B. The trailing space in the mid labels is important: it keeps them distinct from the source labels and avoids a cycle
destination <- sample(c("North","South","East","West"),100,replace=TRUE)
dummy <- rep(1,100) # For aggregation
dat <- data.frame(source,mid,destination,dummy)
aggdat <- aggregate(dummy~source+mid+destination,dat,sum)
What I've tried so far
I can build a Sankey with 2 variables fine if I have just a source and destination, but not a middle point:
aggdat <- aggregate(dummy~source+destination,dat,sum)
library(googleVis)
p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
plot(p)
The code produces this:
Question
How do I modify
p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
to accept the mid variable as well?
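The function gvisSankey does handle mid-levels directly, but it has no separate argument for them. The levels have to be coded in the underlying data as extra from/to rows, and every level needs its own set of labels.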
source <- sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace=TRUE)
mid <- sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace=TRUE)
destination <- sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace=TRUE)
dummy <- rep(1,100) # For aggregation
dat <- data.frame(source, mid, destination, dummy) # Rebuild dat with the new labels
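To picture what "coded in the underlying data" means, here is a minimal sketch with made-up labels (A, B, C, D and the column names From, To, Weight are purely illustrative, not from the question). Any node that shows up in both the from column and the to column ends up as an intermediate level. The plotting part is only attempted in an interactive session with googleVis installed.

```r
# Hypothetical three-stage flow: A -> B -> C and A -> D -> C.
links <- data.frame(
  From   = c("A", "A", "B", "D"),
  To     = c("B", "D", "C", "C"),
  Weight = c(3, 2, 3, 2)
)

# Nodes that appear in both columns sit in the middle of the diagram.
mid_nodes <- intersect(links$From, links$To)
print(mid_nodes)

# Drawing is optional: skipped when googleVis is missing or the session
# is not interactive.
if (interactive() && requireNamespace("googleVis", quietly = TRUE)) {
  p <- googleVis::gvisSankey(links, from = "From", to = "To", weight = "Weight")
  plot(p)
}
```

The real data below follows the same shape, just with the links computed by aggregation instead of typed in by hand.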
Now, we'll reshape the original data:
library(dplyr)
datSM <- dat %>%
  group_by(source, mid) %>%
  summarise(toMid = sum(dummy)) %>%
  ungroup()
Data frame datSM summarises the number of units flowing from Source to Mid.
datMD <- dat %>%
  group_by(mid, destination) %>%
  summarise(toDes = sum(dummy)) %>%
  ungroup()
Data frame datMD summarises the number of units flowing from Mid to Destination. Its rows will be appended to datSM to form the final data frame. Both data frames need to be ungrouped and to have the same column names.
colnames(datSM) <- colnames(datMD) <- c("From", "To", "Dummy")
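If you would rather not pull in dplyr, the same two summaries can probably be done in base R with aggregate(), which the question already uses. This is only a sketch of an equivalent route, not a replacement for the code above. It regenerates the data (the set.seed(1) call is my assumption, taken from the question) so that it runs on its own, and it reuses the names datSM and datMD so the remaining steps apply unchanged.

```r
# Recreate the relabelled data so this block is self-contained.
set.seed(1)
source      <- sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace = TRUE)
mid         <- sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace = TRUE)
destination <- sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace = TRUE)
dat <- data.frame(source, mid, destination, dummy = 1)

# One aggregate() call per pair of adjacent stages.
datSM <- aggregate(dummy ~ source + mid, data = dat, FUN = sum)
datMD <- aggregate(dummy ~ mid + destination, data = dat, FUN = sum)

# Same column names in both tables so they can be stacked with rbind().
names(datSM) <- names(datMD) <- c("From", "To", "Dummy")
```

aggregate() returns plain data frames, so there is no grouping to remove afterwards.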
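Once datMD is appended to datSM, gvisSankey will recognise the middle step automatically, because the Mid labels appear both in the To column (rows from datSM) and in the From column (rows from datMD).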
library(googleVis)
datVis <- rbind(datSM, datMD)
p <- gvisSankey(datVis, from="From", to="To", weight="Dummy") # Column is named "Dummy", not "dummy"
plot(p)
Here is the plot:
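The same idea should extend to more than three levels: build one summary per pair of adjacent columns and stack them. Below is a rough sketch of that as a helper. Note that stage_links() is a hypothetical function I made up for illustration, it is not part of googleVis, and the output column names From, To and Weight are my own choice. It also refuses labels that are shared between stages, since those would merge nodes and can create a cycle.

```r
# Hypothetical helper (not part of googleVis): build a From/To/Weight link
# table from a data frame whose columns are consecutive stages.
stage_links <- function(dat, stages, weight) {
  stopifnot(length(stages) >= 2, all(c(stages, weight) %in% names(dat)))

  # A label shared by two stages would merge those nodes and can create a cycle.
  labels <- unlist(lapply(stages, function(s) unique(as.character(dat[[s]]))))
  if (anyDuplicated(labels) > 0) {
    stop("Labels must be unique across stages: ",
         paste(unique(labels[duplicated(labels)]), collapse = ", "))
  }

  # One aggregated table per pair of adjacent stages, stacked with rbind().
  pairs <- lapply(seq_len(length(stages) - 1), function(i) {
    agg <- aggregate(dat[[weight]],
                     by = list(From = as.character(dat[[stages[i]]]),
                               To   = as.character(dat[[stages[i + 1]]])),
                     FUN = sum)
    names(agg)[3] <- "Weight"
    agg
  })
  do.call(rbind, pairs)
}

# Usage with data shaped like the example above (seed is an assumption).
set.seed(1)
dat <- data.frame(
  source      = sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace = TRUE),
  mid         = sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace = TRUE),
  destination = sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace = TRUE),
  dummy       = 1
)
links <- stage_links(dat, c("source", "mid", "destination"), "dummy")
```

The resulting links table can then be passed to gvisSankey(links, from="From", to="To", weight="Weight") in the same way as datVis.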