I'd like to visualise the following data: a hotel observes that each year some of its customers are repeat customers. Each year about half of all customers are first-time customers, 20% are second-time customers, and so on. Below is some R code that includes the data and a visualisation. However, I'm not happy with it and I'm looking for improvements:
- R doesn't like colour bands with many colours, so maybe group the data?
- Would a step curve be a better visualisation altogether?
- The number of visits is treated as a factor. Is this the right approach?
- Stacking bars makes it easy to compare first-time guests but not the other ones. Should I pick a different visualisation?
    #! /usr/bin/env R CMD BATCH

    library(ggplot2)

    d <- read.table(header=TRUE, text='
    year    visit   count
    2013    1       1641
    2013    2       604
    2013    3       256
    2013    4       89
    2013    5       32
    2013    6       10
    2013    7       4
    2013    8       3
    2014    1       1365
    2014    2       637
    2014    3       276
    2014    4       154
    2014    5       86
    2014    6       39
    2014    7       19
    2014    8       6
    2014    9       4
    2014    10      2
    2014    11      1
    2014    12      1
    2015    1       1251
    2015    2       608
    2015    3       288
    2015    4       143
    2015    5       88
    2015    6       52
    2015    7       21
    2015    8       8
    2015    9       8
    2015    10      3
    2015    11      2
    2015    12      1')

    d$year  <- factor(d$year)
    d$visit <- factor(d$visit)

    p <- ggplot(d, aes(year,count))
    p <- p + geom_bar(aes(fill=visit),position="fill",stat="identity")
    p <- p + xlab("Year") + ylab("Distribution")

    # pdf("returners.pdf",9,6)
    print(p)
    # dev.off()
It seems that you want to compare how much each visit number contributes to the hotel's total customers and, at the same time, compare one year with the next. The following code puts this together in one chart.
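The snippet below is only a sketch of one way to build such a chart, not necessarily the code this answer used. It assumes that lumping visits 4 and above into a single "4+" bucket is acceptable (the bucket boundary and the column name `visit_group` are my own choices), and it places the years side by side within each bucket so that both comparisons are visible at once.

```r
library(ggplot2)

# Same data as in the question, built compactly.
d <- data.frame(
  year  = rep(c(2013, 2014, 2015), times = c(8, 12, 12)),
  visit = c(1:8, 1:12, 1:12),
  count = c(1641, 604, 256, 89, 32, 10, 4, 3,
            1365, 637, 276, 154, 86, 39, 19, 6, 4, 2, 1, 1,
            1251, 608, 288, 143, 88, 52, 21, 8, 8, 3, 2, 1)
)

# Assumption: collapse visits 4 and above into one bucket to keep the legend short.
d$visit_group <- factor(ifelse(d$visit >= 4, "4+", as.character(d$visit)),
                        levels = c("1", "2", "3", "4+"))

# Sum the counts per year and bucket.
agg <- aggregate(count ~ year + visit_group, data = d, FUN = sum)
agg$year <- factor(agg$year)

# One group of bars per bucket, one bar per year inside each group.
p <- ggplot(agg, aes(x = visit_group, y = count, fill = year)) +
  geom_col(position = "dodge") +
  labs(x = "Visit number", y = "Customers", fill = "Year")
print(p)
```

With only four buckets and three years, the colour scale stays small, which also sidesteps the many-colours problem raised in the question.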
which gives the chart
This representation suggests that the drop in 2015 compared with previous years is due to fewer first-time customers rather than a reduction in returning ones.
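That reading can also be checked directly from the numbers without any plot. A small base-R sketch, where the `type` column and its two labels are names I made up for illustration:

```r
# Same data as in the question, built compactly.
d <- data.frame(
  year  = rep(c(2013, 2014, 2015), times = c(8, 12, 12)),
  visit = c(1:8, 1:12, 1:12),
  count = c(1641, 604, 256, 89, 32, 10, 4, 3,
            1365, 637, 276, 154, 86, 39, 19, 6, 4, 2, 1, 1,
            1251, 608, 288, 143, 88, 52, 21, 8, 8, 3, 2, 1)
)

# Split customers into first-time (visit 1) and returning (visit 2 or more).
d$type <- ifelse(d$visit == 1, "first_time", "returning")

# Cross-tabulate the summed counts: one row per year, one column per type.
totals <- xtabs(count ~ year + type, data = d)
print(addmargins(totals, margin = 2))
```

For this data, first-time customers go 1641, 1365, 1251 over the three years, while returning customers go 998, 1225, 1222. So the 2014 to 2015 decline in the total (2590 to 2473) comes almost entirely from first-time customers.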
Why not visualize them like actual distributions?
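A minimal sketch of that idea, assuming one panel per year with the visit number on the x-axis. Here `visit` is kept numeric rather than converted to a factor, so the axis is a true scale; the layout choices (`ncol = 1`, axis labels) are mine.

```r
library(ggplot2)

# Same data as in the question, built compactly.
d <- data.frame(
  year  = rep(c(2013, 2014, 2015), times = c(8, 12, 12)),
  visit = c(1:8, 1:12, 1:12),
  count = c(1641, 604, 256, 89, 32, 10, 4, 3,
            1365, 637, 276, 154, 86, 39, 19, 6, 4, 2, 1, 1,
            1251, 608, 288, 143, 88, 52, 21, 8, 8, 3, 2, 1)
)

# One histogram-like panel per year, stacked vertically for easy comparison.
p <- ggplot(d, aes(x = visit, y = count)) +
  geom_col() +
  scale_x_continuous(breaks = 1:12) +
  facet_wrap(~ year, ncol = 1) +
  labs(x = "Visit number", y = "Customers")
print(p)
```

No fill colour is needed at all in this layout, since position on the x-axis carries the visit number.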
To see the visit count deltas by year, you can just swap the facets:
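Swapping the facets might look like the sketch below: one panel per visit number, with the years on the x-axis. The `scales = "free_y"` setting is my assumption, added because the counts range from over a thousand down to single digits and a shared axis would flatten the higher visit numbers.

```r
library(ggplot2)

# Same data as in the question, built compactly.
d <- data.frame(
  year  = rep(c(2013, 2014, 2015), times = c(8, 12, 12)),
  visit = c(1:8, 1:12, 1:12),
  count = c(1641, 604, 256, 89, 32, 10, 4, 3,
            1365, 637, 276, 154, 86, 39, 19, 6, 4, 2, 1, 1,
            1251, 608, 288, 143, 88, 52, 21, 8, 8, 3, 2, 1)
)

# One panel per visit number; each panel shows how that count moves by year.
p <- ggplot(d, aes(x = factor(year), y = count)) +
  geom_col() +
  facet_wrap(~ visit, scales = "free_y") +
  labs(x = "Year", y = "Customers")
print(p)
```

Note that visits 9 to 12 have no 2013 row in the data, so those panels show only two bars.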
Or, you can look at YoY growth by visit (%):
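One way to compute and plot this, as a sketch: within each visit number, compare each year's count with the previous year's. The column names `prev` and `yoy_pct` are hypothetical, and the code assumes the years present for a given visit number are consecutive, which holds for this data.

```r
library(ggplot2)

# Same data as in the question, built compactly.
d <- data.frame(
  year  = rep(c(2013, 2014, 2015), times = c(8, 12, 12)),
  visit = c(1:8, 1:12, 1:12),
  count = c(1641, 604, 256, 89, 32, 10, 4, 3,
            1365, 637, 276, 154, 86, 39, 19, 6, 4, 2, 1, 1,
            1251, 608, 288, 143, 88, 52, 21, 8, 8, 3, 2, 1)
)

# Order by visit then year so that a one-row lag is the previous year.
d <- d[order(d$visit, d$year), ]

# Previous year's count within each visit number (NA for the first year seen).
d$prev <- ave(d$count, d$visit, FUN = function(x) c(NA, head(x, -1)))

# Year-over-year growth in percent.
d$yoy_pct <- 100 * (d$count - d$prev) / d$prev

# Keep only rows that have a previous year to compare against.
g <- d[!is.na(d$yoy_pct), ]

p <- ggplot(g, aes(x = factor(visit), y = yoy_pct, fill = factor(year))) +
  geom_col(position = "dodge") +
  labs(x = "Visit number", y = "YoY growth (%)", fill = "Year")
print(p)
```

The percentages for high visit numbers rest on very small counts (for example, 3 to 6 customers at the eighth visit), so they swing widely and should be read with care.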