I believe I am close to correctly duplicating my scatterplot with a hexbin plot.
My current scatterplot:
ggplot(shotchart[shotchart$player_id==X,], aes(as.numeric(V1), as.numeric(V2))) +
annotation_custom(court, -3, 100, -102, 3) +
geom_point(aes(colour = class), alpha = 0.8, size = 3) +
scale_color_manual(values = c("#008000", "#FF6347")) +
xlim(-1, 102) +
ylim(102, -3) +
coord_fixed()
What the scatterplot looks like:
My hexbin plot:
ggplot(shotchart[shotchart$player_id==X,], aes(x=as.numeric(V1), y=as.numeric(V2))) +
annotation_custom(court, -3, 100, -102, 3) +
geom_hex(aes(fill = class), alpha = 0.3, bins = 25) +
xlim(-1, 102) +
ylim(102, -3) +
coord_fixed()
What the hexbin plot looks like:
I believe both plots should be saying the same thing. From the plots I want to be able to see where the "shots" were taken and whether each shot was "made" or "missed". The scatterplot portrays this perfectly, but the hexbin plot does not correctly display "made" or "missed".
I have circled the major problem areas in the hexbin plot.
The areas that are dark blue should actually be a dark red. The areas in the scatterplot that are green and dark green should be red and dark red in the hexbin plot. I have tried changing the alpha in the hexbin plot but I have not been able to change the dark blue into dark red.
Any help will be appreciated. I am new to using ggplot so please let me know if any further information is needed.
Without a reproducible example, I can't be certain, but I think what's going on is that red and blue hexagons are being plotted on top of each other in regions where there are overlapping made and missed shots. Because the hexagons are semi-transparent, the darker color you're seeing in these regions is a result of color mixing. The solution is to use smaller hexagons, that is, a larger value of bins in geom_hex. However, if there are made and missed shots in the same location, you'll still have some overlapping hexagons.
Here's an example with fake data:
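library(ggplot2) # geom_hex() also needs the hexbin package installed
set.seed(494)
# 1000 points in the lower-left quadrant, 100 in the upper-right;
# the two group labels are recycled, so they alternate row by row
dat = data.frame(x = c(runif(1000,0,0.5),runif(100,0.5,1)), y=c(runif(1000,0,0.5),runif(100,0.5,1)),
group=c("Made","Missed"))
# Scatterplot
ggplot(dat, aes(x,y,colour=group)) +
geom_point(alpha=0.5)
# Hexbin plot with larger hexagons
ggplot(dat, aes(x,y,fill=group)) +
geom_hex(alpha=0.5, bins=30) + ggtitle("bins=30")
# Hexbin plot with smaller hexagons
ggplot(dat, aes(x,y,fill=group)) +
geom_hex(alpha=0.5, bins=80) + ggtitle("bins=80")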
The first two plots below compare plotting points with plotting hexagonal bins. Note that the lower left quadrant has lots of overlapping points. As a result, there are overlapping hexagons. We can reduce the number of overlapping hexagons by increasing the number of bins, as is done in the third plot.
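To put a rough number on the overlap, one option is to pull the computed hexagons out of the plot with layer_data() and check how many hexagon centres are drawn once per group. This is only a sketch: overlap_share is a helper name made up for this illustration, and it assumes that both groups are binned on the same hexagonal grid, so that overlapping hexagons share the same centre.

```r
library(ggplot2)  # geom_hex() also needs the hexbin package installed

set.seed(494)
dat <- data.frame(
  x = c(runif(1000, 0, 0.5), runif(100, 0.5, 1)),
  y = c(runif(1000, 0, 0.5), runif(100, 0.5, 1)),
  group = c("Made", "Missed")
)

# Hypothetical helper: share of drawn hexagons whose centre is also
# used by a hexagon of the other group (i.e. hexagons that overlap)
overlap_share <- function(bins) {
  p <- ggplot(dat, aes(x, y, fill = group)) + geom_hex(bins = bins)
  hexes <- layer_data(p)  # one row per drawn hexagon
  # Round the centres so tiny floating-point differences do not matter
  centres <- paste(round(hexes$x, 6), round(hexes$y, 6))
  mean(duplicated(centres) | duplicated(centres, fromLast = TRUE))
}

share_30 <- overlap_share(30)
share_80 <- overlap_share(80)
```

Comparing share_30 with share_80 shows how much of the overlap goes away with the smaller hexagons, and how much remains.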
Is there a reason you prefer hexagonal bins instead of points? The hexbins just count up the number of made and missed shots in each region of the court. Making the bins small enough to minimize overlap essentially gets you back to a scatterplot (i.e., the bins are about the size of a point marker). You could just use geom_point with hexagonal point markers in that case. On the other hand, do you really want to know the proportion of shots made in different areas of the court? Something like this, for example:
library(scales)
dat$made.flg = ifelse(dat$group=="Made", 1, 0)
ggplot(dat, aes(x, y, z=made.flg)) +
stat_summary_hex(fun=mean, bins=30) +
scale_fill_gradient(low="blue", high="red", labels=percent_format(), name="Percent Made")
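Here is a rough sketch of how the same idea might look with the column names from the question. I don't have the real data, so shots below is a made-up stand-in for shotchart, and I'm assuming that class contains the values "made" and "missed" and that V1 and V2 are stored as text (hence the as.numeric calls). The court annotation is left out because the court object isn't available here.

```r
library(ggplot2)  # stat_summary_hex() also needs the hexbin package installed
library(scales)

# Made-up stand-in for the question's shotchart data (assumption)
set.seed(1)
shots <- data.frame(
  V1 = as.character(round(runif(300, 0, 100), 1)),
  V2 = as.character(round(runif(300, 0, 100), 1)),
  class = sample(c("made", "missed"), 300, replace = TRUE)
)

# 1 for a made shot, 0 for a missed shot, so the mean is the share made
shots$made.flg <- as.numeric(shots$class == "made")

p <- ggplot(shots, aes(as.numeric(V1), as.numeric(V2), z = made.flg)) +
  stat_summary_hex(fun = mean, bins = 25) +
  scale_fill_gradient(low = "blue", high = "red",
                      labels = percent_format(), name = "Percent Made") +
  xlim(-1, 102) +
  ylim(102, -3) +
  coord_fixed()

p
```

Each hexagon now has a single fill value, the share of shots made in that bin, so there is nothing to overlap. Keep in mind that a bin with only one or two shots will show up as 0% or 100%.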