How to find the intersection of two densities with

Posted 2019-06-01 17:47

Question:

How do I find the intersection of two density plots created with ggplot2?

A sample from the data frame named combined:

           futureChange direction
2009-10-26    0.9980446      long
2008-04-28    1.0277389  not long
2012-07-09    1.0302413  not long
2010-11-15    1.0017247  not long

I create the density plot using this code.

ggplot(combined, aes(futureChange, fill = direction)) +
  geom_density(alpha = 0.2) +
  ggtitle(paste(symbol, "Long SB Frequency", sep = " "))

I want to find where the pink density line intersects with the blue density line.

I saw other posts that mentioned the intersect function, but I can't figure out how to use it with a density ggplot2 since I don't have the density vectors.

Answer 1:

The stat_density function in ggplot2 uses R's density function. Using the density function will give us explicit values for the density estimation which we can use to find the intersection point (I generate data here because the given data isn't enough to perform density calculation):

library(ggplot2)

# Simulated data: two groups with different means
set.seed(10)
N <- 100
combined <- data.frame(futureChange = c(rnorm(N, mean = -1), rnorm(N, mean = 1)),
                       direction = rep(c("long", "not long"), each = N))

# Evaluate both densities on the same grid so they can be compared pointwise
lower.limit <- min(combined$futureChange)
upper.limit <- max(combined$futureChange)
long.density <- density(subset(combined, direction == "long")$futureChange,
                        from = lower.limit, to = upper.limit, n = 2^10)
not.long.density <- density(subset(combined, direction == "not long")$futureChange,
                            from = lower.limit, to = upper.limit, n = 2^10)

# Grid points just after each sign change of the difference
density.difference <- long.density$y - not.long.density$y
intersection.point <- long.density$x[which(diff(density.difference > 0) != 0) + 1]

ggplot(combined, aes(futureChange, fill = direction)) +
  geom_density(alpha = 0.2) +
  geom_vline(xintercept = intersection.point, color = "red")

Taking this step by step, we first compute the limits over which the density for each group should be calculated (lower.limit and upper.limit). We do this because we need these ranges to be the same for both density calculations so that we can compare them later. Additionally, we specify the number of points over which the density is calculated with the n argument in the density function (if you want more accurate results, increase this).
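Raising n is not the only way to tighten the estimate. As a rough sketch (not part of the original answer), one could linearly interpolate the two density estimates with approxfun and let uniroot locate the zero of their difference between the two grid points that bracket each sign change. The names x.long, x.not.long, f1, f2, diff.fun, idx and refined are made up for this illustration, and the data are the same simulated values as above. Note that this refines the crossing of the interpolated curves, so the accuracy still depends on the grid.

```r
# Same simulated data as in the answer (assumed here for illustration)
set.seed(10)
N <- 100
x.long <- rnorm(N, mean = -1)
x.not.long <- rnorm(N, mean = 1)

# Common grid for both density estimates
lower.limit <- min(x.long, x.not.long)
upper.limit <- max(x.long, x.not.long)
d1 <- density(x.long, from = lower.limit, to = upper.limit, n = 2^10)
d2 <- density(x.not.long, from = lower.limit, to = upper.limit, n = 2^10)

# Linear interpolants of each estimate, and their difference
f1 <- approxfun(d1$x, d1$y)
f2 <- approxfun(d2$x, d2$y)
diff.fun <- function(x) f1(x) - f2(x)

# Index i marks a sign change between grid points i and i + 1
idx <- which(diff(diff.fun(d1$x) > 0) != 0)

# Refine each crossing inside its bracketing interval
refined <- vapply(idx, function(i) {
  uniroot(diff.fun, lower = d1$x[i], upper = d1$x[i + 1])$root
}, numeric(1))

refined
```

The result can be passed to geom_vline(xintercept = refined) in the same way as intersection.point.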

Next, we calculate the density of each group. To find the intersection, we take the difference of the two densities and look for where it changes sign. The expression which(diff(density.difference > 0) != 0) + 1 returns the indices at which these changes occur; we add one because diff() yields a vector one element shorter, so the shifted index points at the grid point just after each crossing. The intersection is then the corresponding value in long.density$x (or not.long.density$x, since the two grids are identical by construction). If the densities cross more than once, intersection.point holds one value per crossing.
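To see how the index expression behaves, here is a small illustration on a made-up difference vector (the values in d are hypothetical and unrelated to the data above):

```r
# Hypothetical difference between two densities on a 5-point grid
d <- c(0.3, 0.1, -0.2, -0.4, 0.5)

sign.positive <- d > 0            # TRUE TRUE FALSE FALSE TRUE
switches <- diff(sign.positive)   # 0 -1 0 1 (logicals are coerced to 0/1)

# Positions in d of the first element after each sign change
crossing.index <- which(switches != 0) + 1
crossing.index                    # 3 5
```

The sign flips between positions 2 and 3 and again between positions 4 and 5, so the expression reports positions 3 and 5.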



Tags: r ggplot2