The difference between geom_density in ggplot2 and

Posted 2019-02-26 14:44

Question:

I have data in R like the following:

  bag_id location_type            event_ts
2     155        sorter 2012-01-02 17:06:05
3     305       arrival 2012-01-01 07:20:16
1     155      transfer 2012-01-02 15:57:54
4     692       arrival 2012-03-29 09:47:52
10    748      transfer 2012-01-08 17:26:02
11    748        sorter 2012-01-08 17:30:02
12    993       arrival 2012-01-23 08:58:54
13   1019       arrival 2012-01-09 07:17:02
14   1019        sorter 2012-01-09 07:33:15
15   1154      transfer 2012-01-12 21:07:50

where class(event_ts) is POSIXct.

I want to estimate the density of bags at each location over time.

I used geom_density() from ggplot2 and the plot looks very nice. I wonder whether there is any difference between base R's density() and geom_density(): for example, in the estimation method or in the default bandwidth.

I also need to add the density values to my data frame. With density() I know how to do that using approxfun(), but I wonder whether the same approach applies when I use geom_density().

Answer 1:

A quick perusal of the ggplot2 documentation for geom_density() reveals that it wraps the functionality of stat_density(). The documentation for stat_density() in turn describes its adjust parameter by pointing to the base function density(). So, to answer your direct question: both are built on the same function, though the exact parameter values used may differ. You have some control over those parameters through ggplot2, but you may not get as much flexibility as you want.
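One way to check this on your own installation is to pull the computed values out of the plot with ggplot_build() and compare them with a direct call to density() on the same grid. The following is only a sketch: the data are simulated, and it assumes that stat_density() uses the default "nrd0" bandwidth and Gaussian kernel and exposes its result in the y column of the built layer data, which may depend on your ggplot2 version.

```r
library(ggplot2)

set.seed(1)                               # simulated data, not the bag data from the question
df <- data.frame(x = rnorm(100))

p <- ggplot(df, aes(x)) + geom_density()

# Layer data as computed by stat_density(); one row per evaluation point
gg <- ggplot_build(p)$data[[1]]

# Call density() directly on the same evaluation grid (same number of
# points, same end points), leaving bandwidth and kernel at their defaults
d <- density(df$x, n = nrow(gg), from = min(gg$x), to = max(gg$x))

# Compare the two sets of density values
print(all.equal(gg$y, d$y))
```

If the comparison reports differences, the likely cause is a parameter such as bw, adjust or kernel being set differently in one of the two calls.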

One alternative to using geom_density() is to calculate the density that you want outside of ggplot() and then plot it with geom_line(). For example:

library(ggplot2)

# 100 random draws from a standard normal distribution
set.seed(42)  # fixed seed so the example is reproducible
x <- data.frame(x = rnorm(100))

# Calculate the density yourself; set parameters (bw, adjust, kernel, ...) as you desire
d <- density(x$x)
x2 <- data.frame(x = d$x, y = d$y)

# Using geom_density()
ggplot(x, aes(x)) + geom_density()

# Using the home-grown density
ggplot(x2, aes(x, y)) + geom_line(colour = "red")

Here, they give nearly identical plots, though they may vary more significantly with your data and your settings.
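Once the density is computed outside of ggplot(), adding the values to the data frame works with approxfun(), as mentioned in the question. Below is a minimal sketch that reuses the sample rows from the question; the column name density, the UTC time zone, and the choice to fit one density per location_type with default settings are assumptions made for illustration.

```r
# Sample rows copied from the question; the time zone is an assumption
bags <- data.frame(
  bag_id = c(155, 305, 155, 692, 748, 748, 993, 1019, 1019, 1154),
  location_type = c("sorter", "arrival", "transfer", "arrival", "transfer",
                    "sorter", "arrival", "arrival", "sorter", "transfer"),
  event_ts = as.POSIXct(c("2012-01-02 17:06:05", "2012-01-01 07:20:16",
                          "2012-01-02 15:57:54", "2012-03-29 09:47:52",
                          "2012-01-08 17:26:02", "2012-01-08 17:30:02",
                          "2012-01-23 08:58:54", "2012-01-09 07:17:02",
                          "2012-01-09 07:33:15", "2012-01-12 21:07:50"),
                        tz = "UTC")
)

# Estimate a density over the timestamps (in seconds since the epoch) and
# interpolate it back onto the observed times
density_at_obs <- function(secs) {
  d <- density(secs)          # default bandwidth and kernel
  f <- approxfun(d$x, d$y)    # linear interpolation between grid points
  f(secs)
}

# One density per location type; ave() keeps the original row order
bags$density <- ave(as.numeric(bags$event_ts), bags$location_type,
                    FUN = density_at_obs)

print(bags)
```

Because the timestamps are converted to seconds, the density values are per second and therefore very small; rescale the time axis (for example, to days) before estimating if that is easier to read.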