Question:

A perhaps simple question: I tried to make an error-bar graph like the one shown on page 532 of Field's "Discovering Statistics Using R".

The code can be found here http://www.sagepub.com/dsur/study/DSUR%20R%20Script%20Files/Chapter%2012%20DSUR%20GLM3.R :

line <- ggplot(gogglesData, aes(alcohol, attractiveness, colour = gender))
line + stat_summary(fun.y = mean, geom = "point") + 
stat_summary(fun.y = mean, geom = "line", aes(group= gender)) + 
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) + 
labs(x = "Alcohol Consumption", y = "Mean Attractiveness of Date (%)", colour = "Gender")

I produced the same kind of graph with my own data. My y-axis variable takes only four values (it is a discrete scale, 1-4), yet the y-axis now shows the points 1.5, 2, 2.5, and the lines vary between them.

My question is: what do these points and error bars describe? I assume the important part is stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2). Are they counts of observations for that group at that level of the x-axis? Are they frequencies? Or are they proportions?

I found this http://docs.ggplot2.org/0.9.3/stat_summary.html but it did not help me

Thank you

Answer 1:

Here is what the ggplot2 book on page 83 says about mean_cl_boot()

Function        Hmisc original   Middle  Range
mean_cl_boot()  smean.cl.boot()  Mean    Standard error from bootstrap

I think it is the smean.cl.boot() function from the Hmisc package, exposed under the name mean_cl_boot() in ggplot2.

Here is the description of the original function from the Hmisc package:

smean.cl.boot is a very fast implementation of the basic nonparametric bootstrap for obtaining confidence limits for the population mean without assuming normality
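
To make that description concrete, here is a minimal base-R sketch of a percentile bootstrap interval for a mean. It only illustrates the general idea and is not the Hmisc source; boot_mean_ci and the scores vector are hypothetical names and made-up data, and the real smean.cl.boot() may differ in details.

```r
# Minimal sketch of a percentile bootstrap CI for the mean (base R only).
# boot_mean_ci is a hypothetical helper, not part of Hmisc or ggplot2.
boot_mean_ci <- function(x, conf.int = 0.95, B = 1000) {
  x <- x[!is.na(x)]
  # Resample with replacement B times and record the mean of each resample
  boot_means <- replicate(B, mean(sample(x, length(x), replace = TRUE)))
  # The interval ends are the matching quantiles of the resampled means
  alpha <- 1 - conf.int
  limits <- quantile(boot_means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(y = mean(x), ymin = limits[1], ymax = limits[2])
}

set.seed(1)
scores <- c(1, 2, 2, 3, 1, 4, 2, 3, 2, 1)  # made-up ratings on a 1-4 scale
ci <- boot_mean_ci(scores)
print(ci)
```

The three values correspond to what an error bar needs: y is the sample mean, and ymin and ymax are the lower and upper ends of the interval. No normality assumption enters anywhere, since the limits come straight from the resampled means.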

Answer 2:

I reproduced the graph using your code and I get essentially the same graph shown in Field's book, Discovering Statistics Using R, figure 12.12, page 532, except for the ordering of the variables on the x axis. The y axis displays the continuous variable, Mean Attractiveness of Date (%), so each point is a group mean, not a count, frequency, or proportion. The error bars, created (as you point out) with the stat_summary() function and the mean_cl_boot argument, are 95% bootstrap confidence intervals computed with the smean.cl.boot() function in Hmisc, as pointed out in the other answer. This function is described on page 262 of the Hmisc documentation. The ggplot2 documentation on mean_cl_boot is sparse and defers to the description in the Hmisc package.
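
As a rough illustration of why a 1-4 scale ends up with y values such as 1.5, 2, and 2.5, the sketch below computes, for each group, the same three numbers the plot draws: the mean and the two ends of a bootstrap interval. The toy data frame, its column names, and the boot_limits helper are all made up for this example, and the interval is a plain percentile bootstrap rather than the Hmisc code itself.

```r
# Toy data: made-up ratings on a 1-4 scale for two hypothetical groups
set.seed(42)
toy <- data.frame(
  group  = rep(c("a", "b"), each = 20),
  rating = sample(1:4, 40, replace = TRUE)
)

# Percentile bootstrap limits for one group's mean (hypothetical helper)
boot_limits <- function(x, conf.int = 0.95, B = 1000) {
  means <- replicate(B, mean(sample(x, length(x), replace = TRUE)))
  quantile(means, c((1 - conf.int) / 2, (1 + conf.int) / 2), names = FALSE)
}

# One row per group: the point (y) and the error-bar ends (ymin, ymax)
summary_rows <- lapply(split(toy$rating, toy$group), function(x) {
  lim <- boot_limits(x)
  data.frame(n = length(x), y = mean(x), ymin = lim[1], ymax = lim[2])
})
summary_tbl <- do.call(rbind, summary_rows)
print(summary_tbl)
```

The n column is shown only for contrast: the number of observations is not what gets plotted. The plotted values are averages of the discrete ratings, so they can fall anywhere between 1 and 4.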

Note that the arguments to mean_cl_boot in ggplot2 are the same as those in the smean.cl.boot function in the Hmisc package. You can change the desired confidence level from the default of .95 by using the conf.int argument and the number of bootstrap samples by using the B argument. Here, for example, is the code for creating the same plot with a 99% confidence interval and 5000 bootstrap samples:

# Needs ggplot2 plus the Hmisc package, which mean_cl_boot calls internally
line <- ggplot(gogglesData, aes(alcohol, attractiveness, colour = gender))
line + stat_summary(fun.y = mean, geom = "point") + 
  stat_summary(fun.y = mean, geom = "line", aes(group = gender)) + 
  # In ggplot2 2.0.0 and later, extra arguments for mean_cl_boot go through fun.args
  stat_summary(fun.data = mean_cl_boot, fun.args = list(conf.int = .99, B = 5000), geom = "errorbar", width = 0.2) + 
  labs(x = "Alcohol Consumption", y = "Mean Attractiveness of Date (%)", colour = "Gender")