Inference() Function Insisting That I Use ANOVA Ve

2019-12-16 21:04发布

问题:

I'm trying to use a custom function called Inference() as seen in the code below. There's no documentation for the function, but it is from my DASI class in Coursera. According to the feedback I have received, I am using the function properly. I'm trying to do a two-sided hypothesis test between my class variable and my wordsum variable, that is, between the two means of the categories low class and working class. So, the average wordsum for working class - average wordsum for lower class. However, the function/R/R Studio keep insisting I do an ANOVA test. This doesn't work for me since I'm trying to reject the null, and create a confidence interval between the difference of two independent means. I've looked at the function, but as I'm no R expert, I don't see anything out of the ordinary. Any help is greatly appreciated.

Code:

load(url("http://bit.ly/dasi_gss_ws_cl"))
source("http://bit.ly/dasi_inference")

summary(gss)
by(gss$wordsum, gss$class, mean)
boxplot(gss$wordsum ~ gss$class)

gss_clean = na.omit(subset(gss, class == "WORKING" | class =="LOWER"))

inference(y = gss_clean$wordsum, x = gss_clean$class, est = "mean", type = "ht", 
          null = 0, alternative = "twosided", method = "theoretical")

Returns:

Response variable: numerical, Explanatory variable: categorical
Error: Use alternative = 'greater' for ANOVA or chi-square test.
In addition: Warning message:
Ignoring null value since it's undefined for ANOVA.

回答1:

You need

gss_clean <- droplevels(gss_clean)

Then your inference() call works:

Response variable: numerical, Explanatory variable: categorical
Difference between two means
Summary statistics:
n_LOWER = 41, mean_LOWER = 5.0732, sd_LOWER = 2.2404
n_WORKING = 407, mean_WORKING = 5.7494, sd_WORKING = 1.8652
Observed difference between means (LOWER-WORKING) = -0.6762
H0: mu_LOWER - mu_WORKING = 0 
HA: mu_LOWER - mu_WORKING != 0 
Standard error = 0.362 
Test statistic: Z =  -1.868 
p-value =  0.0616 

The problem is that unless you drop the unused levels of the factor, the internal machinery of inference() thinks that you have a 4-level categorical variable, and it can't do a t-test or equivalent 2-category test: it has to do a one-way ANOVA or analogue.