How to report with APA style a Bayesian Linear (Mi

2020-06-27 02:29发布

问题:

I'm currently struggling with how to report, following APA-6 recommendations, the output of rstanarm::stan_lmer().

First, I'll fit a mixed model within the frequentist approach, then will try to do the same using the bayesian framework.

Here's the reproducible code to get the data:

library(tidyverse)
library(neuropsychology)
library(rstanarm)
library(lmerTest)

df <- neuropsychology::personality %>% 
  select(Study_Level, Sex, Negative_Affect) %>% 
  mutate(Study_Level=as.factor(Study_Level),
         Negative_Affect=scale(Negative_Affect)) # I understood that scaling variables is important

Now, let's fit a linear mixed model in the "traditional" way to test the impact of Sex (male/female) on Negative Affect (negative mood) with the study level (years of education) as random factor.

fit <- lmer(Negative_Affect ~ Sex + (1|Study_Level), df)
summary(fit)

The output is the following:

Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of
  freedom [lmerMod]
Formula: Negative_Affect ~ Sex + (1 | Study_Level)
   Data: df

REML criterion at convergence: 3709

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.58199 -0.72973  0.02254  0.68668  2.92841 

Random effects:
 Groups      Name        Variance Std.Dev.
 Study_Level (Intercept) 0.04096  0.2024  
 Residual                0.94555  0.9724  
Number of obs: 1327, groups:  Study_Level, 8

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)    0.01564    0.08908    4.70000   0.176    0.868    
SexM          -0.46667    0.06607 1321.20000  -7.064 2.62e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr)
SexM -0.149

To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001).

Is that correct?

Then, let's try to fit the model within a bayesian framework using rstanarm:

fitB <- stan_lmer(Negative_Affect ~ Sex + (1|Study_Level),
                  data=df,
                  prior=normal(location=0, scale=1), 
                  prior_intercept=normal(location=0, scale=1),
                  prior_PD=F)
print(fitB, digits=2)

This returns:

stan_lmer
 family:  gaussian [identity]
 formula: Negative_Affect ~ Sex + (1 | Study_Level)
------

Estimates:
            Median MAD_SD
(Intercept)  0.02   0.10 
SexM        -0.47   0.07 
sigma        0.97   0.02 

Error terms:
 Groups      Name        Std.Dev.
 Study_Level (Intercept) 0.278   
 Residual                0.973   
Num. levels: Study_Level 8 

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.00   0.04  

------
For info on the priors used see help('prior_summary.stanreg').

I think than median is the median of the posterior distribution of the coefficient and mad_sd the equivalent of standart deviation. These parameters are close to the beta and standart error of the frequentist model, which is reassuring. However, I do not know how to formalize and put the output in words.

Moreover, if I do the summary of the model (summary(fitB, probs=c(.025, .975), digits=2)), I get other features of the posterior distribution:

...
Estimates:
                                             mean     sd       2.5%     97.5% 
(Intercept)                                    0.02     0.11    -0.19     0.23
SexM                                          -0.47     0.07    -0.59    -0.34
...

Is something like the following good?

"we fitted a linear mixed model within the bayesian framework with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Priors for the coefficient and the intercept were set to normal (mean=0, sd=1). Within this model, the features of the posterior distribution of the coefficient associated with the male level suggest a decrease of negative affect (mean = -0.47, sd = 0.11, 95% CI[-0.59, -0.34]).

Thanks for your help.

回答1:

The following is personal opinion that may or may not be acceptable to a psychology journal.

To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001).

Is that correct?

That is considered correct from a frequentist perspective.

The key concepts from a Bayesian perspective are that (conditional on the model, of course)

  1. There is a 0.5 probability that the true effect is less than the posterior median and a 0.5 probability that the true effect is greater than the posterior median. Frequentists tend to see a posterior median as being like a numerical optimum.
  2. The posterior_interval function yields credible intervals around the median with a default probability of 0.9 (although a lower number produces more accurate estimates of the bounds). So, you can legitimately say that there is a probability of 0.9 that the true effect is between those bounds. Frequentists tend to see confidence intervals as being like credible intervals.
  3. The as.data.frame function will give you access to the raw draws, so mean(as.data.frame(fitB)$male > 0) yields the probability that the expected difference in the outcome between men and women in the same study is positive. Frequentists tend to see these probabilities as being like p-values.

For a Bayesian approach, I would say

We fit a linear model using Markov Chain Monte Carlo with negative affect as the outcome variable, sex as predictor and the intercept was allowed to vary by study level.

And then talk about the estimates using the three concepts above.