Given data of the following form
myDat = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13,
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07,
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L,
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score",
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
I would like to model Score as a function of Subject, Condition and Time. Each (human) Subject's score was measured three times, indicated by the variable Time, so I have repeated measures.
How can I fit a mixed model in R with Subject fitted as a random effect?
ADDENDUM: It's been asked how I generated these data. You guessed it, the data are fake as the day is long. Score is Time plus random noise, and being in Condition 1 adds a point to Score. It's instructive as a typical Psych setup: you have a task where people's scores get better with practice (Time) and a drug (Condition == 1) that enhances the score.
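For readers who want to regenerate data of this kind, here is a minimal sketch of the process just described. The seed, the noise standard deviation, and the name `sim` are assumptions made for illustration; they are not the values used for the posted data.

```r
# Hypothetical re-creation of the described scheme:
# Score = Time + 1 point if Condition == 1 + random noise.
set.seed(42)                      # assumed seed, for reproducibility only
n_subj <- 8
sim <- data.frame(
  Subject   = rep(seq_len(n_subj), each = 3),
  Condition = rep(rep(0:1, each = 3), times = n_subj / 2),  # alternate by subject
  Time      = rep(1:3, times = n_subj)
)
# Noise SD of 0.4 is an assumption, not taken from the post
sim$Score <- sim$Time + sim$Condition + rnorm(nrow(sim), sd = 0.4)
head(sim)
```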
Here are some more realistic data for the purposes of this discussion. Now simulated participants have a random "skill" level that is added to their scores. Also, the factors are now strings.
myDat = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41,
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01,
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E",
"F", "G", "H"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"),
Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM",
"2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject",
"Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
See it:
library(ggplot2)
# qplot() is deprecated in recent ggplot2 releases; this is the ggplot() equivalent
ggplot(myDat, aes(Time, Score, group = Subject, colour = factor(Condition))) +
  geom_line()
(Using the lme4 library.) This fits Subject as a random effect; Subject is also the variable that your random effects are grouped under. In this model the random effect is an intercept that varies by subject.
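A random-intercept model of the kind described here might be written as follows. This is a sketch, not necessarily the exact call from the original answer: it assumes the second (factor-coded) version of `myDat`, rebuilt compactly so the block runs on its own, and the name `m1` is made up for illustration.

```r
library(lme4)

# Compact rebuild of the second myDat from the question
myDat <- data.frame(
  Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 2.21, 3.32, 2.73, 3.34, 3.27,
            2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 1.81, 1.83, 3.22, 3.64, 3.51, 4.26),
  Subject   = factor(rep(LETTERS[1:8], each = 3)),
  Condition = factor(rep(rep(c("No", "Yes"), each = 3), times = 4)),
  Time      = factor(rep(c("1PM", "2PM", "3PM"), times = 8))
)

# Fixed effects for Condition and Time; (1 | Subject) is a random intercept per subject
m1 <- lmer(Score ~ Condition + Time + (1 | Subject), data = myDat)
summary(m1)
```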
To see the random effects you can just use
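The call referred to here is most likely `ranef()`, which lme4 provides for extracting the estimated random effects. A minimal sketch, assuming the second version of `myDat` and a random-intercept fit stored under the hypothetical name `m1`:

```r
library(lme4)

# Compact rebuild of the second myDat from the question
myDat <- data.frame(
  Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 2.21, 3.32, 2.73, 3.34, 3.27,
            2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 1.81, 1.83, 3.22, 3.64, 3.51, 4.26),
  Subject   = factor(rep(LETTERS[1:8], each = 3)),
  Condition = factor(rep(rep(c("No", "Yes"), each = 3), times = 4)),
  Time      = factor(rep(c("1PM", "2PM", "3PM"), times = 8))
)

m1 <- lmer(Score ~ Condition + Time + (1 | Subject), data = myDat)

# One row per subject: the estimated deviation of that subject's intercept
re <- ranef(m1)
re$Subject
```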
As Ian Fellows mentioned, your data may also have random Condition and Time components. You can test that with another model. In the one below Condition, Time, and the intercept are allowed to vary randomly by subject. It also evaluates their correlations.
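A sketch of a model with random slopes follows, with one deliberate deviation from the description above. In the posted data each subject appears in only one Condition, and with 24 observations a per-subject intercept plus Condition and Time effects would need at least as many random effects as there are observations, which `lmer()` refuses to fit by default. The sketch therefore lets only the intercept and a numeric Time slope vary by subject. The names `TimeNum` and `m2` are assumptions for illustration, and a fit this small may well be reported as singular.

```r
library(lme4)

# Compact rebuild of the second myDat from the question
myDat <- data.frame(
  Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 2.21, 3.32, 2.73, 3.34, 3.27,
            2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 1.81, 1.83, 3.22, 3.64, 3.51, 4.26),
  Subject   = factor(rep(LETTERS[1:8], each = 3)),
  Condition = factor(rep(rep(c("No", "Yes"), each = 3), times = 4)),
  Time      = factor(rep(c("1PM", "2PM", "3PM"), times = 8))
)
myDat$TimeNum <- as.numeric(myDat$Time)   # 1, 2, 3 (assumes equally spaced times)

# Random intercept and random Time slope per subject, with their correlation
m2 <- lmer(Score ~ Condition + TimeNum + (1 + TimeNum | Subject), data = myDat)
VarCorr(m2)   # variances of intercept and slope, plus their correlation
```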
and try
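The comparison meant here is presumably a likelihood-ratio test between the simpler and the richer model, which `anova()` performs for lme4 fits. A hedged sketch, assuming the second `myDat`, a numeric Time variable, and the hypothetical model names `m1` and `m2`:

```r
library(lme4)

# Compact rebuild of the second myDat from the question
myDat <- data.frame(
  Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 2.21, 3.32, 2.73, 3.34, 3.27,
            2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 1.81, 1.83, 3.22, 3.64, 3.51, 4.26),
  Subject   = factor(rep(LETTERS[1:8], each = 3)),
  Condition = factor(rep(rep(c("No", "Yes"), each = 3), times = 4)),
  Time      = factor(rep(c("1PM", "2PM", "3PM"), times = 8))
)
myDat$TimeNum <- as.numeric(myDat$Time)

# Same fixed effects in both models; only the random part differs
m1 <- lmer(Score ~ Condition + TimeNum + (1 | Subject), data = myDat)
m2 <- lmer(Score ~ Condition + TimeNum + (1 + TimeNum | Subject), data = myDat)

# anova() refits both with ML and reports AIC, BIC and a likelihood-ratio test
cmp <- anova(m1, m2)
cmp
```

With only eight subjects the test has little power, so the comparison is better read as a rough guide than as a decision rule.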
You could also fit this without the correlations with the intercept, add an interaction between Condition and Time, and try numerous other models to see which best fits your data and/or your theory. Your question is a bit vague, but these few commands should get you started.
Note that Subject is your grouping factor so it's what you fit other effects as random under. It's not something you then explicitly fit as a predictor as well.
It's not an answer to your question, but you might find this visualisation of your data informative.
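The original plot is not reproduced here. One plausible version, sketched with ggplot2 on the second `myDat`, draws one line per subject and splits the panels by Condition; the layout choices and the name `p` are assumptions, not the answerer's actual figure.

```r
library(ggplot2)

# Compact rebuild of the second myDat from the question
myDat <- data.frame(
  Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 2.21, 3.32, 2.73, 3.34, 3.27,
            2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 1.81, 1.83, 3.22, 3.64, 3.51, 4.26),
  Subject   = factor(rep(LETTERS[1:8], each = 3)),
  Condition = factor(rep(rep(c("No", "Yes"), each = 3), times = 4)),
  Time      = factor(rep(c("1PM", "2PM", "3PM"), times = 8))
)

# One trajectory per subject, one panel per Condition
p <- ggplot(myDat, aes(x = Time, y = Score, group = Subject, colour = Subject)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ Condition)
print(p)
```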
Using the nlme library...
Answering your stated question, you can create a random intercept mixed effects model using the following code:
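A random-intercept fit with nlme would look roughly like this. It is a sketch rather than the answerer's exact code: it assumes the first (integer-coded) version of `myDat` from the question, and the name `fit` is made up.

```r
library(nlme)

# Compact rebuild of the first myDat from the question
myDat <- data.frame(
  Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 2.49, 3.74, 2.84, 3.3, 4.82,
            1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 1.93, 2.39, 3.63, 2.55, 3.09, 4.76),
  Subject   = rep(1:8, each = 3),
  Condition = rep(rep(0:1, each = 3), times = 4),
  Time      = rep(1:3, times = 8)
)

# random = ~ 1 | Subject gives each subject its own intercept
fit <- lme(Score ~ Condition + Time, random = ~ 1 | Subject, data = myDat)
summary(fit)
VarCorr(fit)   # intercept variance versus residual variance
```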
The intercept variance is basically 0, indicating no within-subject correlation, so this model is not capturing the relationship between time points well. A random intercept model is rarely the type of model you want for a repeated measures design. It assumes that the correlations between all pairs of time points are equal; that is, the correlation between time 1 and time 2 is the same as between time 1 and time 3. Under normal circumstances (perhaps not those generating your fake data) we would expect the latter to be less than the former. An autoregressive structure is usually a better way to go.
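One way to try an autoregressive structure in nlme is `corAR1()`. The sketch below uses `gls()` with an AR(1) correlation within each subject rather than `lme()`; that is a choice made here for simplicity, not something taken from the original answer. It assumes the first version of `myDat`, and the names `fit_ar1` and `phi` are hypothetical.

```r
library(nlme)

# Compact rebuild of the first myDat from the question
myDat <- data.frame(
  Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 2.49, 3.74, 2.84, 3.3, 4.82,
            1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 1.93, 2.39, 3.63, 2.55, 3.09, 4.76),
  Subject   = rep(1:8, each = 3),
  Condition = rep(rep(0:1, each = 3), times = 4),
  Time      = rep(1:3, times = 8)
)

# AR(1): correlation between two measurements decays with their distance in Time
fit_ar1 <- gls(Score ~ Condition + Time,
               correlation = corAR1(form = ~ Time | Subject),
               data = myDat)
summary(fit_ar1)

# Estimated lag-1 correlation on its natural (-1, 1) scale
phi <- coef(fit_ar1$modelStruct$corStruct, unconstrained = FALSE)
phi
```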
Your data show a -.596 correlation between time points, which seems odd. Normally there should, at a minimum, be a positive correlation between time points. How were these data generated?
Addendum:
With your new data we know that the data generating process is equivalent to a random intercept model (though that is not the most realistic for a longitudinal study). The visualization shows that the effect of time seems to be fairly linear, so we should feel comfortable treating it as a numeric variable.
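A model along these lines, with Time converted to a number and a Condition by Time interaction, might be fitted as follows. This is a sketch under the assumption that the second `myDat` is in use and that the three time points are equally spaced; `TimeNum` and `fit2` are illustrative names.

```r
library(nlme)

# Compact rebuild of the second myDat from the question
myDat <- data.frame(
  Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 2.21, 3.32, 2.73, 3.34, 3.27,
            2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 1.81, 1.83, 3.22, 3.64, 3.51, 4.26),
  Subject   = factor(rep(LETTERS[1:8], each = 3)),
  Condition = factor(rep(rep(c("No", "Yes"), each = 3), times = 4)),
  Time      = factor(rep(c("1PM", "2PM", "3PM"), times = 8))
)
myDat$TimeNum <- as.numeric(myDat$Time)   # 1PM, 2PM, 3PM -> 1, 2, 3

# Random intercept per subject; the interaction tests whether slopes differ by Condition
fit2 <- lme(Score ~ Condition * TimeNum, random = ~ 1 | Subject, data = myDat)
summary(fit2)$tTable   # fixed-effect estimates, standard errors, t- and p-values
```

Note that with the interaction in the model, the Condition coefficient is the group difference at `TimeNum = 0`; centering `TimeNum` would make it the difference at the middle time point.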
We see a significant Condition effect, indicating that the 'Yes' condition tends to have higher scores (by about 1.7), and a significant Time effect, indicating that both groups go up over time. Supporting the plot, we find no differential effect of time between the two groups (the interaction); that is, the slopes are the same.