I have a large dataset with 240 cases, one per patient. All patients have undergone neuropsychological tests and filled in questionnaires. In addition, their significant others (hereafter: proxies) have also filled in questionnaires. Since 'patient' and 'proxy' are nested within 'couples', I want to conduct a multilevel analysis in R. For this, I need to reshape my dataset so that I can run this kind of analysis.
Simply put, I want to 'duplicate' my rows. Each subject ID should then appear twice, and a new variable with 1s and 2s should tell the two rows apart, where 1 stands for patient data and 2 stands for proxy data. The rows should be filled as follows: (1) the patient row holds all the patient data, with the columns that contain proxy data set to NA (or empty, or whatever); (2) the proxy row holds all the proxy data, with the patient columns set to NA (or empty).
Let's say this is my data:
id <- c(1:5)
names <- c('id', 'p1', 'p2', 'p3', 'pr1', 'pr2', 'pr3')
p1 <- c(sample(1:10, 5))
p2 <- c(sample(10:20, 5))
p3 <- c(sample(20:30, 5))
pr1 <- c(sample(1:10, 5))
pr2 <- c(sample(10:20, 5))
pr3 <- c(sample(20:30, 5))
mydf <- as.data.frame(matrix(c(id, p1, p2, p3, pr1, pr2, pr3), nrow = 5))
colnames(mydf) <- names
> mydf
  id p1 p2 p3 pr1 pr2 pr3
1  1  6 20 22   1  10  24
2  2  8 11 24   2  18  29
3  3  7 10 25   6  20  26
4  4  3 14 20  10  15  20
5  5  5 19 29   7  14  22
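In the end, I want my data to look like this (the values are drawn at random again, so they only illustrate the layout, not the actual mapping from the data above):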
id2 <- rep(c(1:5), each = 2)
names2 <- c('id', 'couple', 'q1', 'q2', 'q3')
couple <- rep(1:2, 5)
p1 <- c(sample(1:10, 5))
p2 <- c(sample(10:20, 5))
p3 <- c(sample(20:30, 5))
pr1 <- c(sample(1:10, 5))
pr2 <- c(sample(10:20, 5))
pr3 <- c(sample(20:30, 5))
mydf <- as.data.frame(matrix(c(id2, couple, p1, p2, p3, pr1, pr2, pr3), nrow = 10, ncol = 5))
colnames(mydf) <- names2
> mydf
   id couple q1 q2 q3
1   1      1  6 23 16
2   1      2 10 28 10
3   2      1  1 27 14
4   2      2  7 21 20
5   3      1  5 30 18
6   3      2 12  2 27
7   4      1 10  1 25
8   4      2 13  7 21
9   5      1 11  6 20
10  5      2 18  3 23
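For reference, the stacked layout above could be produced roughly as follows. This is only a minimal base R sketch under assumptions: the toy data frame `wide`, the helper vectors `patient_cols`, `proxy_cols` and `new_names`, and the seed are all made up for illustration, and the real column names (bb1:bb54, pbb1:pbb54) would need to be substituted.

```r
# Toy data in the same wide layout as the example (values are arbitrary)
set.seed(1)
wide <- data.frame(
  id  = 1:5,
  p1  = sample(1:10, 5), p2  = sample(10:20, 5), p3  = sample(20:30, 5),
  pr1 = sample(1:10, 5), pr2 = sample(10:20, 5), pr3 = sample(20:30, 5)
)

# Hypothetical helper vectors: which columns belong to whom, and the shared names
patient_cols <- c("p1", "p2", "p3")
proxy_cols   <- c("pr1", "pr2", "pr3")
new_names    <- c("q1", "q2", "q3")

# One block per respondent type, both with the same column names
patient <- cbind(wide["id"], couple = 1L, setNames(wide[patient_cols], new_names))
proxy   <- cbind(wide["id"], couple = 2L, setNames(wide[proxy_cols], new_names))

# Stack the blocks, then sort so each id has its patient row followed by its proxy row
long <- rbind(patient, proxy)
long <- long[order(long$id, long$couple), ]
rownames(long) <- NULL
```

With this layout, `couple` identifies the respondent within each `id`, which is the grouping structure a multilevel model would need.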
Or, if this is not possible, like this:
   id couple bb1 bb2 bb3 pbb1 pbb2 pbb3
1   1      1   6  23  16   NA   NA   NA
2   1      2  NA  NA  NA   10   28   10
3   2      1   1  27  14   NA   NA   NA
4   2      2  NA  NA  NA    7   21   20
5   3      1   5  30  18   NA   NA   NA
6   3      2  NA  NA  NA   12    2   27
7   4      1  10   1  25   NA   NA   NA
8   4      2  NA  NA  NA   13    7   21
9   5      1  11   6  20   NA   NA   NA
10  5      2  NA  NA  NA   18    3   23
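This NA-padded alternative could be sketched in base R as well. Again, this is only an illustration under assumptions: `wide`, `patient_cols`, `proxy_cols` and `padded` are hypothetical names, and the toy data has three items per questionnaire instead of 54.

```r
# Toy data using the bb / pbb naming scheme (values are arbitrary)
set.seed(1)
wide <- data.frame(
  id   = 1:5,
  bb1  = sample(1:10, 5), bb2  = sample(10:20, 5), bb3  = sample(20:30, 5),
  pbb1 = sample(1:10, 5), pbb2 = sample(10:20, 5), pbb3 = sample(20:30, 5)
)

patient_cols <- c("bb1", "bb2", "bb3")
proxy_cols   <- c("pbb1", "pbb2", "pbb3")

# Patient rows: keep the patient data, blank out the proxy columns
patient <- wide
patient$couple <- 1L
patient[proxy_cols] <- NA_integer_

# Proxy rows: keep the proxy data, blank out the patient columns
proxy <- wide
proxy$couple <- 2L
proxy[patient_cols] <- NA_integer_

# Stack, sort by id and respondent, and put the columns in the desired order
padded <- rbind(patient, proxy)
padded <- padded[order(padded$id, padded$couple),
                 c("id", "couple", patient_cols, proxy_cols)]
rownames(padded) <- NULL
```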
To get there, I've tried the melt() function and the gather() function. It feels like I'm close, but it still doesn't produce the layout I want.
Note: in my real dataset the variable names are bb1:bb54 for the patient questionnaire and pbb1:pbb54 for the proxy questionnaire.
Example of what I've tried:
library(dplyr)
library(tidyr)
df_long <- df_reshape %>%
  gather(testname, value, -(bb1:bb11), -(pbb1:pbb11), -id, -pgebdat, -p_age, na.rm = TRUE) %>%
  arrange(id)
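For comparison, here is a sketch of how the same reshape might look with pivot_longer(), which supersedes gather() in tidyr. It assumes dplyr >= 1.0 and tidyr >= 1.0; the toy data frame `wide`, the intermediate `patient_` / `proxy_` prefixes and the `respondent` column are made up for illustration and are not part of my real data.

```r
library(dplyr)
library(tidyr)

# Toy data using the bb / pbb naming scheme (values are arbitrary)
set.seed(1)
wide <- data.frame(
  id   = 1:5,
  bb1  = sample(1:10, 5), bb2  = sample(10:20, 5), bb3  = sample(20:30, 5),
  pbb1 = sample(1:10, 5), pbb2 = sample(10:20, 5), pbb3 = sample(20:30, 5)
)

long <- wide %>%
  # Give both questionnaires an explicit prefix so the names can be split on "_"
  rename_with(~ sub("^bb", "patient_bb", .x), matches("^bb\\d+$")) %>%
  rename_with(~ sub("^pbb", "proxy_bb", .x), matches("^pbb\\d+$")) %>%
  # ".value" keeps bb1, bb2, ... as columns; the prefix becomes `respondent`
  pivot_longer(
    cols      = matches("^(patient|proxy)_bb\\d+$"),
    names_to  = c("respondent", ".value"),
    names_sep = "_"
  ) %>%
  # Recode the respondent label into the 1 / 2 indicator described above
  mutate(couple = if_else(respondent == "patient", 1L, 2L)) %>%
  arrange(id, couple)
```

Other columns that are not matched by `cols` (such as pgebdat or p_age in the real data) would simply be repeated on both rows of each id.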