I often encounter data that looks like this:
#create dummy data frame
data <- as.data.frame(diag(4))
data[data==0] <- NA
data[2,2] <- NA
data
#  V1 V2 V3 V4
#1  1 NA NA NA
#2 NA NA NA NA
#3 NA NA  1 NA
#4 NA NA NA  1
Rows represent participants, and columns V1 through V4 represent the condition that the participant is in (e.g., a 1 under V1 means this participant is in condition 1, a 1 under V4 means this participant is in condition 4). Sidenote: the real data are not square like this example; there are many more participants spread over the 4 conditions.
What I want is a vector with the condition for each participant:
1 NA 3 4
I wrote the following bit, but was wondering if there was a more efficient way (i.e., using fewer lines of code)?
#replace entries with condition numbers
cond <- data + matrix(rep(0:3, 4), 4, byrow=TRUE) #add 0 to 1 for condition 1...
#get all unique elements per row (ignore NAs); apply returns a list here
cond <- apply(cond, 1, function(x) unique(x[!is.na(x)]))
#because I ignored NAs just now, the entry for participant 2 (cond[[2]]) is numeric(0)
#assign NA to all entries that are numeric(0)
cond[sapply(cond, function(x) length(x) == 0)] <- NA
cond <- unlist(cond)
cond
#[1] 1 NA 3 4
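We can use max.col with ties.method='first' on the logical matrix of non-NA elements in 'data'. This gives, for each row, the column index of the first non-NA element. A row that has only NA elements would get index 1, so to turn those rows into NA we multiply the max.col index by NA^!rowSums(!is.na(data)). The exponent is TRUE only for rows with 0 non-NA elements; NA^TRUE is NA and NA^FALSE is 1, so those rows become NA and every other row keeps its index.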
max.col(!is.na(data), 'first')* NA^!rowSums(!is.na(data))
#[1] 1 NA 3 4
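To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. The intermediate names (present, idx, empty, mask) are only illustrative and do not appear in the answer; the data are rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data from the question
data <- as.data.frame(diag(4))
data[data == 0] <- NA
data[2, 2] <- NA

present <- !is.na(data)            # logical matrix: TRUE where a condition is marked
idx <- max.col(present, "first")   # first TRUE column per row; an all-FALSE row gives 1
empty <- rowSums(present) == 0     # TRUE for participants with no condition at all
mask <- NA^empty                   # NA^TRUE is NA, NA^FALSE is 1
result <- idx * mask               # all-NA rows become NA, others keep their index
result
# expected values: 1 NA 3 4
```

Writing the mask as NA^empty is the same trick as NA^!rowSums(!is.na(data)); the explicit comparison with 0 just spells out what the negation does.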
Another option is pmax. We multiply the data by the column index (col(data)), so that each non-NA element is replaced by the index of its column. Then we use pmax with na.rm=TRUE to get the max value of each row; a row with only NA elements stays NA.
do.call(pmax, c(col(data)*data, na.rm=TRUE))
#[1] 1 NA 3 4
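The same idea can be unpacked step by step. This is a sketch under the assumption that converting to a matrix first is acceptable; the names m, indexed and cols are illustrative and not part of the answer.

```r
# Rebuild the example data from the question
data <- as.data.frame(diag(4))
data[data == 0] <- NA
data[2, 2] <- NA

m <- as.matrix(data)
indexed <- col(m) * m              # each 1 becomes its column number; NA stays NA
cols <- as.data.frame(indexed)     # one list element per condition column
# pmax compares the columns element-wise, i.e. row by row;
# na.rm = TRUE skips NAs, and a row that is all NA stays NA
result <- do.call(pmax, c(cols, na.rm = TRUE))
result
# expected values: 1 NA 3 4
```

Because at most one entry per row is non-NA in this layout, the row maximum is simply the column number of that entry.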
Using the reshape2 package:
> library(reshape2)
> data$ID <- rownames(data)
> melt(data, 'ID', na.rm=TRUE)
   ID variable value
1   1       V1     1
11  3       V3     1
16  4       V4     1
IMHO, this has the advantage of keeping the ID variable along with the treatment factor; also if you have a response measurement it comes along too in the value column.
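If the plain vector from the question is still needed, it can be recovered from the long format. The following is only a sketch: it assumes reshape2 is installed, that the condition columns are in the order V1 to V4 (so the factor level number equals the condition number), and the names long and cond are illustrative.

```r
library(reshape2)

# Rebuild the example data from the question
data <- as.data.frame(diag(4))
data[data == 0] <- NA
data[2, 2] <- NA
data$ID <- rownames(data)

long <- melt(data, "ID", na.rm = TRUE)   # one row per participant with a condition

# 'variable' is a factor with levels V1..V4, so its integer code is the condition;
# match() lines the rows up with the original IDs and yields NA for missing IDs
cond <- as.integer(long$variable)[match(data$ID, long$ID)]
cond
# expected values: 1 NA 3 4
```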
EDIT:
If you want to include the subject under no conditions, you can reconstruct that indicator variable explicitly:
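#exclude the ID column added above (it is never NA) before testing for all-NA rows
data$VNA <- ifelse(apply(is.na(data[names(data) != "ID"]), 1, all), 1, NA)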
Less clever and efficient than other solutions, but perhaps more readable?
apply(data,
      MARGIN = 1,
      FUN = function(x) {
        if (all(is.na(x))) return(NA)
        return(which(!is.na(x)))
      }
)
# [1] 1 NA 3 4