I am trying to turn a factor into a set of binary (0/1) indicator columns, one per level, and have been using cast.
DF2 <- cast(data.frame(DM), id ~ region)
names(DF2)[-1] <- paste("region", names(DF2)[-1], sep = "")
The problem is that the resulting values are counts of how often each region occurs for an id, whereas I only want to know whether it occurs at all.
For example I have:
id region
1 2
1 3
2 2
3 1
3 1
What I'd like is:
id region1 region2 region3
1 0 1 1
2 0 1 0
3 1 0 0
Original data:
x <- data.frame(id=c(1,1,2,3,3), region=factor(c(2,3,2,1,1)))
> x
id region
1 1 2
2 1 3
3 2 2
4 3 1
5 3 1
Build one indicator column per region with model.matrix, then take the maximum within each id:
aggregate(model.matrix(~ region - 1, data=x), x["id"], max)
Result:
id region1 region2 region3
1 1 0 1 1
2 2 0 1 0
3 3 1 0 0
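To see why this works, it can help to look at the intermediate matrix. The sketch below rebuilds the same x; the names mm and res are only for illustration. model.matrix(~ region - 1) gives one 0/1 column per level with one row per input row, and taking max within each id turns "matched at least once" into 1.

```r
# Same example data as above
x <- data.frame(id = c(1, 1, 2, 3, 3), region = factor(c(2, 3, 2, 1, 1)))

# One indicator column per factor level; "- 1" drops the intercept
# so that every level gets its own column
mm <- model.matrix(~ region - 1, data = x)
print(mm)

# Collapse rows that share an id; max is 1 if any row matched, else 0
res <- aggregate(mm, x["id"], max)
print(res)
```

Using sum instead of max here would give back the counts that the question is trying to avoid.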
I kind of prefer dcast from reshape2:
library(reshape2)
dat <- read.table(text = "id region
1 2
1 3
2 2
3 1
3 1",header = TRUE,sep = "")
dcast(dat,id~region,fun.aggregate = function(x){as.integer(length(x) > 0)})
id 1 2 3
1 1 0 1 1
2 2 0 1 0
3 3 1 0 0
There may be a smoother way to do that, but I'll be honest, I don't cast stuff all that often.
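If you also want the region1, region2, ... column names from the question, a small follow-up could look like the sketch below. It assumes reshape2 is installed; has_any and res are names made up for this example, and value.var is set explicitly only to silence dcast's message about guessing the value column.

```r
library(reshape2)

# Same example data as in the question
dat <- data.frame(id = c(1, 1, 2, 3, 3), region = c(2, 3, 2, 1, 1))

# 1 if the id/region cell has at least one row, 0 otherwise
has_any <- function(x) as.integer(length(x) > 0)

res <- dcast(dat, id ~ region, value.var = "region", fun.aggregate = has_any)

# Prefix the level columns the same way the question does
names(res)[-1] <- paste0("region", names(res)[-1])
print(res)
```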
Here's sort of a "tricky" way to do it in one line using table (the parentheses are important). Assuming your data.frame is named df:
(table(df) > 0)+0
# region
# id 1 2 3
# 1 0 1 1
# 2 0 1 0
# 3 1 0 0
table(df) > 0 gives us TRUE and FALSE; adding +0 converts the TRUE and FALSE to numbers.
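The result of that one-liner is still a table, not a data.frame. If you need the wide data.frame from the question, something along these lines should work. This is only a sketch: counts, ind and wide are illustrative names, and df is rebuilt here from the question's example.

```r
# Example data from the question
df <- data.frame(id = c(1, 1, 2, 3, 3), region = c(2, 3, 2, 1, 1))

counts <- table(df)       # cross-tabulated counts; id 3 / region 1 is 2
ind <- (counts > 0) + 0   # logical comparison, then + 0 coerces to 0/1

# as.data.frame.matrix keeps the wide layout
# (plain as.data.frame on a table would give long format)
wide <- as.data.frame.matrix(ind)
names(wide) <- paste0("region", names(wide))
wide <- cbind(id = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```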
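No specialized functions are needed. Note that this version returns one row per input row, so it fits data where each id appears only once: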
x <- data.frame(id=1:4, region=factor(c(3,2,1,2)))
x
id region
1 1 3
2 2 2
3 3 1
4 4 2
x.bin <- data.frame(x$id, sapply(levels(x$region), `==`, x$region))
names(x.bin) <- c("id", paste("region", levels(x$region),sep=''))
x.bin
id region1 region2 region3
1 1 FALSE FALSE TRUE
2 2 FALSE TRUE FALSE
3 3 TRUE FALSE FALSE
4 4 FALSE TRUE FALSE
Or for integer results:
x.bin2 <- data.frame(x$id,
apply(sapply(levels(x$region), `==`, x$region),2,as.integer)
)
names(x.bin2) <- c("id", paste("region", levels(x$region),sep=''))
x.bin2
id region1 region2 region3
1 1 0 0 1
2 2 0 1 0
3 3 1 0 0
4 4 0 1 0
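As a possible shortcut for the integer version, adding 0L to the logical matrix coerces it to integer without the apply call. A minimal sketch on the same x (m, m_int and x.bin3 are just illustrative names):

```r
x <- data.frame(id = 1:4, region = factor(c(3, 2, 1, 2)))

# Logical matrix: one column per level, TRUE where the row has that level
m <- sapply(levels(x$region), `==`, x$region)

# logical + 0L gives an integer matrix and keeps the dimensions
m_int <- m + 0L

x.bin3 <- data.frame(id = x$id, m_int)
names(x.bin3)[-1] <- paste0("region", levels(x$region))
print(x.bin3)
```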