r reshape data long to wide with unknown number of

2019-05-05 16:56发布

问题:

I'm sure this is trivial but I can't find how to do it.

I have a data frame in which there are individuals, each of which can have several properties, and each property is classified in a number of ways. Currenly it's in long shape, with a record looking like (in schematic form, actually it's a little more complicated):

IndividualID Property PropClass 
1            X         A 
1            Y         B 
2            X         A 
3            Y         B
3            W         C
3            Z         A

What I want is one row for each individual ID, with the individual ID and then pairs of columns for each property and PropClass that that individual has on the original file, so in this case:

 IndividualID  Prop1   PropClass1 Prop2  PropClass2  Prop3  PropClass3
 1             X       A          Y      B           NA     NA
 2             X       A          NA     NA          NA     NA
 3             Y       B          W      C           Z      A

So there have to be as many Prop and PropClass variables as the maximum number of rows for any individualID in the original data set (which is not large, about 5), and where an individual has fewer rows in the original dataset than that maximum number, the extra columns that don't mean anything for that individual have NAs in them. The order of the Prop and PropClass variables for an individual doesn't matter (though it may as well be the original order on the long format file).

Obviously it's easy to do this (e.g. using reshape) if you have one pair of Prop and propClass columns for every possible value of Prop, but there are several hundred possible values of Prop so the file gets huge and unhelpful. I can't believe there is not a simple way to do what I want, but I haven't found it despite what seems to me to be assiduous searching. Please tell me I'm being an idiot, and if so, how I might cure my idiocy.

回答1:

There's probably a more efficient way to do this, but I can't think of it right now. With two variables that need to be transformed into wide format, I think you may need to cast them separately and then merge the two together. I'd love to be proved wrong though. To do this, I create two new variables which generate a column sequence for each new ID. This will allow them to be filled with NAs easily. With the new columns, it's pretty easy to cast them into the right format and merge them together.

library(plyr)
library(reshape2)

#Assumes your data is read into a variable named x
x <- ddply(x, "IndividualID", transform, 
      castPropClass = paste0("PropClass", seq(length(PropClass))),
      castProp = paste0("Prop", seq(length(Property))))

#Use these two new variables to cast into wide format. Wrap in merge to join together:
merge(dcast(IndividualID ~ castPropClass, value.var = "PropClass", data = x),
      dcast(IndividualID ~ castProp,      value.var = "Property",  data = x))
#Gives you this:
  IndividualID PropClass1 PropClass2 PropClass3 Prop1 Prop2 Prop3
1            1          A          B       <NA>     X     Y  <NA>
2            2          A       <NA>       <NA>     X  <NA>  <NA>
3            3          B          C          A     Y     W     Z

This obviously doesn't have the right "order" of columns, but the data itself is right.



回答2:

Would something like this be acceptable?

test.dt<-data.frame(id=(c(1,1,2,3,3,3)), property=(c("X","Y","X","Y","W","Z")), property.clss=(c("A","B","A","B","C","A")))
library(reshape)
m<-melt(data=test.dt, id.vars="id", measure.vars=c("property.clss"))
m
n<-melt(data=test.dt, id.vars="id", measure.vars=c("property"))
n
c1<-data.frame(cast(m, id~value))
colnames(c1)<-c("id", paste("property",colnames(c1)[colnames(c1)!="id"],sep=""))
c1
c2<-data.frame(cast(n,id~value))
colnames(c2)<-c("id", paste("property.clss",(colnames(c2)[colnames(c2)!="id"]),sep=""))
c2
merge(c1,c2,by="id")


标签: r reshape