R reshape, restructure dataframe by chunks

2020-07-23 04:25发布

问题:

I am trying to reshape a dataframe:

Currently it looks like this:

ID   | Gender |A1 | A2 | A3 | B1 | B2 | B3
ID_1 | m      | 3 | 3  | 3  | 2  | 3  | 2 
ID_2 | f      | 1 | 1  | 1  | 4  | 4  | 4

I want to have something like:

 ID   | Gender | A1 | A2 | A3
 ID_1 | m      | 3  | 3  |  3   <- this would be columns A1 - A3 for ID 1
 ID_1 | m      | 2  | 2  |  2   <- this would be columns B1 - B3 for ID 1
 ID_2 | f      | 1  | 1  |  1   <- this would be columns A1 - A3 for ID 2
 ID_2 | f      | 4  | 4  |  4   <- this would be columns B1 - B3 for ID 2

(A1 and B1 / A2 and B2 are the same variables (with regard to the content), so for example: A1 and B1 would be both variables for the result of Test 1 and A2 and B2 both contain the result of Test 2. So in order to evaluate it I need all the result of Test1 in one column and all of Test 2 in another column. I tried to solve this with "melt", but it only melts down the dataframe one by one, not as chunks. (since I need to keep the first 2 columns the way they are and only rearrange the last 4 columns, but as chunks of three) Any other ideas? Thanks!

回答1:

One liner using reshape from base R.

reshape(dat, varying = 3:8, idvar = 1:2, direction = 'long', drop=FALSE, 
   timevar = 'Test')

           ID Gender Test Test1 Test2 Test3
ID_1.m.A ID_1      m    A    A1    A2    A3
ID_2.f.A ID_2      f    A    A1    A2    A3
ID_1.m.B ID_1      m    B    B1    B2    B3
ID_2.f.B ID_2      f    B    B1    B2    B3


回答2:

As @Andrie said, the first step is melting the data with your given columns (ID and gender). Your problem, as you say, is identifying what columns then "go together". Here is one approach, originally encoding that information in column names, and then pulling it out from there.

First some dummy data

dat <- data.frame(ID=c("ID_1", "ID_2"), Gender=c("m","f"), 
  Test1.A = "A1", Test2.A = "A2", Test3.A = "A3",
  Test1.B = "B1", Test2.B = "B2", Test3.B = "B3", stringsAsFactors=FALSE)

Note that I've named the columns with a name that systematically indicates which test and which group it is part of.

> dat
    ID Gender Test1.A Test2.A Test3.A Test1.B Test2.B Test3.B
1 ID_1      m      A1      A2      A3      B1      B2      B3
2 ID_2      f      A1      A2      A3      B1      B2      B3

Using the reshape2 package

library("reshape2")

Melt the data, and then take the variable column which has two pieces of information in it (test and group), and split those two bits of info into two separate columns.

dat.m <- melt(dat, id.vars=c("ID", "Gender"))
dat.m <- cbind(dat.m, colsplit(dat.m$variable, "\\.", names=c("Test", "Group")))

Now it is easy to cast since the test and the group are separate.

dcast(dat.m, ID+Gender+Group~Test)

Which gives

> dcast(dat.m, ID+Gender+Group~Test)
    ID Gender Group Test1 Test2 Test3
1 ID_1      m     A    A1    A2    A3
2 ID_1      m     B    B1    B2    B3
3 ID_2      f     A    A1    A2    A3
4 ID_2      f     B    B1    B2    B3


回答3:

I like Brian's answer better but here's a way to do it with the base package. Pretty ugly though in my opinion.

Your dataframe:

DF
        id    sex v1 v2 v3 v4 v5 v6
    1 ID_1   male A1 A2 A3 B1 B2 B3
    2 ID_2 female A1 A2 A3 B1 B2 B3

Code

DFa<-subset(DF, select=c(1:5))
DFb<-subset(DF, select=c(1:2, 6:8))
colnames(DFb)<-colnames(DFa)
DF<-as.data.frame(rbind(DFa,DFb))
rownames(DF)<-1:nrow(DF)
DF[order(DF$id),] 


回答4:

How about:

> dat <- data.frame(id=c("id1","id2"),gender=c("m","f"),a.1=1:2,a.2=1:2,a.3=1:2,b.1=3:4,b.2=3:4,b.3=3:4)
> dat1 <- dat[,-(3:5)]
> dat2 <- dat[,-(6:8)]
> names(dat1)[3:5] <- c("v1","v2","v3")
> names(dat2)[3:5] <- c("v1","v2","v3")
> 
> dat1$test <- "b"
> dat2$test <- "a"
> result <- rbind(dat1,dat2)
> dat
   id gender a.1 a.2 a.3 b.1 b.2 b.3
1 id1      m   1   1   1   3   3   3
2 id2      f   2   2   2   4   4   4
> result
   id gender v1 v2 v3 test
1 id1      m  3  3  3    b
2 id2      f  4  4  4    b
3 id1      m  1  1  1    a
4 id2      f  2  2  2    a