Convert data frame of N columns into a data frame

Posted 2019-02-24 10:39

Question:

Hello Stack Community.

I am doing work with network analytics and have a data reshaping question.

My original data comes in as a series of columns, with adjacent columns forming "source" and "target" pairs. The final data frame needs to be made up of two columns, "source" and "target". Note that these pairs are staggered, as the sources and targets are linked as in a directed network. (See final_output in the code example for the desired output.)

I created a very hacky method that produces the output I need (see code below), but it does not accommodate differing numbers of columns without my adding variables and whatnot. Also, please note that in some cases the number of column pairs will be an odd number, i.e. one "source" with no "target" at the end of the data frame. In this case the missing "target" column is created with NAs.

I feel there is a smooth way to produce this without all the handwork. I have been searching and searching and have not come across anything. Thank you so much for your help.

Tim

# Create example DF
mydf <- data.frame(id = 1:6, varA = "A",
               varB = "B",
               varC = "C",
               varD = "D",
               varE = "E",
               varF = "F")
#Remove the ID value for DF build. This variable is not in real DF
mydf$id <-NULL

#Begin inelegant hack. 
#Please note: the incoming DF has an indeterminate number of columns that vary with project

counter <-ncol(mydf)
   for (i in 1:counter){
   t1 <-mydf[(counter-counter+1):(counter-counter+2)] 
   t2 <-mydf[(counter-counter+2):(counter-counter+3)]
   t3 <-mydf[(counter-counter+3):(counter-counter+4)]
   t4 <-mydf[(counter-counter+4):(counter-counter+5)]
   t5 <-mydf[(counter-counter+5):(counter-counter+6)]
    }

#Rename for the rbind
names(t1) <-c("Source", "Target")
names(t2) <-c("Source", "Target")
names(t3) <-c("Source", "Target")
names(t4) <-c("Source", "Target")
names(t5) <-c("Source", "Target")

#This is the shape I need but the process is super manual and does not accommodate differing numbers of columns.
final_output <-rbind(t1,t2,t3,t4,t5)

Answer 1:

If I understand correctly, you can just use unlist and manually create your data.frame. Source is every column except the last, and Target is every column except the first:

mydf[] <- lapply(mydf, as.character)  # Convert factors to characters (a no-op on R >= 4.0.0, where strings are not factors by default)
final_output <- data.frame(Source = unlist(mydf[-length(mydf)]), 
                           Target = unlist(mydf[-1]))
head(final_output, 15)
#       Source Target
# varA1      A      B
# varA2      A      B
# varA3      A      B
# varA4      A      B
# varA5      A      B
# varA6      A      B
# varB1      B      C
# varB2      B      C
# varB3      B      C
# varB4      B      C
# varB5      B      C
# varB6      B      C
# varC1      C      D
# varC2      C      D
# varC3      C      D
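
Since the question asks for something that works with any number of columns, the unlist approach above could be wrapped in a small function. This is only a sketch: stagger_pairs and its pad_last argument are hypothetical names, not part of the original answer, and pad_last reflects one possible reading of the "source with no target" case, where the last column is also emitted as a Source with NA targets.

```r
# Hypothetical helper (not from the original answer): pair each column with
# the column to its right and stack the pairs into a two-column edge list.
stagger_pairs <- function(df, pad_last = FALSE) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  df[] <- lapply(df, as.character)            # factors -> character, as in the answer
  src <- unlist(df[-ncol(df)], use.names = FALSE)  # every column except the last
  tgt <- unlist(df[-1], use.names = FALSE)         # every column except the first
  if (pad_last) {
    # Assumption: the last column should also appear as a Source with NA Target
    src <- c(src, df[[ncol(df)]])
    tgt <- c(tgt, rep(NA_character_, nrow(df)))
  }
  data.frame(Source = src, Target = tgt, stringsAsFactors = FALSE)
}

# Same example data as in the question, built without the temporary id column
mydf <- data.frame(varA = rep("A", 6), varB = "B", varC = "C",
                   varD = "D", varE = "E", varF = "F")

edges  <- stagger_pairs(mydf)                   # 5 staggered pairs x 6 rows
padded <- stagger_pairs(mydf, pad_last = TRUE)  # adds 6 rows with NA targets
dim(edges)   # 30 2
dim(padded)  # 36 2
```

Passing use.names = FALSE to unlist avoids the varA1, varA2, ... row names seen in the output above, so the result has plain integer row names.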