Collapse repeated columns into rows [duplicate]

2019-02-28 09:48发布

问题:

This question already has an answer here:

  • Reshaping multiple sets of measurement columns (wide format) into single columns (long format) 6 answers

I have a dataframe that I pulled in from an API. After some cleaning it looks something like this:

Title   Year  Rating  Title    Year  Rating  Title    Year  Rating
Movie 1 1997  6.7     Movie 2  1987  8.2     Movie 3  2009  7.1

The column headers repeat, and in this case a single row contains 3 separate entries.

How would I reshape this so that I end up with 3 columns (Title, Year, Rating) and 3 rows (Movie 1, Movie 2, Movie 3)?

What's the simplest way of doing this?

回答1:

Convert the input data.frame to a list and split the columns into groups according to their common column names. Then unlist each group of columns to produce a single column in each group and convert back to a data.frame. (This also works if there is more than one row in DF.)

as.data.frame(lapply(split(as.list(DF), names(DF)), unlist))

giving:

  Rating  Title Year
1    6.7 Movie1 1997
2    8.2 Movie2 1987
3    7.1 Movie3 2009

Note: We assumed this input:

Lines <- "Title   Year  Rating  Title    Year  Rating  Title    Year  Rating
Movie1 1997  6.7     Movie2  1987  8.2     Movie3  2009  7.1"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE, as.is = TRUE)


回答2:

I think if you got that data from the API your cleaning must have gone wrong somewhere. You've lost all information to figure out which rating and which title go with which movie besides column order.

But anyway, you can do it like this:

library(dplyr)
library(tidyr)

data %>%
  gather(variable, value) %>%
  mutate(ID = rep(1:3, length.out = n() ) ) %>%
  spread(variable, value)


回答3:

This can be done with melt from data.table which can take multiple columns in the measure by specifying the pattern

library(data.table)#v1.9.6+
melt(setDT(df1), measure=patterns('Title', 'Year', 'Rating'), 
    value.name=c('Title', 'Year', 'Rating'))[,variable:=NULL][]
#     Title Year Rating
#1: Movie 1 1997    6.7
#2: Movie 2 1987    8.2
#3: Movie 3 2009    7.1

data

df1 <- structure(list(Title = "Movie 1", Year = 1997L, Rating = 6.7, 
Title = "Movie 2", Year = 1987L, Rating = 8.2, Title = "Movie 3", 
Year = 2009L, Rating = 7.1), .Names = c("Title", "Year", 
"Rating", "Title", "Year", "Rating", "Title", "Year", "Rating"
), class = "data.frame", row.names = c(NA, -1L))