Collapse repeated columns into rows [duplicate]

2019-02-28 09:25发布

I have a dataframe that I pulled in from an API. After some cleaning it looks something like this:

Title   Year  Rating  Title    Year  Rating  Title    Year  Rating
Movie 1 1997  6.7     Movie 2  1987  8.2     Movie 3  2009  7.1

The column headers repeat, and in this case a single row contains 3 separate entries.

How would I reshape this so that I end up with 3 columns (Title, Year, Rating) and 3 rows (Movie 1, Movie 2, Movie 3)?

What's the simplest way of doing this?

3条回答
别忘想泡老子
2楼-- · 2019-02-28 09:28

I think if you got that data from the API your cleaning must have gone wrong somewhere. You've lost all information to figure out which rating and which title go with which movie besides column order.

But anyway, you can do it like this:

library(dplyr)
library(tidyr)

data %>%
  gather(variable, value) %>%
  mutate(ID = rep(1:3, length.out = n() ) ) %>%
  spread(variable, value)
查看更多
SAY GOODBYE
3楼-- · 2019-02-28 09:40

Convert the input data.frame to a list and split the columns into groups according to their common column names. Then unlist each group of columns to produce a single column in each group and convert back to a data.frame. (This also works if there is more than one row in DF.)

as.data.frame(lapply(split(as.list(DF), names(DF)), unlist))

giving:

  Rating  Title Year
1    6.7 Movie1 1997
2    8.2 Movie2 1987
3    7.1 Movie3 2009

Note: We assumed this input:

Lines <- "Title   Year  Rating  Title    Year  Rating  Title    Year  Rating
Movie1 1997  6.7     Movie2  1987  8.2     Movie3  2009  7.1"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE, as.is = TRUE)
查看更多
够拽才男人
4楼-- · 2019-02-28 09:49

This can be done with melt from data.table which can take multiple columns in the measure by specifying the pattern

library(data.table)#v1.9.6+
melt(setDT(df1), measure=patterns('Title', 'Year', 'Rating'), 
    value.name=c('Title', 'Year', 'Rating'))[,variable:=NULL][]
#     Title Year Rating
#1: Movie 1 1997    6.7
#2: Movie 2 1987    8.2
#3: Movie 3 2009    7.1

data

df1 <- structure(list(Title = "Movie 1", Year = 1997L, Rating = 6.7, 
Title = "Movie 2", Year = 1987L, Rating = 8.2, Title = "Movie 3", 
Year = 2009L, Rating = 7.1), .Names = c("Title", "Year", 
"Rating", "Title", "Year", "Rating", "Title", "Year", "Rating"
), class = "data.frame", row.names = c(NA, -1L))
查看更多
登录 后发表回答