This question already has an answer here:
-
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
6 answers
I have a dataframe that I pulled in from an API. After some cleaning it looks something like this:
Title Year Rating Title Year Rating Title Year Rating
Movie 1 1997 6.7 Movie 2 1987 8.2 Movie 3 2009 7.1
The column headers repeat, and in this case a single row contains 3 separate entries.
How would I reshape this so that I end up with 3 columns (Title, Year, Rating) and 3 rows (Movie 1, Movie 2, Movie 3)?
What's the simplest way of doing this?
Convert the input data.frame to a list and split the columns into groups according to their common column names. Then unlist each group of columns to produce a single column in each group and convert back to a data.frame. (This also works if there is more than one row in DF
.)
as.data.frame(lapply(split(as.list(DF), names(DF)), unlist))
giving:
Rating Title Year
1 6.7 Movie1 1997
2 8.2 Movie2 1987
3 7.1 Movie3 2009
Note: We assumed this input:
Lines <- "Title Year Rating Title Year Rating Title Year Rating
Movie1 1997 6.7 Movie2 1987 8.2 Movie3 2009 7.1"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE, as.is = TRUE)
I think if you got that data from the API your cleaning must have gone wrong somewhere. You've lost all information to figure out which rating and which title go with which movie besides column order.
But anyway, you can do it like this:
library(dplyr)
library(tidyr)
data %>%
gather(variable, value) %>%
mutate(ID = rep(1:3, length.out = n() ) ) %>%
spread(variable, value)
This can be done with melt
from data.table
which can take multiple columns in the measure
by specifying the pattern
library(data.table)#v1.9.6+
melt(setDT(df1), measure=patterns('Title', 'Year', 'Rating'),
value.name=c('Title', 'Year', 'Rating'))[,variable:=NULL][]
# Title Year Rating
#1: Movie 1 1997 6.7
#2: Movie 2 1987 8.2
#3: Movie 3 2009 7.1
data
df1 <- structure(list(Title = "Movie 1", Year = 1997L, Rating = 6.7,
Title = "Movie 2", Year = 1987L, Rating = 8.2, Title = "Movie 3",
Year = 2009L, Rating = 7.1), .Names = c("Title", "Year",
"Rating", "Title", "Year", "Rating", "Title", "Year", "Rating"
), class = "data.frame", row.names = c(NA, -1L))