R - reshaping 2 column data frame to multiple columns

Posted 2019-03-06 07:26

Question:

Apologies if this question is a repeat but I couldn't find it.

I am looking to reshape a data frame in the form (read in from read_bulk):

"name.a", 5
"name.a", 4
"name.a", 1
"name.b", 2
"name.b", 3
"name.b", 2
"name.c", 1
"name.c", 5
"name.c", 6

into the form:

5, 4, 1
2, 3, 2
1, 5, 6

The real data frame has thousands of numbers for each name. I do not know how many values each name has, but the count is the same for every name. Each name should become a separate row in the final form.

I attempted this with reshape but couldn't get it to work. Any thoughts?

Answer 1:

 unstack(dat,V2~V1)
  name.a name.b name.c
1      5      2      1
2      4      3      5
3      1      2      6
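
Note that unstack() puts each name in its own column, whereas the question asks for one row per name. A minimal sketch of getting that orientation, assuming the same data as in the data block of this answer (rebuilt here with data.frame() so the snippet stands alone), is to transpose the result with t(). The names wide and by_row are only illustrative, and t() returns a matrix rather than a data frame:

```r
# Same sample values as in this answer; V1 and V2 are the default read.table names
dat <- data.frame(
  V1 = rep(c("name.a", "name.b", "name.c"), each = 3),
  V2 = c(5, 4, 1, 2, 3, 2, 1, 5, 6),
  stringsAsFactors = FALSE
)

wide <- unstack(dat, V2 ~ V1)  # one column per name
by_row <- t(wide)              # transpose: one row per name (a matrix)
by_row
```

Since the result is already a matrix, it could be passed on to functions such as matplot() without a further conversion.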

Using other libraries:

library(tidyverse)
dat %>% group_by(V1) %>% mutate(id2 = 1:n()) %>% spread(id2, V2)
# A tibble: 3 x 4
# Groups:   V1 [3]
      V1   `1`   `2`   `3`
*  <chr> <int> <int> <int>
1 name.a     5     4     1
2 name.b     2     3     2
3 name.c     1     5     6

Data:

dat <- read.table(header = FALSE, sep = ",", stringsAsFactors = FALSE, strip.white = TRUE, text = '
"name.a", 5
"name.a", 4
"name.a", 1
"name.b", 2
"name.b", 3
"name.b", 2
"name.c", 1
"name.c", 5
"name.c", 6')


Answer 2:

Provided the format is always the same, something like this in base R?

df <- as.data.frame(matrix(unlist(df[, 2]), ncol = 3, byrow = TRUE))
df
#  V1 V2 V3
#1  5  4  1
#2  2  3  2
#3  1  5  6

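Explanation: unlist(df[, 2]) turns the entries in df[, 2] into a vector, which is then reshaped into a matrix with ncol = 3 columns, filled row by row, and finally converted to a data.frame. This relies on the rows for each name being contiguous and on every name having exactly 3 values.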
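
Since the question says the number of values per name is not known in advance, the hard-coded ncol = 3 could instead be derived from the data. The following is only a sketch, assuming every name occurs the same number of times and that the rows for each name are contiguous and in order. The names vals_per_name and wide are illustrative, and the data frame is rebuilt inline so the snippet stands alone:

```r
# Same sample values as in this answer
df <- data.frame(
  V1 = rep(c("name.a", "name.b", "name.c"), each = 3),
  V2 = c(5, 4, 1, 2, 3, 2, 1, 5, 6),
  stringsAsFactors = FALSE
)

# Number of values per name, assuming every name occurs equally often
vals_per_name <- nrow(df) / length(unique(df$V1))

# Fill the matrix row by row so that each name ends up in its own row
wide <- as.data.frame(matrix(df$V2, ncol = vals_per_name, byrow = TRUE))
rownames(wide) <- as.character(unique(df$V1))  # label each row with its name
wide
```

If the names were interleaved rather than contiguous, the data would need to be sorted by name first (keeping the within-name order) for this to give the right rows.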


Sample data

df <- read.table(text =
    "name.a 5
name.a 4
name.a 1
name.b 2
name.b 3
name.b 2
name.c 1
name.c 5
name.c 6")


Answer 3:

You can do the transformation using dplyr and reshape2:

df <- data.frame(name=c("name.a",
                        "name.a",
                        "name.a",
                        "name.b",
                        "name.b",
                        "name.b",
                        "name.c",
                        "name.c",
                        "name.c"),
                 num=c(5,
                         4,
                         1,
                         2,
                         3,
                         2,
                         1,
                         5,
                         6))

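library(dplyr)     # group_by(), mutate(), n(), %>%
library(reshape2)  # dcast()

# Number the observations within each name
df <- df %>%
  group_by(name) %>%
  mutate(instance = 1:n())

# One row per name, one column per instance
dcast(df, name ~ instance, sum, value.var = 'num')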


Answer 4:

After looking through the responses, I think I found a quicker (simpler) way. With my data, I got it to work with:

library(readbulk)  # provides read_bulk()

setwd("~/Documents/Random/abs")  # data is here
a = read_bulk(directory = ".")   # read in as I did
df = unstack(a)                  # the line I was looking for
dat = as.matrix(df)              # convert to matrix
matplot(dat, lty = 1, type = 'l', lwd = 1, xlab = "Energy (keV)", ylab = "Counts")  # plot