R how to calculate relative values based on a long

Posted 2019-08-15 20:09

Question:

I have a data frame like the one below and would like to add a column gene_richness_relative. In this column, the gene_richness value at days == 0 should be set to 100 % and serve as the baseline. The relative values on the other days should then reflect the change from that baseline.

I start with a data.frame sorted by days:

str(df)
'data.frame':   584 obs. of  5 variables:
 $ gene         : Factor w/ 64 levels "araD","arfA",..: 1 2 3 4 8 9 10 11 12 13 ...
 $ sample       : Factor w/ 11 levels "","A1","A2","A3",..: 10 10 10 10 10 10 10 10 10 10 ...
 $ days         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ treatment    : Factor w/ 2 levels "control","glyph": 1 1 1 1 1 1 1 1 1 1 ...
 $ gene_richness: int  6 11 9 3 20 7 2 28 38 9 ...

It looks like this:

  gene sample days treatment gene_richness
1  araD     B8    0   control      6
2  arfA     B8    0   control     11
3  artI     B8    0   control      9
4  bcsZ     B8    0   control      3
5  czcD     B8    0   control     20
6  fdhA     B8    0   control      7
7   fdm     B8    0   control      2
8  gyrA     B8    0   control     28
9  gyrB     B8    0   control     38
10 katE     B8    0   control      9
11 merA     B8    0   control     15
12 mlhB     B8    0   control      6
13 mntB     B8    0   control     11
14 nirS     B8    0   control     10
15 norB     B8    0   control      9
16 nosZ     B8    0   control      7
17 nuoF     B8    0   control     16
18 phnA     B8    0   control      2
19 phnC     B8    0   control     13
20 phnD     B8    0   control     19
21 phnE     B8    0   control     36
22 phnF     B8    0   control      8
23 phnG     B8    0   control     11
24 phnH     B8    0   control     13
25 phnI     B8    0   control     17
26 phnJ     B8    0   control     15
27 phnK     B8    0   control     13
28 phnL     B8    0   control     13
29 phnM     B8    0   control     19
30 phnN     B8    0   control      8

After sorting by gene with:

df2 <- df[with(df, order(gene)), ]

I get this output:

'data.frame':   584 obs. of  5 variables:
 $ gene         : Factor w/ 64 levels "araD","arfA",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ sample       : Factor w/ 11 levels "","A1","A2","A3",..: 10 11 9 2 3 4 5 6 7 8 ...
 $ days         : num  0 22 71 0 3 7 14 22 43 71 ...
 $ treatment    : Factor w/ 2 levels "control","glyph": 1 1 1 2 2 2 2 2 2 2 ...
 $ gene_richness: int  6 5 5 7 7 7 8 8 6 7 ...

which looks like this:

    gene sample days treatment gene_richness
1   araD     B8    0   control             6
59  araD     B9   22   control             5
117 araD    B10   71   control             5
174 araD     A1    0     glyph             7
230 araD     A2    3     glyph             7
289 araD     A3    7     glyph             7
347 araD     A4   14     glyph             8
407 araD     A5   22     glyph             8
466 araD     A6   43     glyph             6
526 araD     A7   71     glyph             7
2   arfA     B8    0   control            11
60  arfA     B9   22   control             4
118 arfA    B10   71   control             4
175 arfA     A1    0     glyph             6
231 arfA     A2    3     glyph             8
290 arfA     A3    7     glyph            10
348 arfA     A4   14     glyph            11
408 arfA     A5   22     glyph             9
467 arfA     A6   43     glyph             6
527 arfA     A7   71     glyph             5
3   artI     B8    0   control             9
61  artI     B9   22   control             8
119 artI    B10   71   control             9
176 artI     A1    0     glyph             4
232 artI     A2    3     glyph             5
291 artI     A3    7     glyph             5
349 artI     A4   14     glyph             9
409 artI     A5   22     glyph             7
468 artI     A6   43     glyph            10
528 artI     A7   71     glyph            15

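The desired output looks like the table below. The following data.table code, taken from denis' answer, produces it:

library(data.table)
# setDT() converts df2 to a data.table by reference, so the assignment is optional
df2 <- setDT(df2)
# Within each gene/treatment group, divide by the day-0 value and scale to percent
df2[, gene_richness_relative := gene_richness / gene_richness[days == 0] * 100, by = .(gene, treatment)]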

     gene sample days treatment gene_richness gene_richness_relative
  1: araD     B8    0   control             6              100.00000
  2: araD     B9   22   control             5               83.33333
  3: araD    B10   71   control             5               83.33333
  4: araD     A1    0     glyph             7              100.00000
  5: araD     A2    3     glyph             7              100.00000
 ---                                                                
580: ydiF     A3    7     glyph             3              100.00000
581: ydiF     A4   14     glyph             2               66.66667
582: ydiF     A5   22     glyph             5              166.66667
583: ydiF     A6   43     glyph             4              133.33333
584: ydiF     A7   71     glyph             4              133.33333

But

library(dplyr)
df %>%
  group_by(gene,treatment) %>%
  mutate(gene_richness_relative = gene_richness/gene_richness[days == 0]*100)

returns

Error in mutate_impl(.data, dots) :
  Column `gene_richness_relative` must be length 2 (the group size) or one, not 0

I'm actually quite happy as the data.table way works, but do you have an idea what the problem with dplyr is?

Answer 1:

library(dplyr)
df %>%
  group_by(gene,treatment) %>%
  mutate(gene_richness_relative = gene_richness/gene_richness[days == 0]*100)

# A tibble: 20 x 6
# Groups:   gene, treatment [4]
     gene sample  days treatment gene_richness gene_richness_relative
   <fctr> <fctr> <int>    <fctr>         <int>                  <dbl>
 1   araD     B8     0   control             6              100.00000
 2   araD     B9    22   control             5               83.33333
 3   araD    B10    71   control             5               83.33333
 4   araD     A1     0   treated             7              100.00000
 5   araD     A2     3   treated             7              100.00000
 6   araD     A3     7   treated             7              100.00000
 7   araD     A4    14   treated             8              114.28571
 8   araD     A5    22   treated             8              114.28571

Or with data.table:

library(data.table)
df <- setDT(df)
df[,gene_richness_relative := gene_richness/gene_richness[days == 0]*100, by = .(gene,treatment)]
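
As for why dplyr fails on the full data: judging from the error message (group size 2, result length 0), a likely cause is that at least one gene/treatment group has no row with days == 0. In such a group gene_richness[days == 0] is a zero-length vector, so the whole expression has length 0, and mutate() refuses to recycle that. Below is a minimal sketch on made-up toy data (the toy data frame and its values are assumptions for illustration, not the asker's data). It first lists the groups without a baseline row and then uses match(0, days), which returns NA when there is no day-0 row, so those groups get NA instead of raising an error.

```r
library(dplyr)

# Hypothetical toy data: gene "b" has no days == 0 row
toy <- data.frame(
  gene          = c("a", "a", "a", "b", "b"),
  treatment     = "control",
  days          = c(0, 22, 71, 22, 71),
  gene_richness = c(6, 5, 5, 4, 8)
)

# Diagnose: which gene/treatment groups lack a day-0 baseline?
missing_baseline <- toy %>%
  group_by(gene, treatment) %>%
  filter(!any(days == 0)) %>%
  ungroup() %>%
  distinct(gene, treatment)
print(missing_baseline)

# Workaround: match(0, days) is the index of the first day-0 row, or NA if
# there is none; indexing with NA yields a single NA, which recycles cleanly.
result <- toy %>%
  group_by(gene, treatment) %>%
  mutate(gene_richness_relative =
           gene_richness / gene_richness[match(0, days)] * 100) %>%
  ungroup()
print(result)
```

Whether NA is the right value for groups without a baseline depends on the analysis; dropping those groups beforehand, for example with filter(any(days == 0)) on the grouped data, would be another option.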