I have a data frame like the one below and would like to add a column gene_richness_relative. In this column, the gene_richness value at days == 0 should be set to 100 % as the basis for calculation. The relative values at other days should then reflect the changes relative to that baseline.
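As a minimal sketch of the intended calculation for a single group, here are the araD / control values from the data further down (6, 5, 5 at days 0, 22, 71). The vector names are chosen only for this illustration:

```r
# Toy example for one gene/treatment group (araD, control)
days <- c(0, 22, 71)
gene_richness <- c(6, 5, 5)

# Divide every value by the value at days == 0 and scale to percent
gene_richness_relative <- gene_richness / gene_richness[days == 0] * 100
print(gene_richness_relative)
```

The day-0 value becomes 100 and the later days are expressed as a percentage of it.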
I start with a data.frame sorted by days:
str(df)
'data.frame': 584 obs. of 5 variables:
$ gene : Factor w/ 64 levels "araD","arfA",..: 1 2 3 4 8 9 10 11 12 13 ...
$ sample : Factor w/ 11 levels "","A1","A2","A3",..: 10 10 10 10 10 10 10 10 10 10 ...
$ days : num 0 0 0 0 0 0 0 0 0 0 ...
$ treatment : Factor w/ 2 levels "control","glyph": 1 1 1 1 1 1 1 1 1 1 ...
$ gene_richness: int 6 11 9 3 20 7 2 28 38 9 ...
looking like this:
gene sample days treatment gene_richness
1 araD B8 0 control 6
2 arfA B8 0 control 11
3 artI B8 0 control 9
4 bcsZ B8 0 control 3
5 czcD B8 0 control 20
6 fdhA B8 0 control 7
7 fdm B8 0 control 2
8 gyrA B8 0 control 28
9 gyrB B8 0 control 38
10 katE B8 0 control 9
11 merA B8 0 control 15
12 mlhB B8 0 control 6
13 mntB B8 0 control 11
14 nirS B8 0 control 10
15 norB B8 0 control 9
16 nosZ B8 0 control 7
17 nuoF B8 0 control 16
18 phnA B8 0 control 2
19 phnC B8 0 control 13
20 phnD B8 0 control 19
21 phnE B8 0 control 36
22 phnF B8 0 control 8
23 phnG B8 0 control 11
24 phnH B8 0 control 13
25 phnI B8 0 control 17
26 phnJ B8 0 control 15
27 phnK B8 0 control 13
28 phnL B8 0 control 13
29 phnM B8 0 control 19
30 phnN B8 0 control 8
After sorting it by gene with:
df2 <- df[with(df, order(gene)), ]
I get this structure:
'data.frame': 584 obs. of 5 variables:
$ gene : Factor w/ 64 levels "araD","arfA",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sample : Factor w/ 11 levels "","A1","A2","A3",..: 10 11 9 2 3 4 5 6 7 8 ...
$ days : num 0 22 71 0 3 7 14 22 43 71 ...
$ treatment : Factor w/ 2 levels "control","glyph": 1 1 1 2 2 2 2 2 2 2 ...
$ gene_richness: int 6 5 5 7 7 7 8 8 6 7 ...
looking like this:
gene sample days treatment gene_richness
1 araD B8 0 control 6
59 araD B9 22 control 5
117 araD B10 71 control 5
174 araD A1 0 glyph 7
230 araD A2 3 glyph 7
289 araD A3 7 glyph 7
347 araD A4 14 glyph 8
407 araD A5 22 glyph 8
466 araD A6 43 glyph 6
526 araD A7 71 glyph 7
2 arfA B8 0 control 11
60 arfA B9 22 control 4
118 arfA B10 71 control 4
175 arfA A1 0 glyph 6
231 arfA A2 3 glyph 8
290 arfA A3 7 glyph 10
348 arfA A4 14 glyph 11
408 arfA A5 22 glyph 9
467 arfA A6 43 glyph 6
527 arfA A7 71 glyph 5
3 artI B8 0 control 9
61 artI B9 22 control 8
119 artI B10 71 control 9
176 artI A1 0 glyph 4
232 artI A2 3 glyph 5
291 artI A3 7 glyph 5
349 artI A4 14 glyph 9
409 artI A5 22 glyph 7
468 artI A6 43 glyph 10
528 artI A7 71 glyph 15
The desired output, shown after the code, works perfectly with the data.table approach from denis' answer:
library(data.table)
df2 <- setDT(df2)
df2[, gene_richness_relative := gene_richness / gene_richness[days == 0] * 100, by = .(gene, treatment)]
gene sample days treatment gene_richness gene_richness_relative
1: araD B8 0 control 6 100.00000
2: araD B9 22 control 5 83.33333
3: araD B10 71 control 5 83.33333
4: araD A1 0 glyph 7 100.00000
5: araD A2 3 glyph 7 100.00000
---
580: ydiF A3 7 glyph 3 100.00000
581: ydiF A4 14 glyph 2 66.66667
582: ydiF A5 22 glyph 5 166.66667
583: ydiF A6 43 glyph 4 133.33333
584: ydiF A7 71 glyph 4 133.33333
But the dplyr equivalent
library(dplyr)
df %>%
group_by(gene,treatment) %>%
mutate(gene_richness_relative = gene_richness/gene_richness[days == 0]*100)
returns this error:
Error in mutate_impl(.data, dots) :
Column `gene_richness_relative` must be length 2 (the group size) or one, not 0
I'm actually quite happy since the data.table way works, but do you have any idea what the problem with dplyr is?
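One reading of the error message: "must be length 2 (the group size) or one, not 0" suggests that in at least one gene/treatment group of two rows, gene_richness[days == 0] is empty, i.e. that group has no days == 0 row. The sketch below uses hypothetical toy data (not the real df) to show one way to check for such groups, plus a possible variant that yields NA for them instead of failing. Whether NA is the right result for groups without a baseline is an assumption.

```r
library(dplyr)

# Hypothetical toy data: the "glyph" group deliberately has no days == 0 row
toy <- data.frame(
  gene = c("ydiF", "ydiF", "ydiF", "ydiF", "ydiF"),
  treatment = c("control", "control", "control", "glyph", "glyph"),
  days = c(0, 22, 71, 7, 14),
  gene_richness = c(6, 5, 5, 3, 2)
)

# List the gene/treatment groups that lack a baseline (days == 0) row
no_baseline <- toy %>%
  group_by(gene, treatment) %>%
  summarise(has_day0 = any(days == 0)) %>%
  ungroup() %>%
  filter(!has_day0)
print(no_baseline)

# Indexing an empty vector with [1] gives NA, so groups without a
# baseline get NA in the new column rather than raising an error
toy_rel <- toy %>%
  group_by(gene, treatment) %>%
  mutate(gene_richness_relative = gene_richness / gene_richness[days == 0][1] * 100) %>%
  ungroup()
print(toy_rel)
```

Running the first part on the real data (with df in place of toy) should show whether any group is missing its day-0 sample.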