Is it possible to use the aggregate function to add another column from the original data frame, without actually using that column to aggregate the data?
This is a very simplified version of the data that will help illustrate my question (let's call it data):
name result.1 result.2 replicate day data.for.mean
"obj.1" 1 "good" 1 1 5
"obj.1" 1 "good" 2 1 7
"obj.1" 1 "great" 1 2 6
"obj.1" 1 "good" 2 2 9
"obj.1" 2 "bad" 1 1 10
"obj.1" 2 "not good" 2 1 6
"obj.1" 2 "bad" 1 2 5
"obj.1" 2 "not good" 2 2 3
"obj.2" 1 "excellent" 1 1 14
"obj.2" 1 "good" 2 1 10
"obj.2" 1 "good" 1 2 11
"obj.2" 1 "not bad" 2 2 7
"obj.2" 2 "bad" 1 1 4
"obj.2" 2 "bad" 2 1 3
"obj.2" 2 "horrible" 1 2 2
"obj.2" 2 "dismal" 2 2 1
You'll notice that result.1 and result.2 are tied: if result.1 == 1, result.2 is a positive label (good/great/excellent/not bad), and if result.1 == 2, result.2 is a negative label (bad/not good/horrible/dismal). I need both of these columns in the aggregated data set, and it doesn't matter which value from result.2 is picked when the data is aggregated. I just need the information to identify whether result.1's value of 1 means good or bad, and similarly for its value of 2. So it could have all values of "dismal" corresponding with all of result.1's values of 2.
The problem is that, since result.2 uses different names to identify good/bad, I cannot use it as a column to aggregate by.
Currently my aggregate function looks like this...
aggregated.data <- aggregate(data[c("data.for.mean")],
                             by = data[c("name", "result.1", "day")],
                             FUN = mean)
which would give a line of output such as this...
name result.1 day data.for.mean
"obj.1" 1 1 6
(All of the replicates for obj.1 with result.1 == 1 on day 1 have been averaged. They had values of 5 and 7 and were the first two rows in my mock data set.)
What I would like is a line of output such as this:
name result.1 result.2 day data.for.mean
"obj.1" 1 "good" 1 6
Again, "good" could be replaced with "great", "not bad", "excellent", for all values which correspond to result.1's value of '1'.
What would be the best method of capturing information from result.2 and adding it to aggregated.data (the output of the aggregate function)?
Thank you.
Here's a solution in base R, which uses merge followed by another aggregate.

Here's how this works: the merge adds in the result.2 values, but will create multiple rows where there are multiple such values. Then aggregate is used to select one of these rows. As you say you don't care which of the relevant result.2 labels you get, I'm getting one at random with sample.

To return the first result.2 label, use head with n=1 instead. Similarly, to get the last such label, use tail with n=1.
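
The merge-then-aggregate idea described above can be sketched roughly as follows. This is a minimal reconstruction under assumptions, not necessarily the code originally posted: the intermediate names means and merged are made up for illustration, and head is used as the picker so the result is reproducible (swap in sample(x, 1) or tail(x, n = 1) for the other variants).

```r
# Mock data from the question, rebuilt so the example is self-contained.
data <- data.frame(
  name = rep(c("obj.1", "obj.2"), each = 8),
  result.1 = rep(rep(c(1, 2), each = 4), times = 2),
  result.2 = c("good", "good", "great", "good",
               "bad", "not good", "bad", "not good",
               "excellent", "good", "good", "not bad",
               "bad", "bad", "horrible", "dismal"),
  replicate = rep(c(1, 2), times = 8),
  day = rep(rep(c(1, 2), each = 2), times = 4),
  data.for.mean = c(5, 7, 6, 9, 10, 6, 5, 3, 14, 10, 11, 7, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Step 1: the aggregate from the question (mean per name/result.1/day).
means <- aggregate(data["data.for.mean"],
                   by = data[c("name", "result.1", "day")],
                   FUN = mean)

# Step 2: merge the result.2 labels back in. merge() joins on the shared
# columns (name, result.1, day), so each group now has one row per label.
merged <- merge(means, data[c("name", "result.1", "day", "result.2")])

# Step 3: a second aggregate collapses each group to a single label.
# data.for.mean is constant within a group, so grouping by it is harmless.
aggregated.data <- aggregate(
  result.2 ~ name + result.1 + day + data.for.mean,
  data = merged,
  FUN = function(x) head(x, n = 1)  # or sample(x, 1), or tail(x, n = 1)
)

print(aggregated.data)
```

With the mock data this should leave one row per name/result.1/day combination (eight rows), each carrying the mean plus one of the result.2 labels from that group.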
How about this with dplyr:
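
A minimal dplyr sketch of the same idea, offered as an assumption about what such a solution might look like rather than as the code originally posted: group by the three key columns, then compute the mean and keep the first result.2 label in a single summarise call. It assumes dplyr 1.0.0 or later (for the .groups argument).

```r
library(dplyr)

# Mock data from the question, rebuilt so the example is self-contained.
data <- data.frame(
  name = rep(c("obj.1", "obj.2"), each = 8),
  result.1 = rep(rep(c(1, 2), each = 4), times = 2),
  result.2 = c("good", "good", "great", "good",
               "bad", "not good", "bad", "not good",
               "excellent", "good", "good", "not bad",
               "bad", "bad", "horrible", "dismal"),
  replicate = rep(c(1, 2), times = 8),
  day = rep(rep(c(1, 2), each = 2), times = 4),
  data.for.mean = c(5, 7, 6, 9, 10, 6, 5, 3, 14, 10, 11, 7, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

aggregated.data <- data %>%
  group_by(name, result.1, day) %>%
  summarise(
    result.2 = first(result.2),           # any label from the group will do
    data.for.mean = mean(data.for.mean),  # average over the replicates
    .groups = "drop"                      # return an ungrouped tibble
  )

print(aggregated.data)
```

Because result.2 is summarised rather than grouped on, its differing labels never split a group; last(result.2) or sample(result.2, 1) would work in place of first() if a different label is preferred.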