This question already has an answer here:
Suppose that I have this input :
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 B
11 7 2014-06-17 2013-08-04 B
I would like to progressively concatenate the values of the str
column by the group variable ID
, as showed in the following output :
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B,B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 A,B
11 7 2014-06-17 2013-08-04 A,B,B
I tried to use the ave()
function with this code :
within(table, {
Emp_list <- ave(str, ID, FUN = function(x) paste(x, collapse = ","))
})
but it gives the following output, which is not exactly what I want :
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B,B
6 5 2014-06-04 2011-09-18 B,B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A,B,B
10 7 2014-06-17 2013-08-04 A,B,B
11 7 2014-06-17 2013-08-04 A,B,B
Of course I'd like to avoid loops, as I work on a large database.
How about
ave()
withReduce()
. TheReduce()
function allows us to accumulate results as they are calculated. So if we run it withpaste()
we can accumulate the pasted strings.which gives the updated data frame
df
Note:
function(...) paste(..., sep = ", ")
could also befunction(x, y) paste(x, y, sep = ", ")
. (Thanks Pierre Lafortune)Here's a possible solution combining
data.table
with an innertapply
that seem to get you what you need (you can usepaste
instead oftoString
if you like, it just looks cleaner to me that way).You may be able to improve it a bit using