This question already has an answer here:
- How to sum a variable by group? 13 answers
I have a data frame DF.
Say DF is:
A B
1 1 2
2 1 3
3 2 3
4 3 5
5 3 6
Now I want to combine together the rows by the column A and to have the sum of the column B.
For example:
A B
1 1 5
2 2 3
3 3 11
I am doing this currently using an SQL query with the sqldf function. But for some reason it is very slow. Is there any more convenient way to do that? I could do it manually too using a for loop but it is again slow. My SQL query is " Select A,Count(B) from DF group by A".
In general whenever I don't use vectorized operations and I use for loops the performance is extremely slow even for single procedures.
This is a common question. In base, the option you're looking for is
aggregate
. Assuming yourdata.frame
is called "mydf", you can use the following.I would also recommend looking into the "data.table" package.
I would recommend having a look at the
plyr
package. It might not be as fast as data.table or other packages, but it is quite instructive, especially when starting with R and having to do some data manipulation.I am not certain the exact advantages over aggregate.
Using
dplyr
:With
sqldf
: