I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:
column name: date / mcode / mname / ycode / yname / yissue / bsent / breturn / tsent / treturn / csales
type: Date / Char / Char / Char / Char / Numeric / Numeric / Numeric / Numeric / Numeric / Numeric
I want to calculate subtotals. For example, I want to sum every numeric variable at each change in yname. There are 160 distinct ynames, so the resulting table should give me the subtotal for each yname. I haven't sorted the data yet, but this is not a problem because I can sort the data in whatever way I want. Below is a snippet from my data:
date mcode mname ycode yname yissue bsent breturn tsent treturn csales
417572 2010-07-28 45740 ENDPOINT A 5772 XMAG 20100800 7 0 7 0 0
417573 2010-07-31 45740 ENDPOINT A 5772 XMAG 20100800 0 0 0 0 1
417574 2010-08-04 45740 ENDPOINT A 5772 XMAG 20100800 0 0 0 0 1
417575 2010-08-14 45740 ENDPOINT A 5772 XMAG 20100800 0 0 0 0 1
417576 2010-08-26 45740 ENDPOINT A 5772 XMAG 20100800 0 4 0 0 0
417577 2010-07-28 45741 ENDPOINT L 5772 XMAG 20100800 2 0 2 0 0
417578 2010-08-04 45741 ENDPOINT L 5772 XMAG 20100800 2 0 2 0 0
417579 2010-08-26 45741 ENDPOINT L 5772 XMAG 20100800 0 4 0 0 0
417580 2010-07-28 46390 ENDPOINT R 5772 XMAG 20100800 3 0 3 0 1
417581 2010-07-29 46390 ENDPOINT R 5772 XMAG 20100800 0 0 0 0 2
417582 2010-08-01 46390 ENDPOINT R 5779 YMAG 20100800 3 0 3 0 0
417583 2010-08-11 46390 ENDPOINT R 5779 YMAG 20100800 0 0 0 0 1
417584 2010-08-20 46390 ENDPOINT R 5779 YMAG 20100800 0 0 0 0 1
417585 2010-08-24 46390 ENDPOINT R 5779 YMAG 20100800 2 0 2 0 1
417586 2010-08-26 46390 ENDPOINT R 5779 YMAG 20100800 0 2 0 2 0
417587 2010-07-28 46411 ENDPOINT D 5779 YMAG 20100800 6 0 6 0 0
417588 2010-08-08 46411 ENDPOINT D 5779 YMAG 20100800 0 0 0 0 1
417589 2010-08-11 46411 ENDPOINT D 5779 YMAG 20100800 0 0 0 0 1
417590 2010-08-26 46411 ENDPOINT D 5779 YMAG 20100800 0 4 0 4 0
What function should I use here? Maybe something like SQL group by?
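For reference, the closest base R analogue to SQL's group by is probably aggregate(). The following is only a minimal sketch: the toy data frame and its values are made up to mimic a few columns of the snippet above, and only three of the numeric columns are included.

```r
# Toy data standing in for the real 900,000-row data frame (values are illustrative).
df <- data.frame(
  yname   = c("XMAG", "XMAG", "XMAG", "YMAG", "YMAG"),
  bsent   = c(7, 0, 2, 3, 2),
  breturn = c(0, 4, 0, 0, 2),
  csales  = c(0, 1, 0, 0, 1)
)

# Sum each listed numeric column within every yname, similar to
# SELECT yname, SUM(bsent), SUM(breturn), SUM(csales) ... GROUP BY yname.
subtotals <- aggregate(cbind(bsent, breturn, csales) ~ yname, data = df, FUN = sum)
print(subtotals)
```

The data does not need to be sorted first, since aggregate() groups rows by value rather than by position. The result has one row per distinct yname.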
Google wasn't super helpful when I tried to find an answer to a similar question. I thought I would share my solution below using the janitor package with split() and purrr::map_df(). My use case was to run a script that would grab CC expenses from many different people to be reviewed by a person.
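A rough sketch of how that combination might look on the question's data is given here. The toy data frame is invented for illustration, and the choice to label each total row with the group name via the name argument of adorn_totals() is an assumption rather than part of the original use case. Note that purrr::map_df() also needs dplyr installed to bind the rows.

```r
library(janitor)
library(purrr)

# Toy data mimicking a few columns of the question (values are illustrative).
df <- data.frame(
  yname  = c("XMAG", "XMAG", "YMAG", "YMAG"),
  mname  = c("ENDPOINT A", "ENDPOINT L", "ENDPOINT R", "ENDPOINT D"),
  bsent  = c(7, 2, 3, 6),
  csales = c(0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Split into one data frame per yname, append a totals row to each piece,
# then bind the pieces back together into a single data frame.
pieces <- split(df, df$yname)
with_subtotals <- map_df(pieces, function(grp) {
  # The label goes in the first column; other non-numeric columns get "-".
  adorn_totals(grp, where = "row", name = paste(grp$yname[1], "subtotal"))
})
print(with_subtotals)
```

Unlike a plain group-by summary, this keeps the detail rows and adds one subtotal row after each group, which suits a report that someone will review by eye.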