Calculating subtotals in R

2020-06-17 04:29发布

I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:

column name: date / mcode / mname / ycode / yname / yissue  / bsent   / breturn / tsent   / treturn / csales
type:        Date / Char  / Char  / Char  / Char  / Numeric / Numeric / Numeric / Numeric / Numeric / Numeric

I want to calculate the subtotals. For example, I want to calculate the sums at each change in yname, and add subtotal to all numerical variables. There are 160 distinct ynames, so the resulting table should tell me the subtotal of each yname. I haven't sorted the data yet, but this is not a problem because I can sort the data in whatever way I want. Below is a snippet from my data:

             date     mcode mname            ycode    yname   yissue bsent breturn tsent treturn csales
417572 2010-07-28     45740 ENDPOINT A        5772    XMAG  20100800     7       0     7       0      0
417573 2010-07-31     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417574 2010-08-04     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417575 2010-08-14     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417576 2010-08-26     45740 ENDPOINT A        5772    XMAG  20100800     0       4     0       0      0
417577 2010-07-28     45741 ENDPOINT L        5772    XMAG  20100800     2       0     2       0      0
417578 2010-08-04     45741 ENDPOINT L        5772    XMAG  20100800     2       0     2       0      0
417579 2010-08-26     45741 ENDPOINT L        5772    XMAG  20100800     0       4     0       0      0
417580 2010-07-28     46390 ENDPOINT R        5772    XMAG  20100800     3       0     3       0      1
417581 2010-07-29     46390 ENDPOINT R        5772    XMAG  20100800     0       0     0       0      2
417582 2010-08-01     46390 ENDPOINT R        5779    YMAG  20100800     3       0     3       0      0
417583 2010-08-11     46390 ENDPOINT R        5779    YMAG  20100800     0       0     0       0      1
417584 2010-08-20     46390 ENDPOINT R        5779    YMAG  20100800     0       0     0       0      1
417585 2010-08-24     46390 ENDPOINT R        5779    YMAG  20100800     2       0     2       0      1
417586 2010-08-26     46390 ENDPOINT R        5779    YMAG  20100800     0       2     0       2      0
417587 2010-07-28     46411 ENDPOINT D        5779    YMAG  20100800     6       0     6       0      0
417588 2010-08-08     46411 ENDPOINT D        5779    YMAG  20100800     0       0     0       0      1
417589 2010-08-11     46411 ENDPOINT D        5779    YMAG  20100800     0       0     0       0      1
417590 2010-08-26     46411 ENDPOINT D        5779    YMAG  20100800     0       4     0       4      0

What function should I use here? Maybe something like SQL group by?

标签: r subtotal
7条回答
forever°为你锁心
2楼-- · 2020-06-17 05:00

Google wasn't super helpful when I tried to find an answer to a similar question. I thought I would share my solution below using the library(janitor) package with split(), and purrr::map_df().

My use case was to run a script that would grab CC expenses from many different people to be reviewed by a person.


library(janitor)
library(purrr)
library(dplyr)

mtcars %>% 
  split(.[,"cyl"]) %>% ## splits each change in cyl into a list of dataframes 
  map_df(., janitor::adorn_totals)

   mpg cyl   disp   hp  drat     wt   qsec vs am gear carb
  22.8   4  108.0   93  3.85  2.320  18.61  1  1    4    1
  24.4   4  146.7   62  3.69  3.190  20.00  1  0    4    2
  22.8   4  140.8   95  3.92  3.150  22.90  1  0    4    2
  32.4   4   78.7   66  4.08  2.200  19.47  1  1    4    1
  30.4   4   75.7   52  4.93  1.615  18.52  1  1    4    2
  33.9   4   71.1   65  4.22  1.835  19.90  1  1    4    1
  21.5   4  120.1   97  3.70  2.465  20.01  1  0    3    1
  27.3   4   79.0   66  4.08  1.935  18.90  1  1    4    1
    26   4  120.3   91  4.43  2.140  16.70  0  1    5    2
  30.4   4   95.1  113  3.77  1.513  16.90  1  1    5    2
  21.4   4  121.0  109  4.11  2.780  18.60  1  1    4    2
 Total  44 1156.5  909 44.78 25.143 210.51 10  8   45   17
    21   6  160.0  110  3.90  2.620  16.46  0  1    4    4
    21   6  160.0  110  3.90  2.875  17.02  0  1    4    4
  21.4   6  258.0  110  3.08  3.215  19.44  1  0    3    1
  18.1   6  225.0  105  2.76  3.460  20.22  1  0    3    1
  19.2   6  167.6  123  3.92  3.440  18.30  1  0    4    4
  17.8   6  167.6  123  3.92  3.440  18.90  1  0    4    4
  19.7   6  145.0  175  3.62  2.770  15.50  0  1    5    6
 Total  42 1283.2  856 25.10 21.820 125.84  4  3   27   24
  18.7   8  360.0  175  3.15  3.440  17.02  0  0    3    2
  14.3   8  360.0  245  3.21  3.570  15.84  0  0    3    4
  16.4   8  275.8  180  3.07  4.070  17.40  0  0    3    3
  17.3   8  275.8  180  3.07  3.730  17.60  0  0    3    3
  15.2   8  275.8  180  3.07  3.780  18.00  0  0    3    3
  10.4   8  472.0  205  2.93  5.250  17.98  0  0    3    4
  10.4   8  460.0  215  3.00  5.424  17.82  0  0    3    4
  14.7   8  440.0  230  3.23  5.345  17.42  0  0    3    4
  15.5   8  318.0  150  2.76  3.520  16.87  0  0    3    2
  15.2   8  304.0  150  3.15  3.435  17.30  0  0    3    2
  13.3   8  350.0  245  3.73  3.840  15.41  0  0    3    4
  19.2   8  400.0  175  3.08  3.845  17.05  0  0    3    2
  15.8   8  351.0  264  4.22  3.170  14.50  0  1    5    4
    15   8  301.0  335  3.54  3.570  14.60  0  1    5    8
 Total 112 4943.4 2929 45.21 55.989 234.81  0  2   46   49


# if you're sending the output to be reviewed by a person, add a row! 

mtcars %>% 
  split(.[,"cyl"]) %>% 
  map_df(., ~janitor::adorn_totals(.x) %>% 
           dplyr::add_row()) %>% 
  write.csv(., "demo.csv")

查看更多
登录 后发表回答