Stata: Summary stats with table. Order by N

2019-09-18 19:35发布

问题:

How can I order the following table in descending order of frequency?

sysuse auto.dta, clear
replace make = substr(make,1, strpos(make," ")-2) 
table make, c(N price mean price median price sd price min price max price) format(%9.2f) center

The first observation should be buic or old with N=7. Is there a way to order by frequency?

The code above also gives the error that there's too many stats(). Is there an alternative procedure that allows more columns?

回答1:

(In what follows, I followed your rather bizarre way of aggregating make. word(make, 1) strikes me as more natural; and in any case your method misses "Subaru".)

The documentation for table spells out that no more than 5 statistics may be specified. See e.g. http://www.stata.com/help.cgi?table Hence this problem should not be surprising.

However, tabstat can show more summary statistics. The sort order you want can be obtained by creating a categorical variable based on frequency and the make2 variable: both must be used because there are ties on frequency. That can be assigned value labels using the user-written labmask command (search labmask for download locations). I don't know a way of getting a different format for the counts.

sysuse auto.dta, clear
gen make2 = substr(make,1, strpos(make," ")-2) 
replace make2 = make if missing(make2) 
bysort make2 : gen freq = -_N 
egen group = group(freq make2) 
labmask group, values(make2) 
tabstat price, s(N mean median sd min max) format(%9.2f) by(group) 

 Summary for variables: price
 by categories of: group (group(freq make2))

  group |         N      mean       p50        sd       min       max
 -------+------------------------------------------------------------
   Buic |      7.00   6075.29   5189.00   2257.92   4082.00  10372.00
    Old |      7.00   6050.86   4890.00   2486.49   4181.00  10371.00
   Chev |      6.00   4372.33   4229.50    911.30   3299.00   5705.00
   Merc |      6.00   4913.83   4947.50   1239.38   3291.00   6303.00
   Pont |      6.00   4878.83   4828.50    582.49   4172.00   5798.00
   Plym |      5.00   4820.00   4482.00    955.69   4060.00   6486.00
  Datsu |      4.00   6006.50   5654.00   1573.12   4589.00   8129.00
   Dodg |      4.00   5055.50   4948.00   1236.39   3984.00   6342.00
      V |      4.00   6021.00   6123.50   1166.44   4697.00   7140.00
     AM |      3.00   4215.67   4099.00    485.63   3799.00   4749.00
    Cad |      3.00  13930.33  14500.00   2313.71  11385.00  15906.00
   Linc |      3.00  12852.33  13466.00   1175.50  11497.00  13594.00
  Toyot |      3.00   5122.00   5719.00   1193.32   3748.00   5899.00
    Aud |      2.00   7992.50   7992.50   2400.63   6295.00   9690.00
    For |      2.00   4288.00   4288.00    142.84   4187.00   4389.00
   Hond |      2.00   5149.00   5149.00    919.24   4499.00   5799.00
     BM |      1.00   9735.00   9735.00         .   9735.00   9735.00
    Fia |      1.00   4296.00   4296.00         .   4296.00   4296.00
   Mazd |      1.00   3995.00   3995.00         .   3995.00   3995.00
 Peugeo |      1.00  12990.00  12990.00         .  12990.00  12990.00
 Renaul |      1.00   3895.00   3895.00         .   3895.00   3895.00
 Subaru |      1.00   3798.00   3798.00         .   3798.00   3798.00
   Volv |      1.00  11995.00  11995.00         .  11995.00  11995.00
 -------+------------------------------------------------------------
  Total |     74.00   6165.26   5006.50   2949.50   3291.00  15906.00
 --------------------------------------------------------------------

In some ways a more direct solution is just to collapse the data and then assign appropriate sort order and formats.

 sysuse auto.dta, clear
 gen make2 = substr(make,1, strpos(make," ")-2) 
 replace make2 = make if missing(make2) 
 collapse (count)n=price (mean)mean=price (p50)median=price (sd)sd=price (min)min=price (max)max=price, by(make2)  
 gsort -n 
 format mean-max %9.2f 
 format n %9.0f 
 list make2 n mean median sd min max, sep(0) noobs 

 +------------------------------------------------------------------+
 |  make2   n       mean     median        sd        min        max |
 |------------------------------------------------------------------|
 |   Buic   7    6075.29    5189.00   2257.92    4082.00   10372.00 |
 |    Old   7    6050.86    4890.00   2486.49    4181.00   10371.00 |
 |   Chev   6    4372.33    4229.50    911.30    3299.00    5705.00 |
 |   Pont   6    4878.83    4828.50    582.49    4172.00    5798.00 |
 |   Merc   6    4913.83    4947.50   1239.38    3291.00    6303.00 |
 |   Plym   5    4820.00    4482.00    955.69    4060.00    6486.00 |
 |      V   4    6021.00    6123.50   1166.44    4697.00    7140.00 |
 |  Datsu   4    6006.50    5654.00   1573.12    4589.00    8129.00 |
 |   Dodg   4    5055.50    4948.00   1236.39    3984.00    6342.00 |
 |    Cad   3   13930.33   14500.00   2313.71   11385.00   15906.00 |
 |  Toyot   3    5122.00    5719.00   1193.32    3748.00    5899.00 |
 |   Linc   3   12852.33   13466.00   1175.50   11497.00   13594.00 |
 |     AM   3    4215.67    4099.00    485.63    3799.00    4749.00 |
 |    Aud   2    7992.50    7992.50   2400.63    6295.00    9690.00 |
 |    For   2    4288.00    4288.00    142.84    4187.00    4389.00 |
 |   Hond   2    5149.00    5149.00    919.24    4499.00    5799.00 |
 |   Volv   1   11995.00   11995.00         .   11995.00   11995.00 |
 |    Fia   1    4296.00    4296.00         .    4296.00    4296.00 |
 |   Mazd   1    3995.00    3995.00         .    3995.00    3995.00 |
 | Peugeo   1   12990.00   12990.00         .   12990.00   12990.00 |
 | Subaru   1    3798.00    3798.00         .    3798.00    3798.00 |
 | Renaul   1    3895.00    3895.00         .    3895.00    3895.00 |
 |     BM   1    9735.00    9735.00         .    9735.00    9735.00 |
 +------------------------------------------------------------------+


标签: stata