frequency table with several variables in R

I am trying to replicate a table often used in official statistics, but have had no success so far. Given a data frame like this one:

d1 <- data.frame( StudentID = c("x1", "x10", "x2", 
                          "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
             StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
             ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
             Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
             participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),  
             passed      = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
             stringsAsFactors = FALSE)

I would like to create a table showing, per year, the number of all students (All), and of those, how many are female, how many participated, and how many passed. Please note that "ofwhich" below refers to all students.

The table I have in mind would look like this:

cbind(All = table(d1$ExamenYear),
  participated      = table(d1$ExamenYear, d1$participated)[,2],
  ofwhichFemale     = table(d1$ExamenYear, d1$StudentGender)[,1],
  ofwhichpassed     = table(d1$ExamenYear, d1$passed)[,2])

I am sure there is a better way to do this kind of thing in R.

Note: I have seen LaTeX solutions, but I am not sure they will work for me, as I need to export the table to Excel.

Thanks in advance

Tags: r aggregate frequency

4 answers

Emotional °昔

#2 · 2019-01-23 08:51

Using plyr:

require(plyr)
ddply(d1, .(ExamenYear), summarize,
      All=length(ExamenYear),
      participated=sum(participated=="yes"),
      ofwhichFemale=sum(StudentGender=="F"),
      ofWhichPassed=sum(passed=="yes"))

Which gives:

  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2


Root（大扎）

#3 · 2019-01-23 09:02

The plyr package is great for this sort of thing. First load the package

library(plyr)

Then we use the ddply function:

ddply(d1, "ExamenYear", summarise, 
      All = length(passed),  # any column works here; we only count rows
      participated = sum(participated=="yes"),
      ofwhichFemale = sum(StudentGender=="F"),
      ofwhichpassed = sum(passed=="yes"))

Basically, ddply takes a data frame as input and returns a data frame. It splits the input data frame by ExamenYear, calculates the summary statistics on each sub-table, and combines the results into a single data frame. Notice that inside ddply we don't have to use the $ notation when referring to columns.
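
To make the split/apply/combine idea concrete, here is a rough base-R sketch of what the ddply call above is doing conceptually. It is only an illustration, not how plyr is implemented internally; the names pieces, summaries and result are made up for this example, and the data frame repeats only the columns from the question that the summary needs.

```r
# Columns from the question that the summary uses
d1 <- data.frame(
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)

# 1. Split: one sub data frame per year
pieces <- split(d1, d1$ExamenYear)

# 2. Apply: compute a one-row summary for each sub data frame
summaries <- lapply(pieces, function(sub) {
  data.frame(ExamenYear    = sub$ExamenYear[1],
             All           = nrow(sub),
             participated  = sum(sub$participated == "yes"),
             ofwhichFemale = sum(sub$StudentGender == "F"),
             ofwhichpassed = sum(sub$passed == "yes"))
})

# 3. Combine: stack the one-row summaries into a single data frame
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)
```

ddply wraps these three steps in a single call, which is why the summary expressions can refer to the columns directly.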


啃猪蹄的小仙女

#4 · 2019-01-23 09:09

You may also want to take a look at plyr's successor: dplyr

It uses a ggplot-like syntax and provides fast performance because key pieces are written in C++.

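library(dplyr)

# %>% replaces the older %.% chaining operator, which is no longer part of dplyr
d1 %>%
  group_by(ExamenYear) %>%
  summarise(ALL           = n(),  # number of rows in each group
            participated  = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))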


一夜七次

#5 · 2019-01-23 09:14

A couple of modifications to your own code would have made it easier to read and a worthy competitor to the ddply solutions: use with() to reduce the number of d1$ calls, and use character indices to make the code more self-documenting:

with( d1, cbind(All = table(ExamenYear),
  participated      = table(ExamenYear, participated)[,"yes"],
  ofwhichFemale     = table(ExamenYear, StudentGender)[,"F"],
  ofwhichpassed     = table(ExamenYear, passed)[,"yes"])
     )

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2

I would expect this to be much faster than the ddply solution, although that will only be apparent if you are working on larger datasets.
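
Since the question mentions exporting to Excel, one possible route (a sketch, not the only option) is to turn the matrix into a data frame and write it to a CSV file, which Excel can open. The file name exam_summary.csv and the variable names tab, out and csv_path are made up for this example, and the file is written to a temporary directory only so the snippet has no side effects.

```r
# Columns from the question that the table uses
d1 <- data.frame(
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)

# Same table as above: one row per year, one column per count
tab <- with(d1, cbind(All = table(ExamenYear),
  participated      = table(ExamenYear, participated)[, "yes"],
  ofwhichFemale     = table(ExamenYear, StudentGender)[, "F"],
  ofwhichpassed     = table(ExamenYear, passed)[, "yes"]))

# Move the years from the row names into a proper column
out <- data.frame(ExamenYear = rownames(tab), tab)

# Hypothetical output path; replace with wherever the file should go
csv_path <- file.path(tempdir(), "exam_summary.csv")
write.csv(out, csv_path, row.names = FALSE)
```

The same write.csv call should work on the data frames returned by the ddply and dplyr answers.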
