
Function for Tidy chisq.test Output for Visualizin

Posted 2019-06-04 08:49

Question:

For data...

library(productplots) 
library(ggmosaic)

For code...

 library(tidyverse)
 library(broom)

I'm trying to create tidy chisq.test output so that I can easily filter or visualize p-values.

I'm using the "happy" dataset (which is included with either of the packages listed above).

For this example, if I wanted to condition the "happy" variable on all other variables, I would isolate the categorical variables (I'm not going to create factor groupings out of age, year, etc. for this example), and then run a simple function.

df <- happy %>% select(-year, -age, -wtssall)
lapply(df, function(x) chisq.test(happy$happy, x))
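For reference, here is a minimal base R sketch of pulling just the p-values out of those tests into a data frame. It assumes the `happy` data from productplots with the columns sex, marital, degree, finrela and health; the names `predictors`, `p_values` and `p_table` are made up for illustration, and the 0.05 cutoff is only an example threshold.

```r
library(productplots)  # provides the happy data set

# Keep only the categorical predictors (no id, no numeric columns, no happy itself)
predictors <- happy[, c("sex", "marital", "degree", "finrela", "health")]

# One chisq.test per predictor, keeping only the p-value of each htest object
p_values <- sapply(predictors, function(x) chisq.test(happy$happy, x)$p.value)

# Named numeric vector -> two-column data frame that can be filtered or plotted
p_table <- data.frame(variable = names(p_values), p.value = unname(p_values))

# Rows below the illustrative 0.05 threshold
p_table[p_table$p.value < 0.05, ]
```

This only keeps the p-value; the test statistic and degrees of freedom would need their own `sapply` calls.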

However, I would like a tidy output from the "broom" package so that I can create a dataframe of p-values to filter on or visualize.

I've tried various combinations similar to the code below, hoping to pipe the result into broom's "tidy" function, into "filter" to narrow in on the significant p-values, or into a ggplot bar chart of p-values or chi-squared statistics.

df%>%summarise_if(is.factor,funs(chisq.test(.,df$happy)$p.value))

...but the output doesn't seem correct. If I run individual chisq.test calls separately against the variables, the answers are different.

So, is there a way to easily compare categorical variables, in this case "happy" against all the other columns, and return a tidy dataframe for further manipulation and analysis?

A purrr solution using dplyr::mutate, tidyr::nest, and purrr::map would be great, but I have a feeling the nested list-column method wouldn't work with chisq.test.

Answer 1:

You can do this all within the tidyverse workflow, using map in place of lapply. There's no need for nest unless you're going to subset the data and compare the results across groups in some fashion (e.g. by age group).

df <- happy%>%
  select(-id, -year,-age,-wtssall) %>% 
  map(~chisq.test(.x, happy$happy)) %>% 
  tibble(names = names(.), data = .) %>% 
  mutate(stats = map(data, tidy))

unnest(df, stats)

# A tibble: 6 × 6
    names        data   statistic       p.value parameter                     method
    <chr>      <list>       <dbl>         <dbl>     <int>                     <fctr>
1   happy <S3: htest> 92606.00000  0.000000e+00         4 Pearson's Chi-squared test
2     sex <S3: htest>    11.46604  3.237288e-03         2 Pearson's Chi-squared test
3 marital <S3: htest>  2695.18474  0.000000e+00         8 Pearson's Chi-squared test
4  degree <S3: htest>   659.33013 4.057952e-137         8 Pearson's Chi-squared test
5 finrela <S3: htest>  2374.24165  0.000000e+00         8 Pearson's Chi-squared test
6  health <S3: htest>  2928.62829  0.000000e+00         6 Pearson's Chi-squared test
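To get from a table like this to the filtering and plotting the question asks about, something along these lines should work. It is a sketch rather than part of the original answer: it drops `happy` itself from the predictors, the names `results` and `significant` are made up, and the 0.05 cutoff is only an example threshold.

```r
library(productplots)  # provides the happy data set
library(tidyverse)
library(broom)

# One tidy row per variable; bind_rows() stores each list name in "variable"
results <- happy %>%
  select(-id, -happy, -year, -age, -wtssall) %>%
  map(~ tidy(chisq.test(.x, happy$happy))) %>%
  bind_rows(.id = "variable")

# Keep only the tests below the illustrative 0.05 threshold
significant <- filter(results, p.value < 0.05)

# Horizontal bar chart of the chi-squared statistics, ordered by size
ggplot(significant, aes(x = reorder(variable, statistic), y = statistic)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Chi-squared statistic")
```

Because the htest objects are not kept in a list column here, `results` is already a flat data frame and needs no unnest step.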