Print data frame dimensions at each step of filter

Posted 2019-02-18 16:51

I am using the tidyverse to filter a data frame and would like to print the dimensions (or the number of rows) of the intermediate objects at each step. I thought I could simply use the tee pipe operator from magrittr, but it doesn't work. I think I understand the concept behind the tee pipe but can't figure out what is wrong. I searched extensively but didn't find many resources about the tee pipe.

I built a simple example with the mtcars dataset. Printing the intermediate objects works, but not if I replace the printing step with dim() or nrow().

library(tidyverse)
library(magrittr)

mtcars %>% 
    filter(cyl > 4) %T>% dim() %>%
    filter(am == 0) %T>% dim() %>%
    filter(disp >= 200) %>% dim()

I can of course write that in base R but would like to stick to the tidyverse spirit. I have probably overlooked something about the tee pipe concept, and any comments or solutions will be greatly appreciated.

EDIT: Following the nice and quick answers from @hrbrmstr and @akrun, I tried again to stick to the tee pipe operator without writing a function. I don't know why I didn't find the answer earlier myself, but here is the syntax I was looking for:

mtcars %>%
    filter(cyl > 4) %T>% {print(dim(.))} %>%
    filter(am == 0) %T>% {print(dim(.))} %>%
    filter(disp >= 200) %>% {print(dim(.))}

Although it requires writing a function, @hrbrmstr's solution is indeed easier to "clean up".

4 Answers
叼着烟拽天下
#2 · 2019-02-18 17:24

The %T>% pipe from the magrittr package was created for exactly this kind of case:

library(magrittr)
library(dplyr)
mtcars %>%
  filter(cyl > 4)     %T>% {print(dim(.))} %>%
  filter(am == 0)     %T>% {print(dim(.))} %>%
  filter(disp >= 200) %T>% {print(dim(.))}

This is easy to read, and easy to edit out in RStudio using Alt + selection if you indent as I do.

You can also use @hrbrmstr's function here if you don't like braces, except you won't need the last line.
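For reference, here is a minimal sketch of why the bare dim() in the question shows nothing. %T>% evaluates its right-hand side only for its side effects and then returns the left-hand side, so a value that is merely returned (as dim() does) is discarded. The names res_silent and res_printed are only illustrative.

```r
library(magrittr)

# dim() just returns a value; %T>% discards it and passes mtcars on,
# so nothing appears on the console
res_silent <- mtcars %T>% dim()

# print() has a side effect (console output), so the dimensions are shown;
# the braces stop magrittr from inserting the data as a first argument
res_printed <- mtcars %T>% {print(dim(.))}

# in both cases the pipe hands back the original data unchanged
identical(res_silent, mtcars)
identical(res_printed, mtcars)
```

Both calls return mtcars untouched; only the second one writes anything to the console.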


Revisiting this months later, here's an idea that generalizes @hrbrmstr's solution so you can print pretty much whatever you want and still return the input to carry on with the pipe.

library(tidyverse)
pprint <- function(.data,.fun,...){
  .fun <- purrr::as_mapper(.fun)
  print(.fun(.data,...))
  invisible(.data)
}

iris %>%
  pprint(~"hello")           %>%
  head(2)                    %>%
  select(-Species)           %>%
  pprint(rowSums,na.rm=TRUE) %>%
  pprint(~rename_all(.[1:2],toupper)) %>%
  pprint(dim)

# [1] "hello"
#    1    2 
# 10.2  9.5 
#   SEPAL.LENGTH SEPAL.WIDTH
# 1          5.1         3.5
# 2          4.9         3.0
# [1] 2 4
Ridiculous、
#3 · 2019-02-18 17:27

With the mmpipe package you can define a custom pipe for the job and get very compact code:

library(tidyverse)
library(magrittr)
# devtools::install_github("moodymudskipper/mmpipe")
library(mmpipe)

add_pipe(`%dim>%`,substitute({. <- b; print(dim(.)); cat("\n"); .}, list(b = body)))

mtcars %dim>% 
  filter(cyl > 4) %dim>%
  filter(am == 0) %dim>%
  filter(disp >= 200)
# [1] 21 11
# 
# [1] 16 11
# 
# [1] 14 11
#
#     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# 1  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# 2  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
# ... 
forever°为你锁心
#4 · 2019-02-18 17:35

@akrun's idea works, but it's not idiomatic tidyverse. Other functions in the tidyverse, like print() and glimpse(), return the data argument invisibly so they can be piped without resorting to {}. Those {} make it difficult to clean up pipes after you're done exploring what's going on.

Try:

library(tidyverse)

tidydim <- function(x) {
  print(dim(x))
  invisible(x)
}

mtcars %>%
  filter(cyl > 4) %>%
  tidydim() %>% 
  filter(., am == 0) %>%
  tidydim() %>% 
  filter(., disp >= 200) %>%
  tidydim()

That way your "cleanup" (i.e. not producing interim console output) can be as quick and easy as removing the tidydim() lines or removing the print(…) from the function.
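As a possible extension of the same idea, the sketch below switches the output off through an option instead of editing the function. The function name tidydim2, its label argument, and the option name "tidydim.verbose" are all assumptions made up for this illustration; none of them is an existing tidyverse setting.

```r
library(dplyr)

# Hypothetical variant of tidydim(): "tidydim.verbose" is an option name
# invented for this sketch, defaulting to TRUE when unset
tidydim2 <- function(x, label = NULL) {
  if (isTRUE(getOption("tidydim.verbose", TRUE))) {
    prefix <- if (is.null(label)) "" else paste0(label, ": ")
    # message() writes to stderr, keeping the report apart from real output
    message(prefix, paste(dim(x), collapse = " x "))
  }
  invisible(x)  # hand the data back so the pipe can continue
}

out <- mtcars %>%
  filter(cyl > 4) %>%
  tidydim2("cyl > 4") %>%
  filter(am == 0) %>%
  tidydim2("am == 0")

# silence every report without touching the pipeline itself
options(tidydim.verbose = FALSE)
```

With the option left at its default, each tidydim2() call reports the dimensions at that step; after the options() call the same pipeline runs silently.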

闹够了就滚
#5 · 2019-02-18 17:40

We could use print() within {}:

mtcars %>%
   filter(cyl > 4) %>%
   {print(dim(.))
    filter(., am == 0) } %>%
   {print(dim(.))
    filter(., disp >= 200)} %>%
   {print(dim(.))
   .}
#[1] 21 11
#[1] 16 11
#[1] 14 11
#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#2  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#3  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#4  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#5  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#6  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#7  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#8  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#9  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#10 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#11 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#12 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#13 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#14 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2