I am using the tidyverse to filter a data frame and would like to print the dimensions (or number of rows) of the intermediate objects at each step.
I thought I could simply use a tee pipe operator from magrittr but it doesn't work.
I think I understand the concept behind the tee pipe but can't figure out what is wrong. I searched extensively but didn't find many resources about the tee pipe.
I built a simple example with the mtcars dataset. Printing the intermediate objects works, but not if I replace the print with dim() or nrow().
library(tidyverse)
library(magrittr)
mtcars %>%
filter(cyl > 4) %T>% dim() %>%
filter(am == 0) %T>% dim() %>%
filter(disp >= 200) %>% dim()
I can of course write that in base R but would like to stick to the tidyverse spirit. I probably overlooked something about the tee pipe concept, and any comments/solutions will be greatly appreciated.
EDIT:
Following @hrbrmstr's and @akrun's nice and quick answers, I tried again to stick to the tee pipe operator without writing a function. I don't know why I didn't find the answer earlier myself, but here is the syntax I was looking for:
mtcars %>%
filter(cyl > 4) %T>% {print(dim(.))} %>%
filter(am == 0) %T>% {print(dim(.))} %>%
filter(disp >= 200) %>% {print(dim(.))}
Despite the need for a function, @hrbrmstr's solution is indeed easier to "clean up".
@akrun's idea works, but it's not idiomatic tidyverse. Other functions commonly used in tidyverse pipelines, like print() and glimpse(), return the data parameter invisibly so they can be piped without resorting to {}. Those {} make it difficult to clean up pipes after you're done exploring what's going on.
Try:
library(tidyverse)
tidydim <- function(x) {
  print(dim(x))
  invisible(x)
}
mtcars %>%
filter(cyl > 4) %>%
tidydim() %>%
filter(., am == 0) %>%
tidydim() %>%
filter(., disp >= 200) %>%
tidydim()
That way your "cleanup" (i.e. not producing interim console output) can be to quickly/easily remove the tidydim() lines or remove the print(…) from the function.
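As a rough sketch of the second cleanup option, the printing can be put behind a switch instead of being deleted. The name tidydim2 and its label and verbose arguments are hypothetical additions for illustration, not part of the answer above; message() is used here so the output goes to stderr rather than stdout.

```r
library(dplyr)

# Hypothetical variant of tidydim(): `label` and `verbose` are illustrative
# arguments that do not appear in the original function.
tidydim2 <- function(x, label = "", verbose = TRUE) {
  if (verbose) {
    # Report the dimensions as a side effect only
    message(label, nrow(x), " rows x ", ncol(x), " cols")
  }
  # Return the input invisibly so the pipe can continue
  invisible(x)
}

mtcars %>%
  filter(cyl > 4) %>%
  tidydim2("after cyl filter: ") %>%
  filter(am == 0) %>%
  tidydim2("after am filter: ")
```

Setting verbose = FALSE (or changing the default) silences every step without touching the pipeline itself.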
The %T>% pipe from the magrittr package was created for exactly this kind of case:
library(magrittr)
library(dplyr)
mtcars %>%
filter(cyl > 4) %T>% {print(dim(.))} %>%
filter(am == 0) %T>% {print(dim(.))} %>%
filter(disp >= 200) %T>% {print(dim(.))}
This is very easy to read and edit out in RStudio using alt + selection if you indent as I do.
You can also use @hrbrmstr's function here if you don't like curly brackets, except you won't need the last line.
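For what it's worth, here is a minimal sketch of why the bare dim() version from the question shows nothing. It relies on magrittr's documented behaviour that %T>% evaluates its right-hand side only for side effects and returns the left-hand side; the variable names res and res2 are just for illustration.

```r
library(magrittr)

# %T>% evaluates the right-hand side, discards its value, and returns the
# left-hand side. dim() has no side effect, so nothing reaches the console.
res <- mtcars %T>% dim()
identical(res, mtcars)   # the data passes through untouched

# Wrapping the call in print() supplies the side effect that was missing.
res2 <- mtcars %T>% {print(dim(.))}
```

So the tee pipe itself was working in the original attempt; the computed dimensions were simply thrown away without ever being printed.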
Revisiting this months later, here's an idea generalizing @hrbrmstr's solution so you can print pretty much whatever you want and return the input to carry on with the pipe.
library(tidyverse)
pprint <- function(.data, .fun, ...) {
  .fun <- purrr::as_mapper(.fun)
  print(.fun(.data, ...))
  invisible(.data)
}
iris %>%
pprint(~"hello") %>%
head(2) %>%
select(-Species) %>%
pprint(rowSums,na.rm=TRUE) %>%
pprint(~rename_all(.[1:2],toupper)) %>%
pprint(dim)
# [1] "hello"
# 1 2
# 10.2 9.5
# SEPAL.LENGTH SEPAL.WIDTH
# 1 5.1 3.5
# 2 4.9 3.0
# [1] 2 4
We could use print() within {}:
mtcars %>%
filter(cyl > 4) %>%
{print(dim(.))
filter(., am == 0) } %>%
{print(dim(.))
filter(., disp >= 200)} %>%
{print(dim(.))
.}
#[1] 21 11
#[1] 16 11
#[1] 14 11
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#2 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#3 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#4 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#5 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#6 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#7 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#8 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#9 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#10 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#11 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#12 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#13 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#14 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
With the mmpipe package you can define a custom pipe for the job and get very compact code:
library(tidyverse)
library(magrittr)
# devtools::install_github("moodymudskipper/mmpipe")
library(mmpipe)
add_pipe(`%dim>%`,substitute({. <- b; print(dim(.)); cat("\n"); .}, list(b = body)))
mtcars %dim>%
filter(cyl > 4) %dim>%
filter(am == 0) %dim>%
filter(disp >= 200)
# [1] 21 11
#
# [1] 16 11
#
# [1] 14 11
#
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 2 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# ...