Save output between pipes in dplyr [duplicate]

Posted 2019-02-19 14:52

Question:

This question already has an answer here:

  • Assign intermediate output to temp variable as part of dplyr pipeline 5 answers

I am writing a function that chains several pipes, and I would like to save some of the intermediate steps as a tbl or data frame before the last pipe. For instance, in a %>% b %>% c, I would like to save the result of step c, but I also want the result of step b.

I know that one option is to split the chain into two pipes, but I believe there must be a better way.

cars %>% mutate(kmh = dist/speed) %>% summary()

Answer 1:

Thanks for the help. I found a better solution using braces {} and the ->> operator. See below:

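   library(dplyr)

   c = cars %>% mutate(var1 = dist*speed) %>%
   {. ->> b } %>%   # save the intermediate data frame as b
   summary()
   c         # summary of the mutated data
   head(b)   # intermediate data frame saved mid-pipe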
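
Since the question mentions writing a function, it may help to see where ->> puts its target. The sketch below uses base R only; save_step and last_step are hypothetical names chosen for illustration. It relies on the documented behaviour of <<- and ->>: R searches the enclosing environments for an existing variable and, if none is found, assigns in the global environment.

```r
# Hypothetical helper: stores its input with ->> and returns a summary.
save_step <- function(x) {
  # No variable called last_step exists in any enclosing environment,
  # so this assignment ends up in the global environment, not in the function.
  x ->> last_step
  summary(x)
}

res <- save_step(cars)

# last_step is now visible outside the function.
exists("last_step", envir = globalenv(), inherits = FALSE)
```

If that side effect is unwanted, returning both objects from the function (for example in a list) avoids it.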


Answer 2:

I am not sure why one would need this, but as @Frank suggested, one option is to use the %T>% (tee) operator from the magrittr package together with assign() to store intermediate values.

In the code below, SummaryVal holds the summary of the mutated cars data, and MyValue holds the intermediate value after mutate().

library(tidyverse)
library(magrittr)

SummaryVal <- cars %>% mutate(kmh = dist/speed) %T>% 
              assign("MyValue",.,envir = .GlobalEnv) %>% 
              summary()

head(MyValue)
#   speed dist       kmh
# 1     4    2 0.5000000
# 2     4   10 2.5000000
# 3     7    4 0.5714286
# 4     7   22 3.1428571
# 5     8   16 2.0000000
# 6     9   10 1.1111111

SummaryVal
#    speed           dist             kmh       
# Min.   : 4.0   Min.   :  2.00   Min.   :0.500  
# 1st Qu.:12.0   1st Qu.: 26.00   1st Qu.:1.921  
# Median :15.0   Median : 36.00   Median :2.523  
# Mean   :15.4   Mean   : 42.98   Mean   :2.632  
# 3rd Qu.:19.0   3rd Qu.: 56.00   3rd Qu.:3.186  
# Max.   :25.0   Max.   :120.00   Max.   :5.714 

UPDATED: As @Renu correctly pointed out, plain %>% also works, because assign() invisibly returns the value it assigns:

SummaryVal <- cars %>% mutate(kmh = dist/speed) %>% 
              assign("MyValue",.,envir = .GlobalEnv) %>% 
              summary()
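
A minimal sketch of the difference between the two operators, assuming magrittr is installed. The names store, log_only and the total_* variables are hypothetical, and a scratch environment stands in for .GlobalEnv so nothing global is touched.

```r
library(magrittr)

store <- new.env()  # hypothetical scratch environment instead of .GlobalEnv

# assign() invisibly returns the value it stored, so %>% passes it on to sum().
total_pipe <- c(1, 2, 3) %>% assign("saved_pipe", ., envir = store) %>% sum()

# A hypothetical side-effect-only helper that returns NULL.
log_only <- function(v) NULL

# %T>% discards the right-hand side's return value and forwards the left-hand side.
total_tee <- c(1, 2, 3) %T>% log_only() %>% sum()

# With %>% the NULL returned by log_only() is forwarded instead.
total_bad <- c(1, 2, 3) %>% log_only() %>% sum()
```

So %>% is enough for assign(), while %T>% is the safer choice when the intermediate step is a function whose return value is not the data itself.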


Answer 3:

A function that returns a list is the way to go. It makes debugging easy and stays readable. Here is a small example. You will need to add some error handling to the function to make sure the data passed in is what you expect. The function returns a list with the results. In case you want separate objects instead of one list, the last line of code copies each list element into the global environment as its own object.

library(dplyr)

# create a function
my_summaries <- function(x){
  # error handling goes here
  my_mutate <- x %>% mutate(kmh = dist/speed)
  my_summary <- my_mutate %>% summary()
  list(mutate = my_mutate, summary = my_summary)
}

my_data <- my_summaries(cars)

str(my_data)
# List of 2
#  $ mutate :'data.frame':    50 obs. of  3 variables:
#   ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
#   ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
#   ..$ kmh  : num [1:50] 0.5 2.5 0.571 3.143 2 ...
#  $ summary: 'table' chr [1:6, 1:3] "Min.   : 4.0  " "1st Qu.:12.0  " "Median :15.0  " "Mean   :15.4  " ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:6] "" "" "" "" ...
#   .. ..$ : chr [1:3] "    speed" "     dist" "     kmh"


# Copy each list element into the global environment as a separate object
# (this creates objects named mutate and summary)
list2env(my_data, .GlobalEnv)
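
As a possible alternative to unpacking into .GlobalEnv, the list can be indexed directly or unpacked into a dedicated environment. This is only a sketch: the list is rebuilt here with base R's transform() as a stand-in for my_summaries(cars) so the example has no package dependencies, and cars_kmh and results are hypothetical names.

```r
# Stand-in for the list returned by my_summaries(cars), built with base R.
cars_kmh <- transform(cars, kmh = dist / speed)
my_data <- list(mutate = cars_kmh, summary = summary(cars_kmh))

# Option 1: index the list directly; nothing is copied anywhere.
head(my_data$mutate)

# Option 2: unpack into a dedicated environment, so objects named
# mutate and summary do not sit in the global workspace next to the
# functions of the same name.
results <- list2env(my_data, envir = new.env())
ls(results)
head(get("mutate", envir = results))
```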


Tags: r dplyr pipe