Drawing on the discussion on conditional dplyr evaluation, I would like to conditionally execute a step in a pipeline depending on whether the reference column exists in the passed data frame.
Example
The results generated by 1) and 2) should be identical.
Existing column
# 1)
mtcars %>%
  filter(am == 1) %>%
  filter(cyl == 4)
# 2)
mtcars %>%
  filter(am == 1) %>%
  {
    if("cyl" %in% names(.)) filter(cyl == 4) else .
  }
Unavailable column
# 1)
mtcars %>%
  filter(am == 1)
# 2)
mtcars %>%
  filter(am == 1) %>%
  {
    if("absent_column" %in% names(.)) filter(absent_column == 4) else .
  }
Problem
For the existing column, the object passed to filter inside the braces does not correspond to the initial data frame. The code in 2) returns the error message:
Error in filter(cyl == 4) : object 'cyl' not found
I have tried an alternative syntax, with no luck:
mtcars %>%
  filter(am == 1) %>%
  {
    if("cyl" %in% names(.)) filter(.$cyl == 4) else .
  }
Error in UseMethod("filter_") :
no applicable method for 'filter_' applied to an object of class "logical"
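Both error messages are consistent with filter receiving the condition as its first argument instead of a data frame: inside the braces, magrittr does not insert the left-hand side automatically, so `cyl == 4` (or the logical vector `.$cyl == 4`) is treated as the data. A minimal sketch of one variant that may behave as intended, assuming dplyr is attached and passing `.` explicitly (the names `res_present` and `res_absent` are only for illustration):

```r
library(dplyr)

# Inside the braces, `.` refers to the data frame from the left-hand side.
# It is passed to filter() explicitly as the first argument.
res_present <- mtcars %>%
  filter(am == 1) %>%
  {
    if ("cyl" %in% names(.)) filter(., cyl == 4) else .
  }

# With a column that does not exist, the data frame is returned unchanged.
res_absent <- mtcars %>%
  filter(am == 1) %>%
  {
    if ("absent_column" %in% names(.)) filter(., absent_column == 4) else .
  }
```

Under these assumptions, `res_present` should match the result of 1) for the existing column, and `res_absent` should match the result of 1) for the unavailable column.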
Follow-up
I would like to expand this question to account for evaluation on the right-hand side of the == in the filter call. For instance, the syntax below attempts to filter on the first available value.
mtcars %>%
  filter({
    if ("does_not_ex" %in% names(.))
      does_not_ex
    else
      NULL
  } == {
    if ("does_not_ex" %in% names(.))
      unique(.[['does_not_ex']])
    else
      NULL
  })
As expected, the call fails with an error message:
Error in filter_impl(.data, quo) : Result must have length 32, not 0
When applied to an existing column:
mtcars %>%
  filter({
    if ("mpg" %in% names(.))
      mpg
    else
      NULL
  } == {
    if ("mpg" %in% names(.))
      unique(.[['mpg']])
    else
      NULL
  })
It works, but with a warning message:
  mpg cyl disp  hp drat   wt  qsec vs am gear carb
1  21   6  160 110  3.9 2.62 16.46  0  1    4    4
Warning message:
In { : longer object length is not a multiple of shorter object length
Follow-up question
Is there a neat way of extending the existing syntax in order to get conditional evaluation on the right-hand side of the filter call, ideally staying within the dplyr workflow?
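To make the intent concrete, here is a rough sketch of the behaviour I am after, not a definitive solution. It assumes a dplyr version that provides the `.data` pronoun; `filter_first_value` is a hypothetical helper written for illustration and is not part of dplyr. It compares against a single value (the first unique one), which avoids recycling a vector of unique values against the column:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep the rows equal to the first
# unique value of `col` when that column exists; otherwise return `df` as is.
filter_first_value <- function(df, col) {
  if (!col %in% names(df)) {
    return(df)
  }
  # Take a single value so that `==` compares the column against a scalar.
  first_value <- unique(df[[col]])[1]
  filter(df, .data[[col]] == first_value)
}

res_mpg <- mtcars %>% filter_first_value("mpg")
res_missing <- mtcars %>% filter_first_value("does_not_ex")
```

Here `res_missing` should simply be `mtcars`, while `res_mpg` should contain only the rows whose `mpg` equals the first value in the column.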