I am going through Hadley Wickham's "R for Data Science", where he uses ~var in ggplot calls. I understand y ~ a + bx, where ~ describes a formula/relationship between dependent and independent variables, but what does ~var mean? More importantly, why can't you just put the variable itself? See code below:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
or
demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)
ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
It's just ggplot making use of the formula structure to let the user decide what variables to facet on. From ?facet_grid:
For compatibility with the classic interface, rows can also be a formula with the rows (of the tabular display) on the LHS and the columns (of the tabular display) on the RHS; the dot in the formula is used to indicate there should be no faceting on this dimension (either row or column).
So facet_grid(. ~ var) just means to facet the grid on the variable var, with the facets spread over columns. It's the same as facet_grid(cols = vars(var)).
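To make the equivalence concrete, here is a minimal sketch using the mpg data from the question. The object names base, p_formula and p_vars are only illustrative; it assumes a ggplot2 version recent enough to provide vars() and the cols argument.

```r
library(ggplot2)

# Shared scatterplot, built once so that only the faceting call differs
base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Classic formula interface: the dot means "no faceting on rows",
# and class on the right-hand side spreads the facets over columns
p_formula <- base + facet_grid(. ~ class)

# Newer interface: the same request, written with vars()
p_vars <- base + facet_grid(cols = vars(class))
```

Printing p_formula and p_vars should give the same layout: one row of panels, with one column per value of class.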
Despite looking like a formula, it's not really being used as a formula: it's just a way to present multiple arguments to R in a way that the facet_grid code can clearly and unambiguously interpret.
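A short base R sketch may help show why the tilde is useful here and why a bare variable would not do. A formula stores its contents unevaluated, so the receiving function can read the variable names off it. A bare class, by contrast, is evaluated straight away, and in this particular case it would resolve to the base R function class() rather than the column in mpg. The names f and g below are only illustrative.

```r
# A one-sided formula is an ordinary R object that holds unevaluated code
f <- ~ class

class(f)        # "formula"
all.vars(f)     # "class": the variable name, available as a string
is.name(f[[2]]) # TRUE: the right-hand side is a symbol, not a value

# A two-sided formula carries both sides, which is how facet_grid can
# tell the row variables from the column variables
g <- drv ~ class
all.vars(g[[2]]) # left-hand side: "drv"
all.vars(g[[3]]) # right-hand side: "class"
```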
It is a syntax supported by facet_wrap, where a formula can be given as the input for the variable relationships. From the documentation for the first argument, facets:
A set of variables or expressions quoted by vars() and
defining faceting groups on the rows or columns dimension. The
variables can be named (the names are passed to labeller). For
compatibility with the classic interface, can also be a formula or
character vector. Use either a one sided formula, ~a + b, or a
character vector, c("a", "b").
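So I think the tilde is no longer required: you can wrap the variables in vars() or pass a character vector, whereas older code typically gave a one-sided formula with the tilde. A bare variable name on its own still won't work, because R would try to evaluate it as an ordinary object before facet_wrap could treat it as a column name.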
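As a rough illustration of the options the documentation lists, the sketch below facets the plot from the question in three ways. The names base, p_tilde, p_vars and p_chr are only illustrative, and the vars() form assumes a ggplot2 version that provides it.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Classic interface: one-sided formula
p_tilde <- base + facet_wrap(~ class, nrow = 2)

# Variables quoted by vars()
p_vars <- base + facet_wrap(vars(class), nrow = 2)

# Character vector of column names
p_chr <- base + facet_wrap("class", nrow = 2)
```

All three should produce the same two-row grid of panels. What they have in common is that class is quoted in some way (by the tilde, by vars(), or as a string) rather than passed as a bare name.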