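The magrittr package offers a set of operators which make your code more readable by structuring sequences of data operations left-to-right, avoiding nested function calls, minimizing the need for temporary variables, and making it easy to add steps anywhere in the sequence.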
The operators pipe their left-hand side values forward into expressions that appear on the right-hand side, i.e. one can replace f(x) with x %>% f(), where %>% is the (main) pipe-operator. When coupling several function calls with the pipe-operator, the benefit will become more apparent. Consider this pseudo example:
the_data <-
  read.csv('/path/to/data/file.csv') %>%
  subset(variable_a > x) %>%
  transform(variable_c = variable_a/variable_b) %>%
  head(100)
Four operations are performed to arrive at the desired data set, and they are written in a natural order: the same as the order of execution. Also, no temporary variables are needed. If yet another operation is required, it is straightforward to add it to the sequence of operations wherever it may be needed.
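A runnable variant of the pseudo example can be sketched as follows. It is only an illustration: the built-in mtcars data set stands in for the CSV file, and the names threshold and hp_per_wt are hypothetical, chosen for this sketch.

```r
library(magrittr)

# mtcars ships with R; it stands in for the CSV file of the pseudo example.
threshold <- 20  # hypothetical cutoff, playing the role of x

# Piped version: each step reads in the order it is executed.
the_data <-
  mtcars %>%
  subset(mpg > threshold) %>%
  transform(hp_per_wt = hp / wt) %>%
  head(5)

# The same steps as nested calls, which must be read from the inside out.
nested <- head(transform(subset(mtcars, mpg > threshold), hp_per_wt = hp / wt), 5)

# Both forms should produce the same data frame.
print(identical(the_data, nested))
```

The nested form needs no temporary variables either, but adding or removing a step means editing the middle of a single long expression.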
If you are new to magrittr, the best place to start is the pipes chapter in R for Data Science.
x %>% f is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
x %>% f %>% g %>% h is equivalent to h(g(f(x)))
Here, “equivalent” is not technically exact: evaluation is non-standard, and the left-hand side is evaluated before being passed on to the right-hand side expression. However, in most cases this has no practical implication.
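The three basic forms can be checked with base R functions. This is a small sketch; the vector x and the choice of sqrt, sum and round are assumptions made for illustration.

```r
library(magrittr)

x <- c(1, 4, 9)

# x %>% f is f(x)
piped_one <- x %>% sqrt
direct_one <- sqrt(x)

# x %>% f(y) is f(x, y): the piped value becomes the first argument
piped_two <- c(1.234, 5.678) %>% round(1)
direct_two <- round(c(1.234, 5.678), 1)

# x %>% f %>% g %>% h is h(g(f(x)))
piped_chain <- x %>% sqrt %>% sum %>% sqrt
direct_chain <- sqrt(sum(sqrt(x)))

print(identical(piped_one, direct_one))
print(identical(piped_two, direct_two))
print(identical(piped_chain, direct_chain))
```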
x %>% f(y, .) is equivalent to f(y, x)
x %>% f(y, z = .) is equivalent to f(y, z = x)
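As an illustration of the placeholder, the sketch below uses paste from base R; the strings are arbitrary and chosen only for this example.

```r
library(magrittr)

# Positional placeholder: x %>% f(y, .) is f(y, x)
positional <- "world" %>% paste("hello", .)
positional_direct <- paste("hello", "world")

# Named placeholder: x %>% f(y, z = .) is f(y, z = x)
named <- ", " %>% paste("a", "b", sep = .)
named_direct <- paste("a", "b", sep = ", ")

print(positional)
print(named)
```

In both calls the piped value is placed only where the dot appears, so it is not also inserted as the first argument.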
It is straightforward to use the placeholder several times in a right-hand side expression. However, when the placeholder only appears in nested expressions, magrittr will still apply the first-argument rule. The reason is that in most cases this results in cleaner code.
x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = ncol(x))
The behavior can be overruled by enclosing the right-hand side in braces:
x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = ncol(x))
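The difference between the two forms can be made visible with a small helper. In the sketch below, describe is a hypothetical function written for this illustration (it is not part of magrittr); it simply reports whether it received a first argument.

```r
library(magrittr)

# Hypothetical helper: records whether a first argument was supplied.
describe <- function(data = NULL, y, z) {
  list(got_data = !is.null(data), y = y, z = z)
}

# The placeholder appears only inside nested calls, so mtcars is
# still passed as the first argument.
with_first <- mtcars %>% describe(y = nrow(.), z = ncol(.))

# Braces switch off the first-argument rule; only the dots are replaced.
without_first <- mtcars %>% { describe(y = nrow(.), z = ncol(.)) }

print(with_first$got_data)
print(without_first$got_data)
```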
Any pipeline starting with the . placeholder will return a function which can later be used to apply the pipeline to values. Building functions in magrittr is therefore similar to building other values.
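For instance, a pipeline that begins with the dot can be stored and reused like any other function. The name sin_of_cos below is an assumption made for this sketch.

```r
library(magrittr)

# A pipeline starting with . is not evaluated; it yields a function.
sin_of_cos <- . %>% cos %>% sin

# Roughly what one would otherwise write by hand.
by_hand <- function(value) sin(cos(value))

print(is.function(sin_of_cos))
print(identical(sin_of_cos(0), by_hand(0)))

# The stored pipeline can be passed to other functions, e.g. sapply.
applied <- sapply(c(0, 1, 2), sin_of_cos)
```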
Many functions accept a data argument, e.g. lm and aggregate, which is very useful in a pipeline where data is first processed and then passed into such a function. There are also functions that do not have a data argument, for which it is useful to expose the variables in the data. This is done with the %$% operator.
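As a sketch of the exposition operator, the example below uses cor, which has no data argument, on the built-in mtcars data set; the choice of columns is arbitrary.

```r
library(magrittr)

# %$% makes the columns of the left-hand side available by name
# inside the right-hand side expression.
r_exposed <- mtcars %$% cor(mpg, wt)

# Without the operator, the columns must be extracted explicitly.
r_direct <- cor(mtcars$mpg, mtcars$wt)

print(all.equal(r_exposed, r_direct))

# It combines with %>% when the data is processed first.
r_subset <- mtcars %>%
  subset(cyl == 4) %$%
  cor(mpg, wt)
```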