A Spoonful of Metaprogramming

6/22/2022

(is good for what ails you)

Note:

This post was written as a work presentation around ~2019 or so. I like the ideas here still, so it stays on the blog (Neptune, 2025)

What?

A language that supports metaprogramming is a language that supports programming itself. Here I focus on two aspects:

  • Code is data, data is code
  • The language itself is programmable

Why?

Fun mostly. And to expand the number of operations we can perform without thinking about them.

Manipulating Functions

functions

There are 3 parts of a function in R.

f <- function(x) x + 25

# formals
formals(f)

# body
body(f)

# environment
environment(f)

Formals

Formals are a list, so we can treat them as such.

# define multi-parameter function
(g <- function(x, y, z) (x + y) / z)

g(1, 2, 3)

# look at formals
formals(g)

# mess with it
rev_forms <- function(fun) {
  formals(fun) <- rev(formals(fun))
  fun
}

(g <- rev_forms(g))
g
g(1, 2, 3)

# add some variables
formals(g) <- c(formals(g),
                list(a = missing_arg(),
                     b = missing_arg(),
                     c = missing_arg()))

g

# add global scope variables
w <- runif(1)
formals(g)$c <- w
g

Body

The function body is an expression.

# look at body
body(g)

# see how the data structure is put together
as.character(body(g))
body(g)[[1]]; body(g)[[2]]; body(g)[[3]]

# modify it
body(g)[[3]] <- quote(z + a + b + c)
body(g)
g

# evaluate our new expression
g(1, 2, 3)

# let c be evaluated by w
g(z = 1, y = 2, x = 3, a = 4, b = 5)

# let c = 6
g(1, 2, 3, 4, 5, 6)

# check that it matches
(3 + 2) / (1 + 4 + 5 + 6)

A function can even find its own definition.

f <- function(x = 5) {
  y <- x + 10
  # return definition of itself
  sys.function()
}

f()
body(f)
body(f)[[1]]; body(f)[[2]]; body(f)[[3]]

# what if we set our body of f to our body of f?
body(f)[[2]] <- body(f)
body(f)
f()

f <- function(x = 5) {
  if (x == 0) sys.function()
  else {
    body(f)[[3]] <- body(f)
    f(x - 1)
  }
}

f(10)

Environments

Environments are the scope in which the function is evaluated.

# our current environment
environment()

# scope inside of the function
h <- function() environment()
h()

# parent environment
parent.env(h())

# environments act like key-value stores
new_env <- new.env()
new_env$x <- 1
new_env$x

# loop through and assign
Map(function(abbrev, i) { new_env[[abbrev]] <- i },
    state.abb, 1:50)

# access values by name
new_env$MA
new_env$RI

# list names
names(new_env)

Manipulating Expressions

expressions

We can capture code without evaluating it using expr().

library(rlang)

x <- 2
y <- 0

# quote
(z <- expr(y <- x * x * x * x * x))
typeof(z)

# unquote
eval(z)
y

# selectively quote subexpressions
(z <- bquote(y <- .(x) + 256))
y
eval(z)
y

Quasiquotation = quotation + unquotation. It is everywhere in R.

library(dplyr)

# write this
mtcars %>%
  filter(qsec > 20)

# instead of this
mtcars %>% filter(.[["qsec"]] > 20)

# or this
mtcars %>% filter(mtcars$qsec > 20)

# especially since qsec exists as a name
qsec

Under the hood we delay evaluation until we have the proper environment. In this case, the awaited environment is the dataset mtcars. This is tidy evaluation: quasiquotation + quosures + data masks.

TidyEval

tidyeval

Tidyeval is practical: unquoted column references, scoped evaluation, and flexible data-masked functions.

library(dplyr)
library(rlang)
library(ggplot2)

# if you are only doing 1 thing, use {{ }}
tally_it <- function(.data, column) {
  .data %>% group_by({{ column }}) %>% tally(sort = TRUE)
}
mtcars %>% tally_it(mpg)

# multiple args with ...
tally_it <- function(.data, ...) {
  args <- enexprs(...)
  .data %>% group_by(!!!args) %>% tally(sort = TRUE)
}
mtcars %>% tally_it(mpg, hp, disp, cyl)

# coerce strings to exprs beforehand
tally_it <- function(.data, ...) {
  args <- enexprs(...)
  if (is.character(args[[1]])) args <- lapply(args, parse_expr)
  .data %>% group_by(!!!args) %>% tally(sort = TRUE)
}
mtcars %>% tally_it("mpg", "hp", "disp")

A helper to capture original argument names for plot labels, then a simple plotting wrapper:

get_args <- function() {
  as.list(match.call(def = sys.function(-1), call = sys.call(-1)))[-1]
}

plot_it <- function(.data, x, y) {
  xlab <- as.character(get_args()$x)
  ylab <- as.character(get_args()$y)
  .data %>%
    ggplot(aes(x = {{ x }}, y = {{ y }})) +
    geom_point() +
    ggtitle(paste0(ylab, " ~ ", xlab))
}

mtcars %>% plot_it(mpg, disp)
mtcars %>% plot_it(mpg, hp)

Substitutions

substitutions

We can coerce expressions to strings, manipulate them, and parse back.

library(rlang)
library(glue)

new_expression <-
  expr(y <- x + x + x + x + z) %>%
  deparse() %>%
  str_glue(" + z + z + z") %>%
  parse_expr()

x <- 2; y <- 0; z <- 1
eval(new_expression)
y

Make code as data, then unquote it back into code.

library(stringr)
library(dplyr)
library(tibble)
library(purrr)

mtcars %<>% as_tibble(rownames = "car_names")

unique_names <- mtcars %>%
  pull(car_names) %>%
  str_extract("^[A-Za-z]+") %>%
  unique()

conditional_statements <-
  set_names(
    map(unique_names,
        ~ parse_expr(paste0("ifelse(str_detect(car_names, "", .x, ""), TRUE, FALSE)"))),
    unique_names
  )

mtcars %>%
  mutate(!!!conditional_statements) %>%
  glimpse()

Pipes are just syntax sugar over nested calls.

mtcars %>%
  select(mpg, cyl, disp, hp) %>%
  filter(cyl == 6) %>%
  summarize(mean(mpg))

# equivalent
summarize(
  filter(
    select(mtcars, mpg, cyl, disp, hp), cyl == 6
  ),
  mean(mpg)
)

Define your own infix operators.

`%r%` <- function(expr, num) replicate(num, expr)

rexp(1) %r% 3
rexp(3) %r% 3
rexp(3) %r% 3 %r% 4

An example: replicate Clojure threading macros

In Clojure:

(+ 1 2 3 4 5)

(-> (load-data "xyz.csv")
    ($ "those_columns")
    (filter {:cyl {:eq 6}})
    (mean :mpg))

A small R helper that emulates a threaded pipeline by stitching a string of %>% calls and evaluating it:

library(rlang)
library(glue)
library(purrr)
library(dplyr)

p_ <- function(x, ...) {
  enexprs(x, ...) %>%
    map_chr(~ paste0(deparse(.x), " %>% ")) %>%
    paste(collapse = "") %>%
    str_glue("identity") %>%
    parse(text = .) %>%
    eval()
}

plus_n <- function(n) function(x) x + n
plus_n(3)(2)

x <- 1
p_(x, plus_n(3)(), plus_n(2)(), plus_n(3)())

p_(mtcars,
   select(mpg, cyl, disp, hp),
   filter(cyl == 6),
   summarize(mean(mpg)))

The big idea: we are not limited to the language designer. We can become the language designer.

Generating Code with R

generate

It does not need to be R code that we generate. dbplyr translates dplyr pipelines to SQL, for example.

library(dbplyr)
library(dplyr)

# generate SQL
mtcars %>%
  lazy_frame(con = simulate_mssql()) %>%
  select(mpg, cyl, disp, hp) %>%
  filter(cyl == 6) %>%
  summarize(mean(mpg)) %>%
  show_query()

# combine with metaprogramming and SQL primitives
library(stringr)
library(rlang)
library(purrr)
library(tibble)

unique_names <- mtcars %>%
  rownames_to_column("car_names") %>%
  pull(car_names) %>%
  str_extract("^[A-Za-z]+") %>%
  unique()

conditional_statements <-
  set_names(
    map(unique_names,
        ~ parse_expr(paste0("ifelse(car_names %like% "", .x, "", TRUE, FALSE)"))),
    unique_names
  )

mtcars %>%
  lazy_frame(con = simulate_mssql()) %>%
  mutate(!!!conditional_statements) %>%
  show_query()

Shiny generates HTML, CSS, and JS.

library(shiny)

(ui <- fluidPage(
  titlePanel("title panel"),
  sidebarLayout(
    sidebarPanel("sidebar panel"),
    mainPanel("main panel")
  )
))

div(h1(title()))
sidebarPanel("Panel")
mainPanel("Main Panel")

Other things to check out

extra

  • tidyeval
  • rlang
  • how calls work
  • how formulas work
  • how symbols work
  • how interpreters work
  • how compilers work

Resources to look into

  • Advanced R
  • Thomas Mailund's books on R
  • Lisp family programming languages
  • The Documentation