Object-Oriented Programming in R

Introduction

You already use object-oriented programming (OOP) patterns constantly in R, even if you have not thought about it explicitly.

Let’s look at the plot() and summary() functions.

x_norm <- rnorm(5000)
plot(x_norm)

summary(mtcars)
plot(mtcars)
plot(mtcars$mpg, mtcars$hp)

mtcars_model <- lm(mpg ~ wt, data = mtcars)
summary(mtcars_model)
plot(mtcars_model)

The same function - summary() - does something completely different depending on what you hand it. A numeric vector (try summary(x_norm)) gets its minimum, quartiles, mean, and maximum. A data frame gets column summaries. A linear model fit gets coefficients, standard errors, and p-values. The function name stays the same; the behaviour adapts to the data.

This is one of the most powerful ideas in OOP: you write code that calls print() or plot() or summary(), and the object itself determines what happens.

We are going to:

  1. Create a new S3 class cpue_result to wrap CPUE calculations with metadata
  2. Write print, summary, and plot methods for it
  3. Convert cpue() and biomass_index() into generics that work with both numeric vectors and data frames

OOP Systems in R

R has several OOP systems. They differ in formality and where methods live.

System  Dispatch                 Methods live on…      Formality                                 Common in
S3      Single (first argument)  The generic function  Informal                                  Base R, tidyverse, most packages
S4      Multiple arguments       The generic function  Formal (classes, validity)                Bioconductor, methods-heavy packages
S7      Multiple arguments       The generic function  Formal (classes, properties, validation)  Some new packages (successor to S3 and S4)
R6      Single                   The object itself     Moderate (via R6 package)                 Mutable state, encapsulation

For package development, S3 covers the vast majority of use cases. S4 is worth learning if you work in Bioconductor or need multiple dispatch. S7 is designed as the eventual successor to both S3 and S4 - it combines the simplicity of S3 with more of the rigor of S4 (formal class definitions, property validation, multiple dispatch). If you are starting a new package and want more structure than S3 provides, S7 is worth considering, though S3 remains the dominant system in practice.

We will focus on S3 today, with a brief look at R6 at the end.

S3 in Depth

S3 is the most widely used OOP system in R. It is simple, flexible, and what base R itself uses.

How S3 works

An S3 object is just a regular R object (list, vector, data frame, etc.) with a class attribute. When you call a generic function like print(x), R looks at class(x) and dispatches to the appropriate method.

class(x_norm)
class(mtcars)
class(mtcars_model)
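
As a small aside, the class attribute can be inspected directly. The sketch below uses only base R; df and v are throwaway names introduced for illustration, not objects from the package.

```r
# Throwaway objects, introduced only for this illustration
df <- data.frame(a = 1:2)
v <- c(1.5, 2.5)

attr(df, "class")           # the stored class attribute: "data.frame"
inherits(df, "data.frame")  # TRUE
is.list(unclass(df))        # TRUE: underneath, a data frame is a list

attr(v, "class")            # NULL: a plain vector stores no class attribute
class(v)                    # "numeric": class() reports an implicit class
```

Stripping the attribute with unclass() leaves the underlying data untouched, which is why S3 is often described as a thin layer on top of ordinary R objects.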

Seeing the dispatch chain

# What methods exist for print?
methods(generic.function = "print")

# What methods exist for a class?
methods(class = "data.frame")

The dispatch mechanism

When you call summary(x) where x has class "foo":

  1. R looks for summary.foo
  2. If not found, looks for summary.default
  3. If neither exists, throws an error

This is called single dispatch - the method is chosen based on the class of the first argument only.
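
The lookup order can be sketched with a toy generic. Everything here (describe(), and the classes "foo" and "bar") is hypothetical and exists only for illustration. The sketch also shows that when class(x) holds several entries, R tries each one in order before falling back to the default method.

```r
# Hypothetical generic and methods, used only to illustrate dispatch
describe <- function(x, ...) UseMethod("describe")
describe.foo <- function(x, ...) "foo method"
describe.default <- function(x, ...) "default method"

# An object whose class vector has two entries
obj <- structure(1:3, class = c("bar", "foo"))

describe(obj)  # no describe.bar exists, so R moves on to describe.foo
describe(1:3)  # no matching class method, so describe.default is used
```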

Creating an S3 Class

Our cpue() function returns a plain numeric vector:

result <- cpue(c(100, 200, 300), c(10, 20, 15))
class(result)

That class is just "numeric". There is no print.numeric method, so when we print(result), R falls back to print.default and shows raw numbers. What if we want CPUE output to identify itself - to carry context about the calculation and display it in a useful way?

The simplest way: class() <-

The most direct way to create an S3 class is to set the class attribute on an existing object. Let’s modify cpue() to tag its output. Add two lines at the end of the function body in R/cpue.R:

cpue <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE)
) {
  method <- match.arg(method)

  validate_numeric_inputs(catch = catch, effort = effort)

  if (verbose) {
    message("Processing ", length(catch), " records using ", method, " method")
  }

  raw_cpue <- switch(
    method,
    ratio = catch / effort,
    log = log(catch / effort)
  )

  result <- raw_cpue * gear_factor
  class(result) <- "cpue_result"
  result
}

load_all()

result <- cpue(c(100, 200, 300), c(10, 20, 15))
result
class(result)

Now R knows this is a cpue_result. We can write a print method for it.
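
Before writing that method, it may help to see what the fallback printing looks like. This is a minimal sketch that builds a stand-in object by hand rather than calling cpue(), so it runs without the package; x and printed are illustrative names.

```r
# Stand-in for the value cpue() returns at this point in the tutorial
x <- structure(c(10, 10, 20), class = "cpue_result")

# No print.cpue_result exists yet, so print.default does the work
printed <- capture.output(print(x))
printed
```

The default output lists the values and then an attr(,"class") line, which is the clutter a dedicated print method removes.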

Writing Methods

Adding metadata with attributes

A bare numeric vector with a class label has no context. What gear factor was used? What method was used? R objects can carry arbitrary attributes - named metadata attached to the object. Let’s update the function body again:

cpue <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE)
) {
  method <- match.arg(method)

  validate_numeric_inputs(catch = catch, effort = effort)

  if (verbose) {
    message("Processing ", length(catch), " records using ", method, " method")
  }

  raw_cpue <- switch(
    method,
    ratio = catch / effort,
    log = log(catch / effort)
  )

  result <- raw_cpue * gear_factor
  attr(result, "gear_factor") <- gear_factor
  attr(result, "n_records") <- length(catch)
  attr(result, "method") <- method
  class(result) <- "cpue_result"
  result
}

load_all()

result <- cpue(c(100, 200, 300), c(10, 20, 15))
attr(result, "gear_factor")
attr(result, "n_records")
attr(result, "method")

As a class accumulates more fields, setting attributes one by one gets unwieldy.

Create a constructor function

For anything beyond a very basic class, the standard practice is to create a constructor function that builds the object in one place. A common naming convention, recommended in Advanced R, is new_<classname>.

#' @noRd
new_cpue_result <- function(values, method, gear_factor, n_records) {
  structure(
    values, # The data
    method = method, # Attributes specifying metadata
    gear_factor = gear_factor,
    n_records = n_records,
    class = "cpue_result" # class is a special attribute
  )
}

A few things changed:

  • structure() sets multiple attributes at once - including class - instead of separate attr() and class() <- calls
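  • All object creation is centralised, so every cpue_result is built the same way

With the constructor in place, cpue() hands its result to new_cpue_result() instead of setting attributes itself:
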
#' Summarize a CPUE survey
#'
#' Calculates CPUE from catch and effort data and returns a structured
#' `cpue_result` object with metadata about the calculation.
#'
#' @param catch Numeric vector of catch values.
#' @param effort Numeric vector of effort values.
#' @param gear_factor Numeric gear correction factor (default 1).
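#' @param method Calculation method: "ratio" or "log".
#' @param verbose Logical; print processing info? Default from
#'   `getOption("fishr.verbose", FALSE)`.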
#'
#' @return A `cpue_result` object.
#' @export
#'
#' @examples
#' cpue(catch = c(100, 200, 300), effort = c(10, 20, 30))
cpue <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE)
) {
  # ...cpue body here

  new_cpue_result(
    values = raw_cpue * gear_factor,
    method = method,
    gear_factor = gear_factor,
    n_records = length(catch)
  )
}
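
A constructor like this trusts its inputs. One common companion, suggested here as an optional extra rather than something the package above defines, is a separate validator that checks the object after construction. The name validate_cpue_result() and the specific checks are assumptions made for this sketch.

```r
# Constructor as defined above, repeated so the sketch is self-contained
new_cpue_result <- function(values, method, gear_factor, n_records) {
  structure(
    values,
    method = method,
    gear_factor = gear_factor,
    n_records = n_records,
    class = "cpue_result"
  )
}

# Hypothetical validator: not part of the package as described above
validate_cpue_result <- function(x) {
  if (!is.numeric(unclass(x))) {
    stop("`values` must be numeric.", call. = FALSE)
  }
  if (!attr(x, "method") %in% c("ratio", "log")) {
    stop("`method` must be \"ratio\" or \"log\".", call. = FALSE)
  }
  if (length(x) != attr(x, "n_records")) {
    stop("`n_records` must match the number of values.", call. = FALSE)
  }
  x
}

# A consistent object passes through unchanged
ok <- validate_cpue_result(new_cpue_result(c(10, 20), "ratio", 1, 2))

# An inconsistent one (2 values, n_records = 3) is rejected
bad <- tryCatch(
  validate_cpue_result(new_cpue_result(c(10, 20), "ratio", 1, 3)),
  error = function(e) conditionMessage(e)
)
bad
```

Keeping validation separate leaves the constructor cheap for internal use, while user-facing code can opt into the checks.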

Now our print method can be more informative:

#' @export
print.cpue_result <- function(x, ...) {
  cat("CPUE Result\n")
  cat("Records:     ", attr(x, "n_records"), "\n")
  cat("Method:      ", attr(x, "method"), "\n")
  cat("Gear factor: ", attr(x, "gear_factor"), "\n")
  cat("Values:      ", round(x, 2), "\n")
  invisible(x)
}

load_all()

result <- cpue(
  catch = c(100, 200, 300, 150),
  effort = c(10, 20, 15, 30)
)

result

summary method

A summary method should report useful summary statistics. This version prints them directly and returns the object invisibly:

#' @export
summary.cpue_result <- function(object, ...) {
  cat("Survey Result Summary\n")
  cat("---------------------\n")
  cat("Method:      ", attr(object, "method"), "\n")
  cat("Records:     ", attr(object, "n_records"), "\n")
  cat("Gear factor: ", attr(object, "gear_factor"), "\n")
  cat("Mean CPUE:   ", round(mean(object), 2), "\n")
  cat("Median CPUE: ", round(stats::median(object), 2), "\n")
  cat("SD CPUE:     ", round(stats::sd(object), 2), "\n")
  invisible(object)
}

document()
load_all()

summary(result)
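
Base R methods such as summary.lm take a slightly different route: they return a summary object and leave the display to a separate print method, so the computed statistics can be reused in code. The following is a sketch of that alternative, not the version used in this tutorial. The class name "summary.cpue_result" and the list fields are assumptions chosen for illustration.

```r
# Constructor repeated so the sketch is self-contained
new_cpue_result <- function(values, method, gear_factor, n_records) {
  structure(values, method = method, gear_factor = gear_factor,
            n_records = n_records, class = "cpue_result")
}

# Alternative summary method: compute and return, do not print
summary.cpue_result <- function(object, ...) {
  values <- as.numeric(object)
  structure(
    list(
      method = attr(object, "method"),
      n_records = attr(object, "n_records"),
      gear_factor = attr(object, "gear_factor"),
      mean = mean(values),
      median = stats::median(values),
      sd = stats::sd(values)
    ),
    class = "summary.cpue_result"  # hypothetical class name
  )
}

# Display is handled by a print method for the summary class
print.summary.cpue_result <- function(x, ...) {
  cat("Survey Result Summary\n")
  cat("Method:      ", x$method, "\n")
  cat("Records:     ", x$n_records, "\n")
  cat("Mean CPUE:   ", round(x$mean, 2), "\n")
  cat("Median CPUE: ", round(x$median, 2), "\n")
  invisible(x)
}

s <- summary(new_cpue_result(c(10, 10, 20), "ratio", 1, 3))
s$median  # statistics are now available programmatically
```

Typing s at the console prints the formatted summary, while s$mean and friends stay accessible to other code.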

Your turn: Make a plot() method

Plot methods must be named plot.<classname>, take x as the first argument, and include ... in the signature.

A base graphics version:

#' @export
plot.cpue_result <- function(x, ...) {
  plot(
    seq_along(x),
    x,
    type = "b",
    xlab = "Record",
    ylab = "CPUE",
    main = paste("CPUE -", attr(x, "method"), "method"),
    ...
  )
}

document()
load_all()

plot(result)

You don’t need to use base plot() in a plot method - you can use ggplot2 or any other plotting system. First, add ggplot2 to your package dependencies:

use_package("ggplot2")

Then replace the base version with a ggplot2 one:

#' @export
plot.cpue_result <- function(x, ...) {
  data <- data.frame(
    record = seq_along(x),
    cpue = as.numeric(x)
  )

  ggplot2::ggplot(data, ggplot2::aes(x = record, y = cpue)) +
    ggplot2::geom_line() +
    ggplot2::geom_point() +
    ggplot2::labs(
      x = "Record",
      y = "CPUE",
      title = paste("CPUE -", attr(x, "method"), "method")
    )
}

document()
load_all()

plot(result)

plot(result) +
  ggplot2::theme_minimal() +
  ggplot2::geom_hline(yintercept = mean(result), linetype = "dashed")

Registering S3 methods

For your methods to work when the package is installed (not just with load_all()), they need to appear in the NAMESPACE file. The @export tag on each method handles this via roxygen2.

document()

Look at the NAMESPACE file. How do the entries for S3 methods differ from those for regular functions?

S3method(print,cpue_result)
S3method(summary,cpue_result)
S3method(plot,cpue_result)

Testing the class

We should update our tests to verify that cpue() returns the right class. testthat provides expect_s3_class() for exactly this:

test_that("cpue() returns a cpue_result object", {
  result <- cpue(c(100, 200), c(10, 20))
  expect_s3_class(result, "cpue_result")
})

We can also test that the metadata attributes are set correctly:

test_that("cpue_result carries calculation metadata", {
  result <- cpue(c(100, 200, 300), c(10, 20, 15), method = "log")
  expect_equal(attr(result, "method"), "log")
  expect_equal(attr(result, "gear_factor"), 1)
  expect_equal(attr(result, "n_records"), 3)
})

And snapshot tests are a natural fit for print methods, since they capture the exact console output:

test_that("print.cpue_result displays expected output", {
  result <- cpue(c(100, 200, 300), c(10, 20, 15))
  expect_snapshot(print(result))
})

Tip: Make a commit

Creating your own generics and methods

So far we have built new classes and created methods for existing generics. But one of the most practical uses of S3 is creating your own generics, so the same function name works on different types of input.

Our cpue() function currently takes numeric vectors:

cpue(catch = c(100, 200), effort = c(10, 20))

But users often have their data in a data frame. Instead of making them extract columns every time, we can make cpue() work directly on data frames too. The same function call adapts its behaviour depending on what the user passes in - just like summary() does in base R.

Step 1: Make cpue a generic

We replace the function body with UseMethod("cpue"). The first argument keeps the name catch, so existing calls such as cpue(catch = ..., effort = ...) still work, but it may now hold either a numeric vector or a data frame:

#' Calculate Catch Per Unit Effort (CPUE)
#'
#' Calculates CPUE from catch and effort data, with optional gear
#' standardization. Supports ratio and log-transformed methods.
#'
#' @param catch Input data: a numeric vector of catch values, or a data frame
#'   containing catch and effort columns.
#' @param ... Additional arguments passed to methods.
#'
#' @export
cpue <- function(catch, ...) {
  UseMethod("cpue")
}

Step 2: Move the existing implementation into cpue.numeric

The existing code becomes the numeric method. The first argument stays catch to match the generic’s signature, a trailing ... is added for the same reason, and the logic is identical:

#' @rdname cpue
#'
#' @param effort Numeric vector of effort (e.g., hours)
#' @param gear_factor Numeric scalar for gear standardization (default 1)
#' @param method Character; one of `"ratio"` (default) or `"log"`.
#' @param verbose Logical; print processing info? Default from
#'   `getOption("fishr.verbose", FALSE)`.
#'
#' @return A `cpue_result` object: a numeric vector of CPUE values with
#'   `method`, `gear_factor`, and `n_records` attributes.
#' @export
#'
#' @examples
#' cpue(100, 10)
#' cpue(c(100, 200), c(10, 20), method = "log")
cpue.numeric <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE),
  ...
) {
  method <- match.arg(method)

  validate_numeric_inputs(catch = catch, effort = effort)

  if (verbose) {
    message("Processing ", length(catch), " records using ", method, " method")
  }

  raw_cpue <- switch(
    method,
    ratio = catch / effort,
    log = log(catch / effort)
  )

  new_cpue_result(
    values = raw_cpue * gear_factor,
    method = method,
    gear_factor = gear_factor,
    n_records = length(catch)
  )
}

All existing code that calls cpue() with positional arguments continues to work - R dispatches to cpue.numeric because the first argument is numeric.

Step 3: Add a data frame method

#' @rdname cpue
#' @export
cpue.data.frame <- function(
  catch,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE),
  ...
) {
  if (!"catch" %in% names(catch)) {
    stop("Column 'catch' not found in data frame.", call. = FALSE)
  }
  if (!"effort" %in% names(catch)) {
    stop("Column 'effort' not found in data frame.", call. = FALSE)
  }

  # We can then call the numeric method by extracting the relevant columns and passing them to cpue() again.
  # This way we reuse the existing logic and maintain a single source of truth for the CPUE calculation.
  cpue(
    catch = catch[["catch"]],
    effort = catch[["effort"]],
    gear_factor = gear_factor,
    method = method,
    verbose = verbose,
    ...
  )
}

The data frame method extracts the right columns and calls cpue() again with numeric vectors, which dispatches to cpue.numeric. This layering - where one method calls the generic again with different input - is a common and effective pattern.

Step 4: Add a .default method

It’s good practice to add a default method that throws an informative error if the user passes in an unsupported type. If the generic can’t find a method for the class of catch, it will fall back to cpue.default:

#' @rdname cpue
#' @export
cpue.default <- function(catch, ...) {
  # class() can return several entries, so collapse them into one string
  stop(
    "Unsupported input type for cpue(): ",
    paste(class(catch), collapse = "/"),
    call. = FALSE
  )
}
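
The four steps can be seen working together in a stripped-down, self-contained sketch. It uses a hypothetical generic called rate() with simplified methods, so it does not depend on the package or on validate_numeric_inputs(); only the dispatch pattern carries over.

```r
# Hypothetical generic mirroring the structure of cpue()
rate <- function(x, ...) UseMethod("rate")

# Numeric method: does the actual calculation
rate.numeric <- function(x, effort, ...) {
  x / effort
}

# Data frame method: extracts columns, then re-enters the generic
rate.data.frame <- function(x, ...) {
  rate(x[["catch"]], effort = x[["effort"]], ...)
}

# Default method: informative error for unsupported input
rate.default <- function(x, ...) {
  stop(
    "Unsupported input type for rate(): ",
    paste(class(x), collapse = "/"),
    call. = FALSE
  )
}

df <- data.frame(catch = c(100, 300), effort = c(10, 15))

rate(c(100, 200), c(10, 20))  # numeric method: 10 10
rate(df)                      # data frame method, then numeric: 10 20
rate(100L, 10L)               # integers also reach the numeric method: 10

msg <- tryCatch(rate("a"), error = function(e) conditionMessage(e))
msg                           # "Unsupported input type for rate(): character"
```

Integer input reaches rate.numeric because the implicit class of an integer vector includes "numeric", so no separate integer method is needed.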

Update biomass_index documentation

  • Update biomass_index to inherit parameters from cpue.numeric instead, since that is now the method it actually calls.
  • The shared cpue documentation describes catch as a numeric vector OR a data frame, but biomass_index expects a numeric vector, so add @param catch back explicitly:

#' Calculate Biomass Index
#'
#' @param cpue Numeric vector of CPUE values. If NULL, computed from `catch`
#'   and `effort`.
#' @param area_swept Numeric vector of area swept (e.g., km²).
#' @param catch Numeric vector of catch (e.g., kg).
#' @inheritParams cpue.numeric
#' @inheritDotParams cpue.numeric -effort
#' @export
biomass_index <- function(
  cpue = NULL,
  area_swept,
  catch = NULL,
  effort = NULL,
  ...
) {
  # ...
}

document()
load_all()

# Still works with vectors
cpue(c(100, 200), c(10, 20))

# Now also works with data frames
fishing_data <- data.frame(
  catch = c(100, 200, 300),
  effort = c(10, 20, 15)
)

cpue(fishing_data)

Updating and adding tests

cpue() now returns a cpue_result object, not a plain numeric. expect_equal() compares attributes, so tests that compare the result to a bare number will fail. We have a few options to fix this:

  • Wrap the result in as.numeric() for numeric comparisons:

    • expect_equal(as.numeric(cpue(100, 10)), 10).
  • Use the ignore_attr = TRUE argument in expect_equal() to ignore attributes (available in testthat’s 3rd edition):

    • expect_equal(cpue(100, 10), 10, ignore_attr = TRUE).
  • Write a helper function in tests/testthat/helper.R to use in our tests:

    expect_equal_numeric <- function(object, expected, ...) {
      expect_equal(as.numeric(object), expected, ...)
    }

    Then use expect_equal_numeric(cpue(100, 10), 10) in your tests.
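
The reason these fixes are needed can be seen without testthat. The sketch below uses base R's identical() and all.equal() purely as stand-ins, to keep the example free of package dependencies; it does not claim to reproduce how expect_equal() compares values internally. The object x is built by hand for illustration.

```r
# Hand-built stand-in for a single-value cpue_result
x <- structure(10, method = "ratio", class = "cpue_result")

identical(x, 10)                      # FALSE: class and attributes differ
isTRUE(all.equal(x, 10))              # FALSE: attribute mismatch is reported
isTRUE(all.equal(as.numeric(x), 10))  # TRUE: as.numeric() strips attributes
```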

Some existing snapshots may also be stale since the print format changed. After check() or test(), inspect the differences with snapshot_review(), then run snapshot_accept() to accept the updated snapshots.

Now add tests for the data frame method and the default error:

test_that("cpue.data.frame dispatches correctly", {
  fishing_data <- data.frame(
    catch = c(100, 200, 300),
    effort = c(10, 20, 15)
  )
  result <- cpue(fishing_data)
  expect_s3_class(result, "cpue_result")
  expect_equal(as.numeric(result), c(10, 10, 20))
})

test_that("cpue.data.frame errors on missing columns", {
  df <- data.frame(x = 1, y = 2)
  expect_snapshot(cpue(df), error = TRUE)
})

test_that("cpue.default gives informative error", {
  expect_snapshot(cpue("not valid"), error = TRUE)
})

Tip: Make a commit

When to Use OOP in Packages

S3 classes are worth reaching for when:

  • Your function returns complex results that benefit from custom print/summary/plot methods
  • You want other packages to be able to extend your work with new methods
  • You want to create a consistent interface when your users want to do the same things with different types of input (e.g., numeric vectors, data frames, model fits)

They are probably not needed when:

  • A simple named list or data frame communicates the result clearly
  • There is only one way to display or summarize the output
  • The function returns a scalar or vector with obvious meaning

Recap

  • R has multiple OOP systems (S3, S4, S7, R6) - S3 is the most common and covers most use cases
  • S3 objects are regular R objects with a class attribute
  • The same function (print, summary, plot) does different things for different classes - the object determines the behaviour, not the caller
  • Constructors (new_*) build objects in one place, with a consistent structure
  • Write print, summary, and plot methods to make your classes pleasant to work with
  • Create custom generics with UseMethod() for domain-specific operations
  • Test classes with expect_s3_class() and snapshot tests for print output