Object-Oriented Programming in R

Outline

  1. What is Object-Oriented Programming and why does R have it?
  2. OOP systems in R
  3. S3 in depth: classes, attributes, and methods
  4. Writing print, summary, and plot methods
  5. Creating your own generics

What is Object-Oriented Programming?

You already use it

x_norm <- rnorm(5000)
summary(x_norm)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-3.312878 -0.679415  0.006581  0.003209  0.684536  3.689426 


summary(mtcars)
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  


mtcars_model <- lm(mpg ~ wt, data = mtcars)
summary(mtcars_model)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

What makes that possible?

The same function name. Completely different behaviour. No if/else in your code.

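Every R object has a class, which class() reports. For base types such as numeric vectors this is an implicit class rather than a stored attribute: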

class(x_norm)
#> "numeric"

class(mtcars)
#> "data.frame"

class(mtcars_model)
#> "lm"
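Strictly, not every object stores a class attribute. A plain double vector has none, and class() derives an implicit class from its type. Data frames and lm fits carry an explicit attribute. attr() shows the difference (only base R and the built-in mtcars data are used):

```r
x_norm <- rnorm(5)

# A plain double vector has no class attribute...
attr(x_norm, "class")
#> NULL
# ...but class() still reports an implicit class
class(x_norm)
#> [1] "numeric"

# Data frames and lm fits do carry an explicit class attribute
attr(mtcars, "class")
#> [1] "data.frame"
attr(lm(mpg ~ wt, data = mtcars), "class")
#> [1] "lm"
```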

When you call summary(x), R reads class(x) and looks for a function named summary.<class>.

This is called dispatch: the object’s class determines which function runs.

Generics and methods

A generic is the user-facing function:

summary
function (object, ...) UseMethod("summary")

It does almost nothing itself - it just dispatches.

A method is a class-specific implementation:

summary.lm
summary.data.frame
summary.default # fallback

Named <generic>.<class>.

The generic is the consistent interface. The methods are the specialised implementations.
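To see the generic/method split in isolation, here is a minimal sketch with a hypothetical generic describe() (not part of base R) and two methods. The generic contains nothing but UseMethod(). Each method handles one class:

```r
# Hypothetical generic: its only job is to dispatch on the class of x
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method for data frames
describe.data.frame <- function(x, ...) {
  paste("A data frame with", nrow(x), "rows and", ncol(x), "columns")
}

# Fallback used when no more specific method exists
describe.default <- function(x, ...) {
  paste("An object of length", length(x))
}

describe(mtcars)
#> [1] "A data frame with 32 rows and 11 columns"
describe(1:10)
#> [1] "An object of length 10"
```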

The dispatch chain

When you call summary(x) and class(x) is "foo":

  1. R looks for summary.foo
  2. If not found, looks for summary.default
  3. If neither exists - error

This is single dispatch: the method is chosen based on the class of the first argument only.

# What methods exist for a generic?
methods(generic.function = "print")

# What methods exist for a class?
methods(class = "data.frame")
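class(x) can hold more than one class. In that case R tries each class in order, uses the first method it finds, and only falls back to the default at the end. A sketch with a hypothetical generic greet() and made-up classes "foo" and "bar":

```r
greet <- function(x, ...) UseMethod("greet")
greet.bar <- function(x, ...) "greet.bar"

obj <- structure(list(), class = c("foo", "bar"))
# There is no greet.foo, so R moves on to the next class and finds greet.bar
greet(obj)
#> [1] "greet.bar"

# No matching method and no greet.default: dispatch fails with an error
res <- tryCatch(greet(42), error = function(e) conditionMessage(e))
res
```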

OOP Systems in R

Four main OOP systems

All four implement the same idea - dispatch based on class - but differ in formality and where methods live.

System  Dispatch            Formality                          Common in
S3      Single (first arg)  Minimal                            Base R, tidyverse, most packages
S4      Multiple args       Formal classes, validity checks    Bioconductor
S7      Multiple args       Formal, successor to S3 + S4       Some newer packages
R6      Single              Encapsulated, mutable              Shiny, databases, stateful objects

S3 covers the vast majority of package development use cases. We focus on S3 today.

S3 in Depth

An S3 object is just a regular R object

…with a class attribute attached.

x <- c(10, 20, 30)
class(x)
#> "numeric"

class(x) <- "my_class"
class(x)
#> "my_class"

# The data is unchanged
x + 1 # still works as a numeric vector
  • That’s the whole mechanism with S3.
  • Other class systems have a more formal way of declaring classes.
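A few base R helpers make the "it's just an attribute" point concrete: attributes() shows the class sitting alongside the data, inherits() is the usual way to test class membership, and unclass() gives back the plain vector:

```r
x <- c(10, 20, 30)
class(x) <- "my_class"

# The class is just an attribute...
attributes(x)
#> $class
#> [1] "my_class"

# inherits() tests class membership
inherits(x, "my_class")
#> [1] TRUE

# unclass() removes the class attribute and returns the plain vector
unclass(x)
#> [1] 10 20 30
```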

Giving cpue() a class

Currently cpue() returns a plain numeric vector. We can give it a class:

cpue <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  ...
) {
  # ... existing calculation ...

  result <- raw_cpue * gear_factor
  class(result) <- "cpue_result" # tag the output
  result # return the result
}


r <- cpue(c(100, 200, 300), c(10, 20, 15))
class(r) # "cpue_result"
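The calculation inside cpue() is elided above. To experiment outside the package, a minimal stand-in is enough. This sketch assumes the ratio method is simply catch / effort (consistent with cpue(100, 10) giving 10 in the tests later on) and omits the log method, so toy_cpue() is hypothetical:

```r
# Hypothetical stand-in for cpue(): ratio method only
toy_cpue <- function(catch, effort, gear_factor = 1) {
  raw_cpue <- catch / effort
  result <- raw_cpue * gear_factor
  class(result) <- "cpue_result" # tag the output
  result
}

r <- toy_cpue(c(100, 200, 300), c(10, 20, 15))
class(r)
#> [1] "cpue_result"
unclass(r)
#> [1] 10 10 20
```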

Writing New Methods for Existing Generics

The print method

When you type an object name at the console, R calls print() - this is autoprinting.

  • print() is an ordinary generic. Autoprinting just calls it implicitly whenever a top-level expression returns a visible value.

Generic Signature: print(x, ...)

#' @export
print.cpue_result <- function(x, ...) {
  cat("CPUE Results for", length(x), "records\n")
  cat("Values:", round(x, 2), "\n")
  invisible(x)
}
  • Name it print.<classname>
  • First argument is x
  • Always include ... in the signature - the print generic requires it (See ?print)
  • Don’t modify x; return invisible(x) so assignment still works: y <- print(x)
  • Use cat() inside rather than print(x). Calling print(x) inside print.foo dispatches straight back to print.foo and recurses forever
  • Use @export so roxygen2 registers it in NAMESPACE
document()
load_all()

result <- cpue(c(100, 200, 300), c(10, 20, 15))
result # autoprinting calls print.cpue_result
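If you would rather reuse R's own vector printing than format values with cat(), strip the class first so the inner print() call goes to the default method instead of back to print.cpue_result. A self-contained sketch, with the object built directly by structure() and a deliberately simpler print method than the one above:

```r
r <- structure(c(10, 10, 20), class = "cpue_result")

print.cpue_result <- function(x, ...) {
  cat("<cpue_result>\n")
  # unclass() means this print() call dispatches to print.default, not back here
  print(unclass(x), ...)
  invisible(x)
}

# capture.output() collects what print() writes; withVisible() records that
# the method's return value is invisible
out <- capture.output(vis <- withVisible(print(r)))
out
#> [1] "<cpue_result>" "[1] 10 10 20"
```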

Add metadata to class with attributes

A class tag alone gives us dispatch. Attributes let us attach context to the object.

# inside cpue(), just before returning result
attr(result, "method") <- method
attr(result, "gear_factor") <- gear_factor
attr(result, "n_records") <- length(catch)
class(result) <- "cpue_result"

Attributes can be read back with attr() or attributes():

attr(result, "method")
#> "ratio"
attr(result, "n_records")
#> 3
attributes(result)
#> $method
#> [1] "ratio"
#> $gear_factor
#> [1] 1
#> $n_records
#> [1] 3
#> $class
#> [1] "cpue_result"

Under the hood it is still a numeric vector

typeof(result)
#> "double"

# Arithmetic still works; as.numeric() drops the attributes so we see the raw values
as.numeric(result) + 1
#> [1] 11 11 21
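Arithmetic on a classed vector keeps its attributes (R copies the attributes of the longer operand onto the result), so result + 1 on its own is still a cpue_result and would autoprint through print.cpue_result. A standalone check, with the object built by hand:

```r
result <- structure(
  c(10, 10, 20),
  method = "ratio", gear_factor = 1, n_records = 3L,
  class = "cpue_result"
)

shifted <- result + 1
# Class and metadata survive the arithmetic...
class(shifted)
#> [1] "cpue_result"
attr(shifted, "method")
#> [1] "ratio"

# ...while as.numeric() drops every attribute
as.numeric(shifted)
#> [1] 11 11 21
```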

Using a constructor function

  • Setting attributes one by one gets messy.
  • The standard practice is a constructor function.
  • Use structure() to set all attributes in one call.
new_cpue_result <- function(values, method, gear_factor, n_records) {
  structure(
    values, # the object
    # metadata as named attributes
    method = method,
    gear_factor = gear_factor,
    n_records = n_records,
    class = "cpue_result"
  )
}

Convention: name constructors new_<classname>. They are usually internal: not exported, and documented with @noRd if at all.
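A constructor is also a good place for cheap type checks, so a malformed object fails at creation rather than later inside a method. A sketch of a variant of new_cpue_result() with stopifnot() guards; the specific checks are assumptions about what each field should hold:

```r
new_cpue_result <- function(values, method, gear_factor, n_records) {
  # Assumed invariants; adjust to your package's actual rules
  stopifnot(
    is.double(values),
    is.character(method), length(method) == 1,
    is.numeric(gear_factor), length(gear_factor) == 1,
    is.numeric(n_records), length(n_records) == 1
  )
  structure(
    values,
    method = method,
    gear_factor = gear_factor,
    n_records = n_records,
    class = "cpue_result"
  )
}

ok <- new_cpue_result(c(10, 10, 20), "ratio", 1, 3L)
# A character vector of values is rejected at construction time
bad <- tryCatch(new_cpue_result("10", "ratio", 1, 1L), error = function(e) "error")
```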

Use the constructor in your main function

Then cpue() calls the constructor instead of setting attributes directly:

new_cpue_result(
  values = raw_cpue * gear_factor,
  method = method,
  gear_factor = gear_factor,
  n_records = length(catch)
)

Improving the print method

#' @export
print.cpue_result <- function(x, ...) {
  cat("CPUE Result\n")
  cat("Records:     ", attr(x, "n_records"), "\n")
  cat("Method:      ", attr(x, "method"), "\n")
  cat("Gear factor: ", attr(x, "gear_factor"), "\n")
  cat("Values:      ", round(x, 2), "\n")
  invisible(x)
}

The summary method

A summary() method should give a useful statistical summary. This first version prints it directly:

Generic Signature: summary(object, ...)

#' @export
summary.cpue_result <- function(object, ...) {
  cat("CPUE Result Summary\n")
  cat("-------------------\n")
  cat("Method:      ", attr(object, "method"), "\n")
  cat("Records:     ", attr(object, "n_records"), "\n")
  cat("Gear factor: ", attr(object, "gear_factor"), "\n")
  cat("Mean CPUE:   ", round(mean(object), 2), "\n")
  cat("Median CPUE: ", round(stats::median(object), 2), "\n")
  cat("SD CPUE:     ", round(stats::sd(object), 2), "\n")
  invisible(object)
}
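Many base R summary methods (summary.lm, for example) do not print anything themselves: they return an object of class summary.<class>, and a separate print method formats it. That keeps the numbers available for further use. A sketch of that pattern for cpue_result; the class name summary.cpue_result and its fields are our own choices:

```r
summary.cpue_result <- function(object, ...) {
  values <- as.numeric(object) # drop attributes before computing statistics
  out <- list(
    method = attr(object, "method"),
    n_records = attr(object, "n_records"),
    mean = mean(values),
    median = stats::median(values),
    sd = stats::sd(values)
  )
  class(out) <- "summary.cpue_result"
  out
}

print.summary.cpue_result <- function(x, ...) {
  cat("CPUE Result Summary\n")
  cat("Method:     ", x$method, "\n")
  cat("Records:    ", x$n_records, "\n")
  cat("Mean CPUE:  ", round(x$mean, 2), "\n")
  cat("Median CPUE:", round(x$median, 2), "\n")
  cat("SD CPUE:    ", round(x$sd, 2), "\n")
  invisible(x)
}

result <- structure(
  c(10, 10, 20),
  method = "ratio", gear_factor = 1, n_records = 3L,
  class = "cpue_result"
)
s <- summary(result) # nothing printed yet; s holds the numbers
s$median
#> [1] 10
s # autoprinting calls print.summary.cpue_result
```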

Your turn: Make a plot() method

  • Plot methods must be named plot.<classname>, take x as the first argument, and include ... in the signature.
  • Note that you don’t need to use base plot() in a plot method - you can use ggplot2 or any other plotting system.

Generic Signature: plot(x, ...)

#' @export
plot.cpue_result <- function(x, ...) {
  plot(
    seq_along(x),
    x,
    type = "b",
    xlab = "Record",
    ylab = "CPUE",
    main = paste("CPUE -", attr(x, "method"), "method"),
    ...
  )
}

The ... passes through to plot.default - users can customise colour, line type, etc. without us anticipating every option.
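Because main, xlab and ylab are hard-coded in the call above, a user who writes plot(r, main = "My title") gets a "formal argument matched by multiple actual arguments" error. One way around it, sketched below as our own suggestion rather than part of the original method, is to make them formal arguments with defaults:

```r
plot.cpue_result <- function(x,
                             main = paste("CPUE -", attr(x, "method"), "method"),
                             xlab = "Record",
                             ylab = "CPUE",
                             ...) {
  plot(
    seq_along(x),
    as.numeric(x),
    type = "b",
    main = main,
    xlab = xlab,
    ylab = ylab,
    ... # colour, point type, etc. still pass through to plot.default
  )
  invisible(x)
}

r <- structure(c(10, 10, 20), method = "ratio", class = "cpue_result")

grDevices::pdf(NULL) # null device so the sketch also runs without a screen
res <- plot(r, main = "Survey A", col = "steelblue", pch = 19)
invisible(grDevices::dev.off())
```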

Registering methods in NAMESPACE

@export on each method tells roxygen2 to write the correct entry:

document()


# NAMESPACE entries
S3method(print,cpue_result)
S3method(summary,cpue_result)
S3method(plot,cpue_result)


S3method() entries - not regular export() entries - are how R finds your methods when the package is installed.
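To confirm that R can find a method, utils::getS3method() looks it up much as dispatch does. In a package you would run it after load_all() or installation. The sketch defines a method in the global environment so it runs on its own; "not_a_real_class" is a deliberately nonexistent class:

```r
print.cpue_result <- function(x, ...) {
  cat("<cpue_result>\n")
  invisible(x)
}

# With optional = TRUE, getS3method() returns NULL instead of erroring
m <- utils::getS3method("print", "cpue_result", optional = TRUE)
is.function(m)
#> [1] TRUE

missing_m <- utils::getS3method("print", "not_a_real_class", optional = TRUE)
is.null(missing_m)
#> [1] TRUE
```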

We are going to have some test failures…

  • cpue() now returns a cpue_result object, not a plain numeric.
  • expect_equal() compares attributes, so tests that compare the result to a bare number will fail.

We have a few options to fix this:

  • Wrap the result in as.numeric() for numeric comparisons:

    • expect_equal(as.numeric(cpue(100, 10)), 10).
  • Use the ignore_attr = TRUE argument in expect_equal() to ignore attributes:

    • expect_equal(cpue(100, 10), 10, ignore_attr = TRUE).
  • Write a helper function in tests/testthat/helper.R to use in our tests:

    expect_equal_numbers <- function(object, expected, ...) {
      expect_equal(object, expected, ignore_attr = TRUE, ...)
    }

    Then use expect_equal_numbers(cpue(100, 10), 10) in your tests.

Testing S3 classes

expect_s3_class() checks the class:

test_that("cpue() returns a cpue_result object", {
  result <- cpue(c(100, 200), c(10, 20))
  expect_s3_class(result, "cpue_result")
})

Test that attributes are set correctly:

test_that("cpue_result carries calculation metadata", {
  result <- cpue(c(100, 200, 300), c(10, 20, 15), method = "log")
  expect_equal(attr(result, "method"), "log")
  expect_equal(attr(result, "n_records"), 3)
})

Snapshot tests are a natural fit for print methods:

test_that("print.cpue_result displays expected output", {
  result <- cpue(c(100, 200, 300), c(10, 20, 15))
  expect_snapshot(print(result))
})

Make a commit

document()
check()

Commit your class definition, constructor, methods, and tests.

Creating Your Own Generics

The motivation

cpue() currently takes two numeric vectors, but users often have their data in a data frame:

# What they have to write now:
cpue(fishing_data$catch, fishing_data$effort)


What would be nicer:

cpue(fishing_data)


We can make cpue() work on both by writing S3 methods for numeric and data.frame inputs.

Step 1: Make cpue() a generic

Replace the function body with UseMethod():

#' @export
cpue <- function(catch, ...) {
  UseMethod("cpue")
}

UseMethod("cpue") tells R: look at class(catch) and dispatch to cpue.<class>.

Step 2: The numeric method

Move the existing implementation into cpue.numeric:

#' @rdname cpue
#' @export
cpue.numeric <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE),
  ...
) {
  # ... same logic as before ...
  new_cpue_result(raw_cpue * gear_factor, method, gear_factor, length(catch))
}

All existing calls to cpue() with a numeric first argument still work - R dispatches to cpue.numeric automatically.
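Integer vectors also reach cpue.numeric. For method dispatch R gives an integer vector the implicit class c("integer", "numeric"), so the numeric method is found even though class(1L) is just "integer". A small check with a hypothetical generic:

```r
which_method <- function(x, ...) UseMethod("which_method")
which_method.numeric <- function(x, ...) "numeric"
which_method.default <- function(x, ...) "default"

class(1:3)
#> [1] "integer"
which_method(1:3)       # integer input
#> [1] "numeric"
which_method(c(1.5, 2)) # double input
#> [1] "numeric"
which_method("a")       # character input
#> [1] "default"
```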

Step 3: The data frame method

#' @rdname cpue
#' @export
cpue.data.frame <- function(
  catch,
  ...
) {
  if (!"catch" %in% names(catch)) {
    stop("Column 'catch' not found.", call. = FALSE)
  }
  if (!"effort" %in% names(catch)) {
    stop("Column 'effort' not found.", call. = FALSE)
  }

  cpue(catch[["catch"]], effort = catch[["effort"]], ...)
}
  • The data frame method extracts columns and calls cpue() again with numeric vectors - which dispatches to cpue.numeric.
  • The calculation logic lives in exactly one place.

Step 4: The default method

#' @rdname cpue
#' @export
cpue.default <- function(catch, ...) {
  stop(
    "Unsupported input type for cpue(): ",
    paste(class(catch), collapse = "/"), # class() can have several elements
    call. = FALSE
  )
}

With a default method, users get a clear error message when they call cpue() on a class for which no method is defined.
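Putting steps 1 to 4 together, here is a self-contained sketch you can paste into a fresh session. The numeric method is a simplified stand-in (ratio method only, an assumption) because the real calculation is elided above:

```r
cpue <- function(catch, ...) {
  UseMethod("cpue")
}

# Simplified stand-in: ratio method only (assumption)
cpue.numeric <- function(catch, effort, gear_factor = 1, ...) {
  structure(catch / effort * gear_factor, class = "cpue_result")
}

cpue.data.frame <- function(catch, ...) {
  if (!"catch" %in% names(catch)) stop("Column 'catch' not found.", call. = FALSE)
  if (!"effort" %in% names(catch)) stop("Column 'effort' not found.", call. = FALSE)
  cpue(catch[["catch"]], effort = catch[["effort"]], ...)
}

cpue.default <- function(catch, ...) {
  stop(
    "Unsupported input type for cpue(): ",
    paste(class(catch), collapse = "/"),
    call. = FALSE
  )
}

fishing_data <- data.frame(catch = c(100, 200), effort = c(10, 20))

v <- cpue(c(100, 200), c(10, 20))        # cpue.numeric
d <- cpue(fishing_data, gear_factor = 2) # cpue.data.frame, then cpue.numeric
err <- tryCatch(cpue("100"), error = conditionMessage) # cpue.default
err
#> [1] "Unsupported input type for cpue(): character"
```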

Using @rdname to group docs

The generic and its methods share one help page:

#' Calculate CPUE
#' @param catch A numeric vector of catch, or a data frame.
#' @param ... Additional arguments passed to methods.
#' @export
cpue <- function(catch, ...) UseMethod("cpue")

#' @rdname cpue
#' ... same @param tags as before ...
#' @export
cpue.numeric <- function(catch, effort, ...) {
  # ... same logic as before ...
}

#' @rdname cpue
#' @export
cpue.data.frame <- function(catch, ...) {
  # ... column checks and cpue() call as before ...
}

@rdname cpue tells roxygen2 to add documentation to the cpue help page rather than create a new one.

It works

load_all()

# Vector - dispatches to cpue.numeric
cpue(c(100, 200, 300), c(10, 20, 15))

# Data frame - dispatches to cpue.data.frame
fishing_data <- data.frame(catch = c(100, 200), effort = c(10, 20))
cpue(fishing_data)

One more thing: update biomass_index() documentation

  • Update biomass_index() to inherit parameters from cpue.numeric instead, since that is now the method it actually calls.
  • cpue.numeric shares the cpue help page, where catch is documented as a numeric vector OR a data frame, but biomass_index() expects a numeric vector:
    • add @param catch back explicitly:
#' Calculate Biomass Index
#'
#' @param cpue Numeric vector of CPUE values. If NULL, computed from `catch`
#'   and `effort`.
#' @param area_swept Numeric vector of area swept (e.g., km²).
#' @param catch Numeric vector of catch (e.g., kg).
#' @inheritParams cpue.numeric
#' @inheritDotParams cpue.numeric -effort
#' @export
biomass_index <- function(

Make a commit

document()
test()
snapshot_accept() # if print output changed, review and accept the updated snapshots
check()

Commit the generic, methods, updated documentation, and tests.

When to Use OOP

S3 is worth it when…

  • Your function returns complex results that benefit from custom print/summary/plot
  • You want the same function name to work on multiple input types (vectors, data frames, model objects)
  • You want other packages to be able to extend your work by writing new methods
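Here is what that extension looks like from the other side: a second package (or a user script) defines its own class and a cpue() method for it without touching your code. The trip_log class and its fields are hypothetical, and cpue() is redefined minimally so the sketch runs on its own:

```r
# Minimal stand-ins for the fishr generic and numeric method
cpue <- function(catch, ...) UseMethod("cpue")
cpue.numeric <- function(catch, effort, ...) catch / effort

# Another package's hypothetical class: a list of per-trip records
trip_log <- structure(
  list(kg_landed = c(50, 80), hours_fished = c(5, 4)),
  class = "trip_log"
)

# Its method translates to the numeric interface and reuses it
cpue.trip_log <- function(catch, ...) {
  cpue(catch$kg_landed, effort = catch$hours_fished, ...)
}

cpue(trip_log)
#> [1] 10 20
```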

Recap

  • S3 objects are regular R objects with a class attribute
  • Generics dispatch to generic.class based on the class of the first argument
  • Constructors (new_<classname>) centralise object creation with structure()
  • print, summary, and plot methods make your classes pleasant to work with
  • Create your own generics with UseMethod() for consistent interfaces across input types
  • @export on each method registers it in NAMESPACE via S3method()
  • Test classes with expect_s3_class() and snapshot tests for print output