Object-Oriented Programming in R
Introduction
You already use object-oriented programming (OOP) patterns constantly in R, even if you have not thought about it explicitly.
Let’s look at the plot() and summary() functions.
x_norm <- rnorm(5000)
plot(x_norm)
summary(mtcars)
plot(mtcars)
plot(mtcars$mpg, mtcars$hp)
mtcars_model <- lm(mpg ~ wt, data = mtcars)
summary(mtcars_model)
plot(mtcars_model)
The same function - summary() - does something completely different depending on what you hand it. A numeric vector gets a six-number summary (the five-number summary plus the mean). A data frame gets column summaries. A linear model fit gets coefficients, standard errors, and p-values. The function name stays the same; the behaviour adapts to the data.
This is one of the most powerful ideas in OOP: you write code that calls print() or plot() or summary(), and the object itself determines what happens.
We are going to:
- Create a new S3 class cpue_result to wrap CPUE calculations with metadata
- Write print, summary, and plot methods for it
- Convert cpue() and biomass_index() into generics that work with both numeric vectors and data frames
OOP Systems in R
R has several OOP systems. They differ in formality and where methods live.
| System | Dispatch | Methods live on… | Formality | Common in |
|---|---|---|---|---|
| S3 | Single (first argument) | The generic function | Informal | Base R, tidyverse, most packages |
| S4 | Multiple arguments | The generic function | Formal (classes, validity) | Bioconductor, methods-heavy packages |
| S7 | Multiple arguments | The generic function | Formal (classes, properties, validation) | Some new packages (successor to S3 and S4) |
| R6 | Single | The object itself | Moderate (via R6 package) | Mutable state, encapsulation |
For package development, S3 covers the vast majority of use cases. S4 is worth learning if you work in Bioconductor or need multiple dispatch. S7 is designed as the eventual successor to both S3 and S4 - it combines the simplicity of S3 with more of the rigor of S4 (formal class definitions, property validation, multiple dispatch). If you are starting a new package and want more structure than S3 provides, S7 is worth considering, though S3 remains the dominant system in practice.
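To make the "multiple arguments" entry in the table concrete, here is a minimal S4 sketch using the methods package that ships with R. The classes Trawl and Longline and the generic compare_gear() are hypothetical names invented for this illustration; they are not part of fishr.

```r
library(methods)

# Hypothetical classes, for illustration only
setClass("Trawl", representation(mesh_mm = "numeric"))
setClass("Longline", representation(hooks = "numeric"))

# An S4 generic that dispatches on BOTH arguments
setGeneric("compare_gear", function(a, b) standardGeneric("compare_gear"))

setMethod(
  "compare_gear",
  signature("Trawl", "Trawl"),
  function(a, b) "two trawls"
)
setMethod(
  "compare_gear",
  signature("Trawl", "Longline"),
  function(a, b) "trawl vs longline"
)

trawl <- new("Trawl", mesh_mm = 100)
longline <- new("Longline", hooks = 500)

compare_gear(trawl, trawl)
compare_gear(trawl, longline)
```

An S3 generic could only pick a method from the class of a; here the class of b takes part in the choice as well.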
We will focus on S3 today, with a brief look at R6 at the end.
S3 in Depth
S3 is the most widely used OOP system in R. It is simple, flexible, and what base R itself uses.
How S3 works
An S3 object is just a regular R object (list, vector, data frame, etc.) with a class attribute. When you call a generic function like print(x), R looks at class(x) and dispatches to the appropriate method.
class(x_norm)
class(mtcars)
class(mtcars_model)
Seeing the dispatch chain
# What methods exist for print?
methods(generic.function = "print")
# What methods exist for a class?
methods(class = "data.frame")
The dispatch mechanism
When you call summary(x) where x has class "foo":
- R looks for summary.foo
- If not found, looks for summary.default
- If neither exists, throws an error
This is called single dispatch - the method is chosen based on the class of the first argument only.
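The lookup order can be watched in action with a toy generic. In this sketch, describe() and describe_strict() are hypothetical generics made up for the illustration, and the classes "foo" and "bar" are arbitrary labels.

```r
# Hypothetical generic with one class-specific method and a default
describe <- function(x, ...) {
  UseMethod("describe")
}

describe.foo <- function(x, ...) {
  "describe.foo was called"
}

describe.default <- function(x, ...) {
  "describe.default was called"
}

x_foo <- structure(1:3, class = "foo")
x_bar <- structure(1:3, class = "bar")

describe(x_foo) # a describe.foo method exists, so it is used
describe(x_bar) # no describe.bar method, so describe.default is used

# With no matching method and no default, the call is an error
describe_strict <- function(x, ...) {
  UseMethod("describe_strict")
}
err <- tryCatch(
  describe_strict(x_bar),
  error = function(e) conditionMessage(e)
)
err
```

Only the class of x is consulted at each step, which is what "single dispatch" means.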
Creating an S3 Class
Our cpue() function returns a plain numeric vector:
result <- cpue(c(100, 200, 300), c(10, 20, 15))
class(result)
That class is just "numeric". When we print(result), R finds no print.numeric method, falls back to print.default, and shows raw numbers. What if we want CPUE output to identify itself - to carry context about the calculation and display it in a useful way?
The simplest way: class() <-
The most direct way to create an S3 class is to set the class attribute on an existing object. Let’s modify cpue() to tag its output. Add two lines at the end of the function body in R/cpue.R:
cpue <- function(
catch,
effort,
gear_factor = 1,
method = c("ratio", "log"),
verbose = getOption("fishr.verbose", FALSE)
) {
method <- match.arg(method)
validate_numeric_inputs(catch = catch, effort = effort)
if (verbose) {
message("Processing ", length(catch), " records using ", method, " method")
}
raw_cpue <- switch(
method,
ratio = catch / effort,
log = log(catch / effort)
)
result <- raw_cpue * gear_factor
class(result) <- "cpue_result"
result
}
load_all()
result <- cpue(c(100, 200, 300), c(10, 20, 15))
result
class(result)
Now R knows this is a cpue_result. We can write a print method for it.
Writing Methods
print method
The print method controls what users see when they type an object name at the console. When you type result and hit Enter, R calls print(result) behind the scenes - this is called autoprinting. R looks at the class of the object, finds print.cpue_result, and calls that. Without a custom print method, R falls back to print.default, which is why our tagged vector currently displays as raw numbers.
#' @export
print.cpue_result <- function(x, ...) {
cat("CPUE Results for", length(x), "records\n")
cat("Values:", round(x, 2), "\n")
invisible(x)
}
Important details:
- The function name must be print.<classname>
- Always include ... in the signature (required by the print generic)
- Return invisible(x) so assignment still works: y <- print(x)
- Use cat() rather than print() inside print methods to avoid recursion
- Use @export to register the method in the NAMESPACE file
document()
load_all()
result
That is the entire mechanism behind S3 dispatch: set a class, write a function.classname method, and R connects the two automatically.
Note that although result looks different when we print it, it is still a numeric vector under the hood:
is.numeric(result)
unclass(result)
Adding metadata with attributes
A bare numeric vector with a class label has no context. What gear factor was used? What method was used? R objects can carry arbitrary attributes - named metadata attached to the object. Let’s update the function body again:
cpue <- function(
catch,
effort,
gear_factor = 1,
method = c("ratio", "log"),
verbose = getOption("fishr.verbose", FALSE)
) {
method <- match.arg(method)
validate_numeric_inputs(catch = catch, effort = effort)
if (verbose) {
message("Processing ", length(catch), " records using ", method, " method")
}
raw_cpue <- switch(
method,
ratio = catch / effort,
log = log(catch / effort)
)
result <- raw_cpue * gear_factor
attr(result, "gear_factor") <- gear_factor
attr(result, "n_records") <- length(catch)
attr(result, "method") <- method
class(result) <- "cpue_result"
result
}
load_all()
result <- cpue(c(100, 200, 300), c(10, 20, 15))
attr(result, "gear_factor")
attr(result, "n_records")
attr(result, "method")
As a class accumulates more fields, setting attributes one by one gets unwieldy.
Create a constructor function
For anything beyond a very basic class, the standard practice is to create a constructor function that builds the object in one place. A common naming convention, recommended in Advanced R, is new_<classname>.
#' @noRd
new_cpue_result <- function(values, method, gear_factor, n_records) {
structure(
values, # The data
method = method, # Attributes specifying metadata
gear_factor = gear_factor,
n_records = n_records,
class = "cpue_result" # class is a special attribute
)
}
A few things changed:
- structure() sets multiple attributes at once - including class - instead of separate attr() and class() <- calls
- All object creation is centralised
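A constructor like this trusts its inputs. One common companion, also described in Advanced R, is a separate validator that checks the finished object. The sketch below is only an illustration: validate_cpue_result() is a hypothetical helper, and the specific checks are assumptions about what a valid cpue_result might look like, not rules taken from fishr.

```r
# Constructor as defined above
new_cpue_result <- function(values, method, gear_factor, n_records) {
  structure(
    values,
    method = method,
    gear_factor = gear_factor,
    n_records = n_records,
    class = "cpue_result"
  )
}

# Hypothetical validator: the checks are illustrative assumptions
validate_cpue_result <- function(x) {
  if (!is.numeric(unclass(x))) {
    stop("CPUE values must be numeric.", call. = FALSE)
  }
  if (!attr(x, "method") %in% c("ratio", "log")) {
    stop("`method` must be \"ratio\" or \"log\".", call. = FALSE)
  }
  if (length(x) != attr(x, "n_records")) {
    stop("`n_records` must match the number of values.", call. = FALSE)
  }
  x
}

ok <- validate_cpue_result(
  new_cpue_result(c(10, 10, 20), "ratio", gear_factor = 1, n_records = 3)
)
class(ok)

# A mismatched record count is caught
bad <- new_cpue_result(c(10, 10, 20), "ratio", gear_factor = 1, n_records = 5)
msg <- tryCatch(
  validate_cpue_result(bad),
  error = function(e) conditionMessage(e)
)
msg
```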
#' Summarize a CPUE survey
#'
#' Calculates CPUE from catch and effort data and returns a structured
#' `cpue_result` object with metadata about the calculation.
#'
#' @param catch Numeric vector of catch values.
#' @param effort Numeric vector of effort values.
#' @param gear_factor Numeric gear correction factor (default 1).
#' @param method Calculation method: "ratio" or "log".
#'
#' @return A `cpue_result` object.
#' @export
#'
#' @examples
#' cpue(catch = c(100, 200, 300), effort = c(10, 20, 30))
cpue <- function(
catch,
effort,
gear_factor = 1,
method = c("ratio", "log"),
verbose = getOption("fishr.verbose", FALSE)
) {
# ...cpue body here
new_cpue_result(
values = raw_cpue * gear_factor,
method = method,
gear_factor = gear_factor,
n_records = length(catch)
)
}
Now our print method can be more informative:
#' @export
print.cpue_result <- function(x, ...) {
cat("CPUE Result\n")
cat("Records: ", attr(x, "n_records"), "\n")
cat("Method: ", attr(x, "method"), "\n")
cat("Gear factor: ", attr(x, "gear_factor"), "\n")
cat("Values: ", round(x, 2), "\n")
invisible(x)
}
load_all()
result <- cpue(
catch = c(100, 200, 300, 150),
effort = c(10, 20, 15, 30)
)
result
summary method
A summary method should compute a useful statistical summary. The version below prints it directly with cat() and returns the object invisibly:
#' @export
summary.cpue_result <- function(object, ...) {
cat("Survey Result Summary\n")
cat("---------------------\n")
cat("Method: ", attr(object, "method"), "\n")
cat("Records: ", attr(object, "n_records"), "\n")
cat("Gear factor: ", attr(object, "gear_factor"), "\n")
cat("Mean CPUE: ", round(mean(object), 2), "\n")
cat("Median CPUE: ", round(stats::median(object), 2), "\n")
cat("SD CPUE: ", round(stats::sd(object), 2), "\n")
invisible(object)
}
document()
load_all()
summary(result)
Your turn: Make a plot() method
Plot methods must be named plot.<classname>, take x as the first argument, and include ... in the signature.
A base graphics version:
#' @export
plot.cpue_result <- function(x, ...) {
  plot(
    seq_along(x),
    x,
    type = "b",
    xlab = "Record",
    ylab = "CPUE",
    main = paste("CPUE -", attr(x, "method"), "method"),
    ...
  )
}
document()
load_all()
plot(result)
You don’t need to use base plot() in a plot method - you can use ggplot2 or any other plotting system. First, add ggplot2 to your package dependencies:
use_package("ggplot2")
#' @export
plot.cpue_result <- function(x, ...) {
data <- data.frame(
record = seq_along(x),
cpue = as.numeric(x)
)
ggplot2::ggplot(data, ggplot2::aes(x = record, y = cpue)) +
ggplot2::geom_line() +
ggplot2::geom_point() +
ggplot2::labs(
x = "Record",
y = "CPUE",
title = paste("CPUE -", attr(x, "method"), "method")
)
}
document()
load_all()
plot(result)
plot(result) +
ggplot2::theme_minimal() +
ggplot2::geom_hline(yintercept = mean(result), linetype = "dashed")
Registering S3 methods
For your methods to work when the package is installed (not just with load_all()), they need to appear in the NAMESPACE file. The @export tag on each method handles this via roxygen2.
document()
Look at the NAMESPACE file. What do you see different about S3 methods compared to regular functions?
S3method(print,cpue_result)
S3method(summary,cpue_result)
S3method(plot,cpue_result)
Testing the class
We should update our tests to verify that cpue() returns the right class. testthat provides expect_s3_class() for exactly this:
test_that("cpue() returns a cpue_result object", {
result <- cpue(c(100, 200), c(10, 20))
expect_s3_class(result, "cpue_result")
})
We can also test that the metadata attributes are set correctly:
test_that("cpue_result carries calculation metadata", {
result <- cpue(c(100, 200, 300), c(10, 20, 15), method = "log")
expect_equal(attr(result, "method"), "log")
expect_equal(attr(result, "gear_factor"), 1)
expect_equal(attr(result, "n_records"), 3)
})
And snapshot tests are a natural fit for print methods, since they capture the exact console output:
test_that("print.cpue_result displays expected output", {
result <- cpue(c(100, 200, 300), c(10, 20, 15))
expect_snapshot(print(result))
})
Creating your own generics and methods
So far we have built new classes and created methods for existing generics. But one of the most practical uses of S3 is creating your own generics, so the same function name works on different types of input.
Our cpue() function currently takes numeric vectors:
cpue(catch = c(100, 200), effort = c(10, 20))
But users often have their data in a data frame. Instead of making them extract columns every time, we can make cpue() work directly on data frames too. The same function call adapts its behaviour depending on what the user passes in - just like summary() does in base R.
Step 1: Make cpue a generic
We replace the function body with UseMethod("cpue"). The first argument keeps the name catch so that existing calls which name it continue to work, although it may now hold either a numeric vector or a data frame:
#' Calculate Catch Per Unit Effort (CPUE)
#'
#' Calculates CPUE from catch and effort data, with optional gear
#' standardization. Supports ratio and log-transformed methods.
#'
#' @param catch Input data: a numeric vector of catch values, or a data frame
#' containing catch and effort columns.
#' @param ... Additional arguments passed to methods.
#'
#' @export
cpue <- function(catch, ...) {
UseMethod("cpue")
}
Step 2: Move the existing implementation into cpue.numeric
The existing code becomes the numeric method. The first argument is still catch, matching the generic’s signature, and the logic is identical; the only change to the signature is the trailing ... argument:
#' @rdname cpue
#'
#' @param effort Numeric vector of effort (e.g., hours)
#' @param gear_factor Numeric scalar for gear standardization (default 1)
#' @param method Character; one of `"ratio"` (default) or `"log"`.
#' @param verbose Logical; print processing info? Default from
#' `getOption("fishr.verbose", FALSE)`.
#'
#' @return A `cpue_result` object: a numeric vector of CPUE values with
#'   calculation metadata stored as attributes.
#' @export
#'
#' @examples
#' cpue(100, 10)
#' cpue(c(100, 200), c(10, 20), method = "log")
cpue.numeric <- function(
catch,
effort,
gear_factor = 1,
method = c("ratio", "log"),
verbose = getOption("fishr.verbose", FALSE),
...
) {
method <- match.arg(method)
validate_numeric_inputs(catch = catch, effort = effort)
if (verbose) {
message("Processing ", length(catch), " records using ", method, " method")
}
raw_cpue <- switch(
method,
ratio = catch / effort,
log = log(catch / effort)
)
new_cpue_result(
values = raw_cpue * gear_factor,
method = method,
gear_factor = gear_factor,
n_records = length(catch)
)
}
All existing code that calls cpue() with positional arguments continues to work - R dispatches to cpue.numeric because the first argument is numeric.
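This also covers integer input. When an object has no class attribute, UseMethod() dispatches on its implicit class, and the implicit class of both integer and double vectors includes "numeric". A minimal sketch with a hypothetical generic, which_method(), standing in for cpue():

```r
# Hypothetical generic used only to show which method is chosen
which_method <- function(x, ...) {
  UseMethod("which_method")
}
which_method.numeric <- function(x, ...) "numeric method"
which_method.default <- function(x, ...) "default method"

which_method(c(1.5, 2.5)) # double vector: numeric method
which_method(1:3)         # integer vector: numeric method
which_method("a")         # character vector: default method
```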
Step 3: Add a data frame method
#' @rdname cpue
#' @export
cpue.data.frame <- function(
catch,
gear_factor = 1,
method = c("ratio", "log"),
verbose = getOption("fishr.verbose", FALSE),
...
) {
if (!"catch" %in% names(catch)) {
stop("Column 'catch' not found in data frame.", call. = FALSE)
}
if (!"effort" %in% names(catch)) {
stop("Column 'effort' not found in data frame.", call. = FALSE)
}
# We can then call the numeric method by extracting the relevant columns and passing them to cpue() again.
# This way we reuse the existing logic and maintain a single source of truth for the CPUE calculation.
cpue(
catch = catch[["catch"]],
effort = catch[["effort"]],
gear_factor = gear_factor,
method = method,
verbose = verbose,
...
)
}
The data frame method extracts the right columns and calls cpue() again with numeric vectors, which dispatches to cpue.numeric. This layering - where one method calls the generic again with different input - is a common and effective pattern.
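The same layering can be seen in isolation with a toy generic. In this sketch total_catch() is a hypothetical generic, not part of fishr: the data frame method only unpacks a column and hands it back to the generic, so the arithmetic lives in one place.

```r
# Hypothetical generic for illustration
total_catch <- function(x, ...) {
  UseMethod("total_catch")
}

# All of the real work happens in the numeric method
total_catch.numeric <- function(x, ...) {
  sum(x)
}

# The data frame method checks for the column, then re-dispatches
total_catch.data.frame <- function(x, ...) {
  if (!"catch" %in% names(x)) {
    stop("Column 'catch' not found in data frame.", call. = FALSE)
  }
  total_catch(x[["catch"]], ...)
}

landings <- data.frame(catch = c(100, 200, 300))

total_catch(c(100, 200, 300))
total_catch(landings)
```

Both calls end up in total_catch.numeric, so a later change to the calculation only has to be made once.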
Step 4: Add a .default method
It’s good practice to add a default method that throws an informative error if the user passes in an unsupported type. If the generic can’t find a method for the class of catch, it will fall back to cpue.default:
#' @rdname cpue
#' @export
cpue.default <- function(catch, ...) {
  # class() can return several values, so collapse them into one string
  stop(
    "Unsupported input type for cpue(): ",
    paste(class(catch), collapse = "/"),
    call. = FALSE
  )
}
Update biomass_index documentation
- Update biomass_index to inherit parameters from cpue.numeric instead, since that is now the method it actually calls.
- cpue.numeric defines catch as a vector OR data.frame, but biomass_index expects a numeric vector, so add @param catch back explicitly:
#' Calculate Biomass Index
#'
#' @param cpue Numeric vector of CPUE values. If NULL, computed from `catch`
#' and `effort`.
#' @param area_swept Numeric vector of area swept (e.g., km²).
#' @param catch Numeric vector of catch (e.g., kg).
#' @inheritParams cpue.numeric
#' @inheritDotParams cpue.numeric -effort
#' @export
biomass_index <- function(
cpue = NULL,
area_swept,
catch = NULL,
effort = NULL,
...
) {
# ...
}
document()
load_all()
# Still works with vectors
cpue(c(100, 200), c(10, 20))
# Now also works with data frames
fishing_data <- data.frame(
catch = c(100, 200, 300),
effort = c(10, 20, 15)
)
cpue(fishing_data)
Updating and adding tests
cpue() now returns a cpue_result object, not a plain numeric. expect_equal() compares attributes, so tests that compare the result to a bare number will fail. We have a few options to fix this:
- Wrap the result in as.numeric() for numeric comparisons: expect_equal(as.numeric(cpue(100, 10)), 10).
- Use the ignore_attr = TRUE argument in expect_equal() to ignore attributes: expect_equal(cpue(100, 10), 10, ignore_attr = TRUE).
- Write a helper function in tests/testthat/helper.R to use in our tests:
expect_equal_numeric <- function(object, expected, ...) {
  expect_equal(as.numeric(object), expected, ...)
}
Then use expect_equal_numeric(cpue(100, 10), 10) in your tests.
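The reason bare comparisons fail can be seen with base R alone. The sketch below builds a small classed vector by hand, standing in for the output of cpue(), and compares it with a plain number:

```r
# Hand-built stand-in for a cpue_result; no package code needed
x <- structure(10, method = "ratio", class = "cpue_result")

identical(x, 10)             # FALSE: class and attributes differ
identical(as.numeric(x), 10) # TRUE: as.numeric() drops them
attributes(as.numeric(x))    # NULL
```

All three options above work by removing or ignoring those attributes before the comparison.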
Some existing snapshots may also be stale since the print format changed. Run snapshot_accept() after check() to accept the updated snapshots.
Now add tests for the data frame method and the default error:
test_that("cpue.data.frame dispatches correctly", {
fishing_data <- data.frame(
catch = c(100, 200, 300),
effort = c(10, 20, 15)
)
result <- cpue(fishing_data)
expect_s3_class(result, "cpue_result")
expect_equal(as.numeric(result), c(10, 10, 20))
})
test_that("cpue.data.frame errors on missing columns", {
df <- data.frame(x = 1, y = 2)
expect_snapshot(cpue(df), error = TRUE)
})
test_that("cpue.default gives informative error", {
expect_snapshot(cpue("not valid"), error = TRUE)
})
When to Use OOP in Packages
S3 classes are worth reaching for when:
- Your function returns complex results that benefit from custom print/summary/plot methods
- You want other packages to be able to extend your work with new methods
- You want to create a consistent interface when your users want to do the same things with different types of input (e.g., numeric vectors, data frames, model fits)
They are probably not needed when:
- A simple named list or data frame communicates the result clearly
- There is only one way to display or summarize the output
- The function returns a scalar or vector with obvious meaning
Recap
- R has multiple OOP systems (S3, S4, R6) - S3 is the most common and covers most use cases
- S3 objects are regular R objects with a class attribute
- The same function (print, summary, plot) does different things for different classes - the object determines the behaviour, not the caller
- Constructors (new_*) build valid objects
- Write print, summary, and plot methods to make your classes pleasant to work with
- Create custom generics with UseMethod() for domain-specific operations
- Test classes with expect_s3_class() and snapshot tests for print output