Advanced R Package Development
match.arg...)
From the Tidyverse design guide:
snake_case)
Note: there are occasional exceptions. cpue is a well-established domain abbreviation.
Consistent style makes code easier to read.
The air formatter (hopefully already configured in this project) handles this automatically on save.
Our functions (mostly) follow these conventions:
cpue() - short, but a domain-standard abbreviation well understood by fisheries scientists
biomass_index() - descriptive, easy to predict
Domain-standard abbreviations are an acceptable exception to the preference for verbose names.
Pure function:
Side effects are sometimes necessary, but prefer pure functions when possible: they are easier to test and reason about.
What is an example of a commonly used function that uses a side effect?
The calculation itself is pure. The message() call is a side effect.
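The distinction can be sketched with two small functions. The names cpue_pure() and cpue_noisy() are hypothetical and exist only for this illustration; they are not part of fishr.

```r
# Pure: the result depends only on the inputs, and nothing else changes.
cpue_pure <- function(catch, effort) {
  catch / effort
}

# Same calculation plus a side effect: message() writes to the console.
cpue_noisy <- function(catch, effort) {
  message("Processing ", length(catch), " records")
  catch / effort
}

# Both return the same value; only the second also emits a message.
identical(cpue_pure(100, 10), suppressMessages(cpue_noisy(100, 10)))
#> [1] TRUE
```

Because the return values are identical, tests of the calculation can target the pure part and treat the message separately.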
R has a global key-value store, set with options() and read with getOption(), that packages can use for user-configurable defaults.
By convention, package options use the package name as a prefix to avoid conflicts: fishr.verbose, not just verbose.
Users can set options(fishr.verbose = TRUE) once and all fishr functions pick it up. They can still override per call: cpue(100, 10, verbose = TRUE).
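A minimal sketch of this pattern in isolation. The function is_verbose() is a hypothetical stand-in for any fishr function with a verbose argument; only options() and getOption() are real base R calls.

```r
# Hypothetical stand-in for a package function that reads the option.
is_verbose <- function(verbose = getOption("fishr.verbose", default = FALSE)) {
  verbose
}

old <- options(fishr.verbose = NULL)  # start with the option unset

is_verbose()                  # FALSE: falls back to the default
options(fishr.verbose = TRUE)
is_verbose()                  # TRUE: picked up from the global option
is_verbose(verbose = FALSE)   # FALSE: a per-call argument still wins

options(old)                  # restore whatever was set before
```

The default is evaluated at call time, so changing the option mid-session takes effect on the next call.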
Return values and let the caller decide what to do with them.
cpue <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE)
) {
  method <- match.arg(method)
  if (verbose) {
    message("Processing ", length(catch), " records using ", method, " method")
  }
  raw_cpue <- switch(
    method,
    ratio = catch / effort,
    log = log(catch / effort)
  )
  raw_cpue * gear_factor
}

match.arg() takes the first element of the default vector when the user doesn’t supply a value, and gives a clear error for invalid input.
The vector in the default c("ratio", "log") is both the documentation and the validation list.
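The behaviour of match.arg() can be checked with a small helper. pick_method() is a hypothetical function written only for this sketch; it does nothing except resolve the argument.

```r
# Hypothetical helper that only resolves the method argument.
pick_method <- function(method = c("ratio", "log")) {
  match.arg(method)
}

pick_method()        # no value supplied: the first choice, "ratio"
pick_method("log")   # an exact match is returned unchanged
pick_method("l")     # partial matching also resolves to "log"

# An invalid value stops with an error that lists the allowed choices.
result <- tryCatch(pick_method("mean"), error = function(e) conditionMessage(e))
result
```

The exact wording of the error depends on the R version, so it is captured here rather than quoted.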
The ellipsis lets a function accept extra arguments and pass them to another function:
cpue = NULL signals “not provided” clearly:
is.null() is a reliable check for “argument not given”
With cpue = 0, there’s no way to tell whether the user passed zero intentionally or just didn’t provide a value

Arguments in ... must be named - positional matching doesn’t work through the ellipsis
biomass_index(area_swept = 5, catch = 100, effort = 10, mthod = "log") won’t error, even though mthod is a typo
Solution: rlang::check_dots_used() catches unused dots before they cause confusion.
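Both ideas, the NULL default and the ... pass-through, can be sketched in base R. The functions scale_catch() and summarise_catch() are hypothetical and unrelated to the fishr API; the last call shows one way a misspelled argument can go unnoticed when the dots are never forwarded.

```r
# Hypothetical inner function with an optional scaling argument.
scale_catch <- function(catch, factor = 1) {
  catch * factor
}

# Hypothetical wrapper: `total` defaults to NULL, and ... is forwarded.
summarise_catch <- function(catch, total = NULL, ...) {
  if (is.null(total)) {
    # Not supplied: compute it, passing extra arguments along.
    total <- sum(scale_catch(catch, ...))
  }
  total
}

summarise_catch(c(1, 2, 3))                        # computed: 6
summarise_catch(c(1, 2, 3), factor = 2)            # factor travels through ...: 12
summarise_catch(c(1, 2, 3), total = 0)             # an explicit zero is respected: 0
summarise_catch(c(1, 2, 3), total = 5, fctor = 2)  # typo never inspected: 5, no error
```

In the last call the dots are never evaluated, so R has no occasion to complain about fctor.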
Running usethis::use_package("rlang") adds rlang to Imports in DESCRIPTION. Then call its functions using ::, for example rlang::check_dots_used().
:: is explicit, avoids namespace conflicts, and makes clear which package each function comes from.
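A small base R illustration of why the explicit form is safer. The local median() defined here is a deliberately unhelpful, hypothetical override that stands in for any accidental name clash.

```r
x <- c(3, 1, 2)

# A local definition masks the name `median` in this session.
median <- function(x) "masked"

median(x)         # calls the local function and returns "masked"
stats::median(x)  # always calls the version exported by stats: 2
```

Inside a package, the same reasoning applies to any function that another attached package might also define.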
biomass_index <- function(
  cpue = NULL,
  area_swept,
  catch = NULL,
  effort = NULL,
  ...
) {
  rlang::check_dots_used()
  if (is.null(cpue) && (!is.null(catch) && !is.null(effort))) {
    # The argument cpue is NULL here; in call position R skips non-function
    # objects, so cpue() still resolves to the package function.
    cpue <- cpue(catch, effort, ...)
  }
  if (is.null(cpue)) {
    stop("Must provide either 'cpue' or both 'catch' and 'effort'.")
  }
  cpue * area_swept
}

load_all()
# Valid: method is passed through to cpue()
biomass_index(area_swept = 5, catch = 100, effort = 10, method = "log")
# Typo: now caught immediately instead of silently ignored
biomass_index(area_swept = 5, catch = 100, effort = 10, mthod = "log")
#> Error: In `biomass_index()`, argument `mthod` is not used.

Problems:
Small, focused functions that do one thing well. Easier to test, read, and reuse.
Look for:
If a function exceeds ~20-30 lines, consider whether it is doing too many things.
Unhelpful - doesn’t say which argument is the problem or what was expected.
Instead of copy-pasting the same check into every function, create a shared helper.
@noRd tells roxygen2 not to generate a .Rd file - internal helpers don’t need public docs.
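The definition of validate_numeric_inputs() is not shown in this section. A minimal sketch of what such a helper might look like, assuming it receives named arguments and only checks that each one is numeric:

```r
#' Validate that every named input is numeric (hypothetical sketch)
#' @noRd
validate_numeric_inputs <- function(...) {
  args <- list(...)
  # Only named arguments are checked; the name is used in the error message.
  for (name in names(args)) {
    value <- args[[name]]
    if (!is.numeric(value)) {
      stop(
        "`", name, "` must be numeric, not ", class(value)[1], ".",
        call. = FALSE
      )
    }
  }
  invisible(TRUE)
}

validate_numeric_inputs(catch = c(100, 200), effort = c(10, 20))  # passes silently
```

Calling it as validate_numeric_inputs(catch = "a") would stop with a message that names the offending argument and the class it received, which addresses the "which argument, what was expected" problem.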
cpue <- function(
  catch,
  effort,
  gear_factor = 1,
  method = c("ratio", "log"),
  verbose = getOption("fishr.verbose", FALSE)
) {
  method <- match.arg(method)
  validate_numeric_inputs(catch = catch, effort = effort)
  if (verbose) {
    message("Processing ", length(catch), " records using ", method, " method")
  }
  raw_cpue <- switch(method, ratio = catch / effort, log = log(catch / effort))
  raw_cpue * gear_factor
}

R/utils.R - when the helper is used across multiple functions in the package.
Good rule of thumb: start with the helper in the same file. Move it to utils.R only when a second function needs it.
load_all()
# Step by step
my_cpue <- cpue(catch = c(100, 200, 300), effort = c(10, 20, 30))
biomass_index(cpue = my_cpue, area_swept = 50)
# Or in one call, thanks to ... pass-through
biomass_index(area_swept = 50, catch = c(100, 200, 300), effort = c(10, 20, 30))
# With options passed through
biomass_index(
  area_swept = 50,
  catch = c(100, 200, 300),
  effort = c(10, 20, 30),
  method = "log"
)

match.arg - consistent interfaces reduce mistakes
... - pass-through enables flexible composition

Add verbose support to biomass_index() using the same options pattern as cpue():
Add verbose = getOption("fishr.verbose", FALSE) as an argument
When verbose = TRUE, print a message reporting how many records are being processed
Run document() and check()
Bonus: write a test that confirms the message appears when verbose = TRUE and is silent by default.
Add the @param and argument to R/biomass.R:
#' @param verbose Logical; print processing info? Default from
#'   `getOption("fishr.verbose", FALSE)`.
biomass_index <- function(
  cpue = NULL,
  area_swept,
  catch = NULL,
  effort = NULL,
  verbose = getOption("fishr.verbose", default = FALSE),
  ...
) {
  rlang::check_dots_used()
  if (is.null(cpue) && (!is.null(catch) && !is.null(effort))) {
    cpue <- cpue(catch, effort, verbose = verbose, ...)
  }
  if (is.null(cpue)) {
    stop("Must provide either 'cpue' or both 'catch' and 'effort'.")
  }
  validate_numeric_inputs(cpue = cpue, area_swept = area_swept)
  if (verbose) {
    message("calculating biomass index for ", length(area_swept), " records")
  }
  cpue * area_swept
}
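
For the bonus, a test along the following lines might work. This is a sketch that assumes testthat and withr are installed. The biomass_index() defined in the block is a simplified, hypothetical stand-in so the example runs on its own; in tests/testthat/ the real package function would be tested instead.

```r
library(testthat)

# Simplified stand-in so the sketch runs outside the package (assumption:
# the real biomass_index() reports progress via message() when verbose = TRUE).
biomass_index <- function(
  cpue,
  area_swept,
  verbose = getOption("fishr.verbose", default = FALSE)
) {
  if (verbose) {
    message("calculating biomass index for ", length(area_swept), " records")
  }
  cpue * area_swept
}

test_that("biomass_index() only reports progress when verbose", {
  # Pin the option for this test so a user's global setting cannot leak in.
  withr::local_options(fishr.verbose = FALSE)

  expect_message(
    biomass_index(cpue = 2, area_swept = 5, verbose = TRUE),
    "1 records"
  )
  expect_silent(biomass_index(cpue = 2, area_swept = 5))
})
```

Pinning the option inside the test keeps the result independent of whatever options(fishr.verbose = ...) the developer has set interactively.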