Type safe(r) R code
Introduction to r-lib type safety checks
Type safety is all the rage these days. It’s one of the many reasons why people love Rust, TypeScript, and Pydantic.
Knowing what type of data is coming in and going out of a function is critical! It means fewer bugs and more robust code.
I will introduce you to the r-lib standalone checks. Here is a peek of
some code from {arcgisgeocode} that helps make the function more type
safe.
What is type safety?
A type safe language is one where each variable has a known and validated type. R is not type safe.
When you define a function in a type safe language, you have to specify the input types and the output types.
Here is a function that scales one numeric variable by another.
scale_by <- function(x, y) {
  x / y
}
This is not type safe. I can pass in a character vector, a list, NULL, or even a POSIXct object. Sometimes R will do the appropriate conversions for us. But other times it won’t.
Error in x/y: non-numeric argument to binary operator
You want to be in control of your function!
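To make that concrete, here is a small sketch of how the unchecked function behaves with a few different inputs. The name scale_by is just a placeholder I picked for illustration.

```r
# `scale_by` is a placeholder name for the unchecked function described above.
scale_by <- function(x, y) {
  x / y
}

scale_by(10, 2)     # numeric input works as expected
scale_by(TRUE, 2)   # the logical is silently coerced to 1, giving 0.5
scale_by(NULL, 2)   # no error at all, just an empty numeric vector

# A character input fails, but only once the division is attempted
result <- try(scale_by("10", 2), silent = TRUE)
inherits(result, "try-error")
```

Only one of the three questionable inputs raises an error. The other two quietly return something that the rest of your code then has to deal with.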
Why type safety is important
Type safety allows us to catch and prevent errors early and thus prevent unintended bugs. Without type safety, R may perform silent coercions or your code may run as R intended—but not as you intended.
💡 A type coercion is a type conversion that R performs silently
because one type does not match the other. Casting is when you
explicitly change the type, e.g. calling as.integer() on doubles.
Adding type guarantees ensures that your code functions as intended.
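The difference between coercion and casting is easy to see in base R. This is only a small illustration, not an exhaustive list of R's coercion rules.

```r
# Coercion: c() silently converts everything to the most general type
mixed <- c(1, "2", TRUE)
typeof(mixed)   # "character"

# Coercion inside arithmetic: TRUE is treated as 1
TRUE + 1L

# Casting: an explicit conversion that you asked for
whole <- as.integer(c(1.9, 2.1))   # truncates toward zero
whole

# A cast that cannot succeed yields NA (with a warning, suppressed here)
suppressWarnings(as.integer("abc"))
```

In the first two cases R never asked for permission. In the last two you made the decision yourself, which is exactly the kind of control that type checks are meant to give back.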
Type safety in other languages
Type safety is becoming an increasingly common and important aspect of programming. People love Rust for its type safety, among other things. Rust (like C, C++, Java, Scala, etc.) is a statically typed language.
💡 A statically typed language requires you to specify the types of the objects that are used in a function and elsewhere.
Rust’s static typing
In Rust, you define a type and that type is unique.
To create a person you would write
Person { name: "Josiah".to_string(), age: 28 } . This is recognized as
a Person struct. In Rust, a function must know its argument types, for
example:
This function takes a reference to a Person and calculates (roughly)
what year they were born in. If I had another struct called Me with
the same exact fields, this wouldn’t work.
Even though Me and Person have the exact same field types, they are
recognized as different types.
This is different from how JavaScript handles it.
TypeScript Interfaces
The JavaScript folks now have TypeScript, which offers a kind of pseudo type safety. TypeScript uses duck typing.
💡 If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
If I understand TypeScript correctly, they use a type interface. These feel similar to struct definitions in Rust.
In TypeScript, these interfaces are a way to standardize what a type looks like, but they are not actual types themselves! This is (I think) the equivalent JavaScript code to calculate the birth year of an individual.
With this, though, you don’t actually need to have an instance of
Person . Instead, you can have a normal JavaScript object that looks
(and quacks) just like the Person type.
john;
jane;
These both work.
Type safety in R
Like JavaScript and Python (yes, I know about type hinting; that’s opt-in and different), R doesn’t do any validation of arguments. TypeScript can add a layer of duck typing checks to functions, which is great for them. But what about us?
How can we make our R functions safer? In R, (almost) everything is a
vector. The r-lib team has (very quietly) created what I think is the
greatest contribution to the tidyverse ecosystem in a long time in the
form of standalone type check functions.
Stand-alone type checks
The standalone functions are quite unique. I’ve never seen anything quite like them. They’re literally standalone R files with a bunch of handy R functions. It’s like adding a package but without adding it as a dependency.
These are functions prefixed with check_ that test inputs for the most
common types. They provide beautiful error messages and have commonly
needed flexibility.
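To give a feel for what a check_ function does under the hood, here is a hand-rolled sketch in base R. To be clear, my_check_string() is a hypothetical helper I wrote for illustration. It is not the r-lib implementation, which handles more cases and produces nicer messages.

```r
# Hypothetical helper, NOT the r-lib implementation: a hand-rolled scalar
# string check in base R that shows the idea behind a check_ function.
my_check_string <- function(x, allow_null = FALSE, arg = deparse(substitute(x))) {
  # NULL passes only when the caller explicitly allows it
  if (allow_null && is.null(x)) {
    return(invisible(NULL))
  }
  # A "string" here means a length-one, non-missing character vector
  if (is.character(x) && length(x) == 1 && !is.na(x)) {
    return(invisible(NULL))
  }
  stop(
    sprintf(
      "`%s` must be a single string, not a <%s> of length %d.",
      arg, class(x)[1], length(x)
    ),
    call. = FALSE
  )
}

my_check_string("hello")                    # passes silently
my_check_string(NULL, allow_null = TRUE)    # passes silently
try(my_check_string(c("hello", "world")))   # stops with an informative error
```

The pattern is always the same: return invisibly when the input is fine, and fail loudly with the argument name and the type that was found when it is not.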
Add type checks to your project
The usethis package has a handy function use_standalone() which will
add these functions for you.
usethis::use_standalone("r-lib/rlang", "types-check")
This is supposed to be used in the context of an R package but can still be used in any R script. The function requires an R directory to be found at the root.
Standalone type checks
We can get really far in enhancing type safety
https://usethis.r-lib.org/reference/use_standalone.html
Since this isn’t an R package, I will source the functions. Otherwise,
run devtools::load_all() for the functions to become available.
[1] "/Users/josiahparry/github/misc/blog-zola/content/posts/2024-06-30-type-safety"
These standalone checks require that {rlang} be an imported package.
Use usethis::use_package("rlang"). It is a very small package and has
no dependencies. Very little to lose by adding it.
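If you are working in a plain script rather than a package, something like the sketch below will source whatever standalone files were added. I am assuming the files live in the R directory and carry an import-standalone- prefix in their names, so adjust the pattern if yours are named differently.

```r
# Assumption: the standalone files were written to ./R and are named
# import-standalone-*.R. Adjust `pattern` if your files are named differently.
standalone_files <- list.files(
  "R",
  pattern = "^import-standalone-.*\\.R$",
  full.names = TRUE
)

# Source each file so the check_ functions become available in this session.
# Order does not matter because functions are only looked up when called.
for (file in standalone_files) {
  source(file)
}

standalone_files
```

If the vector printed at the end is empty, nothing was found and the check_ functions will not be available.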
Scalar checks
R doesn’t have the concept of a scalar, though treating a length-one vector as a scalar is still very useful in R.
The standalone checks provide helpers for checking scalar values. There are quite a few, but the ones I use most commonly are:
- check_string()
- check_bool()
- check_number_whole()
- check_number_decimal()
Usage
Each of these functions provides the arguments:
- allow_na
- allow_null
This is helpful because NULL is often used as the default value for optional arguments.
For example we can check that something is a string:
But when it is a character vector:
Error:
! `c("hello", "world")` must be a single string, not a character vector.
This gives us an informative error telling the user what type was found and expected.
In the case of NULLs we can provide the allow_null argument which
allows the test to pass.
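Putting the scalar checks together inside a function might look like the sketch below. It assumes the standalone check functions are already loaded in your session and that {rlang} is installed. describe_person() is a hypothetical function I made up for this example.

```r
# Assumes the standalone checks (check_string(), check_bool(),
# check_number_whole()) are already loaded and that {rlang} is installed.
# `describe_person()` is a hypothetical function used only for illustration.
describe_person <- function(name, age, title = NULL, shout = FALSE) {
  check_string(name)
  check_number_whole(age, min = 0)
  check_string(title, allow_null = TRUE)  # NULL means "no title"
  check_bool(shout)

  label <- paste(c(title, name), collapse = " ")
  out <- sprintf("%s is %d years old", label, as.integer(age))
  if (shout) toupper(out) else out
}

describe_person("Josiah", 28)
describe_person("Josiah", 28, title = "Mr.", shout = TRUE)

# Each of these stops before any real work is done
try(describe_person(c("hello", "world"), 28))
try(describe_person("Josiah", 28.5))
try(describe_person("Josiah", 28, shout = NA))
```

Putting the checks at the very top of the function means that by the time the real logic runs, every argument is known to have the expected type.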
Vector checks
In addition to scalar checks, there are many handy vector checks. These include:
- check_character()
- check_logical()
- check_data_frame()
Error:
! `1:2` must be a character vector, not an integer vector.
Error:
! `c("a", "b")` must be a logical vector, not a character vector.
Error:
! `list(a = 1, b = 2)` must be a data frame, not a list.
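The vector checks slot into a function the same way as the scalar ones. As before, this sketch assumes the standalone checks are loaded and {rlang} is installed. subset_rows() is a hypothetical function for illustration only.

```r
# Assumes the standalone checks (check_data_frame(), check_logical(),
# check_character()) are already loaded and that {rlang} is installed.
# `subset_rows()` is a hypothetical function used only for illustration.
subset_rows <- function(data, keep, columns = NULL) {
  check_data_frame(data)
  check_logical(keep)
  check_character(columns, allow_null = TRUE)  # NULL means "all columns"

  if (is.null(columns)) {
    columns <- names(data)
  }
  data[keep, columns, drop = FALSE]
}

df <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"))

subset_rows(df, c(TRUE, FALSE, TRUE))
subset_rows(df, c(TRUE, FALSE, TRUE), columns = "b")

# These fail with errors like the ones shown above
try(subset_rows(list(a = 1, b = 2), TRUE))
try(subset_rows(df, c("a", "b")))
try(subset_rows(df, TRUE, columns = 1:2))
```

Note that these checks only validate the type of each argument. They do not check, for example, that keep has one element per row, so any length or content rules still need their own checks.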