Enums in R: towards type safe R
Hadley Wickham has recently dropped a new draft section of his book Tidy Design Principles on enumerations and their use in R.
In short, enumerations enumerate (list out) the possible values that something might take on. In R we see this most often in function signatures where an argument takes a scalar value but all possible values are listed out.
I will refer to enumerations as enums from here on.
Enums in R
A good example is the cor() function from the base package stats.
function (x, y = NULL, use = "everything", method = c("pearson",
"kendall", "spearman"))
NULL
The possible values for method are "pearson", "kendall", or
"spearman" but all values are listed inside of the function
definition.
Inside of the function, though, match.arg(method) is used to ensure
that the provided value to the method argument is one of the provided
values.
Hadley makes the argument that we should prefer an enumeration to a
boolean flag such as TRUE or FALSE. I agree!
A real world example
A post on mastodon makes a point that the function sf::st_make_grid()
has an argument square = TRUE where when set to FALSE hexagons are
returned.
In this case, it’s very clear that an enum would be better! For example we can improve the signature like so:
{
# ensure only one of the provided grid shapes are used
# ... rest of function
}
Enums in Rust
When I first started using rust enums made no sense to me. In Rust, enums are a first class citizen that are treated as their own thing.
I’m not really sure what to call things in Rust. Are they all objects?
We make them by defining the name of the enum and the variants they may take on.
Now you can use this enum GridShape to specify one of two types:
Square or Hexagon. Syntactically, this is written
GridShape::Square and GridShape::Hexagon.
Enums are very nice because we can match on the variants and do different things based on them. For example we can have a function like so:
It takes an argument x which is a GridShape enum. We
match on the
possible variants and then do something.
Inside of the match statement each of the possible variants of the enum
have to be written out. These are called match arms. The left side
lists the variant where as the right portion (after =>) indicates what
will be executed if the left side is matched (essentially if the
condition is true).
With this function we can pass in specific variants and get different behavior.
GridShape::Hexagon
which_shape
#> Hexagons are the bestagons
GridShape::Square
which_shape
#> We have a square!
Making an S7 enum object in R
I think R would benefit from having a “real” enum type object. Having
a character vector of valid variants and checking against them using
match.arg() or rlang::arg_match() is great but I think we can go
further.
Since learning Rust, I think having more strictness can make our code much better and more robust. I think adding enums would be a good step towards that
I’ve prototyped an Enum type in R using the new S7 object system
that might point us towards what an enum object in the future might look
like for R users.
Design of an Enum
For an enum we need to know what the valid variants are and what the current value of the enum is. These would be the two properties.
An enum S7 object must also make sure that a value of an Enum is one of
the valid variants. Using the GridShape enum the valid variants would
be "Square" and "Hexagon". A GridShape enum could not take, for
example, "Circle" since it is not a listed variant.
Using an abstract class
To start, we will create an abstract S7 class called Enum.
"_an abstract class is a generic class (or type of object) used as a basis for creating specific objects that conform to its protocol, or the set of operations it supports” — Source
The Enum class will be used to create other Enum objects.
# create a new Enum abstract class
Enum <-
In this code chunk we specify that there are 2 properties: Value and
Variant each must be a character type. Value will be the value of
the enum. It would be the right hand side of GridShape::Square in
Rust’s enum, for example. Variants is a character vector of all of the
possible values it may be able to take on. The validator ensures that
Value must only have 1 value. It also ensures that Value is one of
the enumerated Variants. This Enum class will be used to generate
other enums and cannot be instantiated by itself.
We can create a new enum factory function with the arguments:
enum_classthe class of the enum we are creatingvariantsa character vector of the valid variant values
# create a new enum constructor
{
}
Note that the constructor here only takes a Value argument. We do
this so that users cannot circumvent the pre-defined variants.
With this we can now create a GridShape enum in R!
GridShape <-
GridShape
<GridShape> class
@ parent : <Enum>
@ constructor: function(Value) {...}
@ validator : <NULL>
@ properties :
$ Value : <character>
$ Variants: <character>
This new object will construct new GridShape enums for us.
<GridShape>
@ Value : chr "Square"
@ Variants: chr [1:2] "Square" "Hexagon"
When we try to create a GridShape that is not one of the valid variants we will get an error.
Error: <GridShape> object is invalid:
- enum value must be one of possible variants
Making a print method
For fun, I would like Enum objects to print like how I would use them
in Rust. To do this we can create a custom print method
# print method for enums
# since this is an abstract class we get the first class (super)
# to print
{
}
Since Enums will only ever be a sub-class we can confidently grab the
first element of the class(enum_obj) which is the super-class of the
enum. We paste that together with the value of the enum.
square <-
square
GridShape::Square
Drawing even more from Rust
Rust enums are even more powerful than what I briefly introduced. Each variant of an enum can actually be typed!!! Take a look at the example from The Book™.
In this enum there are 4 variants. The first Quit doesn’t have any
associated data with it. But the other three do! The second one Move
has two fields x and y which contain integer values. Write is a
tuple with a string in it and ChangeColor has 3 integer values in its
tuple. These can be extracted.
A silly example function that illustrates how each value can be used can be
When a variant with data is passed in the values can be used. For example
which_msg;
#> Your RGB
Extending it to R
What would this look like if we extended it to an R based enum object? I
suspect the Variants would be a list of prototypes such as those from
{vctrs}. The Value would have to be
validated against all of the provided prototypes to ensure that it is
one of the provided types.
I’m not sure how I would code this up, but I think that would be a great thing to have.