🎄dvent Day 1: Rust and R solutions
TL;DR
I give you the solutions to Day 1 of Advent of Code in R, Rust, and Rust
in R for part 2 using
{extendr}.
I tend to always do just the first day of the advent of code. It is not my cup of tea. I don’t enjoy word problems, or sudoku, or the like. But I do like it when people learn. I find many people use this as a time to learn a new language.
This morning I did the Advent of Code Day 1 in both R and Rust. I’ll discuss my approaches to the challenges.
Important
If you care a lot about the Advent of Code and want to do it yourself, do not read any further. I am giving away the answers.
Part 1
The objective of part one is to calculate the distance between column 1 and column 2 in acending order. Note that the distance is in absolute values. This is not mentioned but I figured it out after my first submission was wrong.
The approach:
- read input
- sort each column independently
- calculate the different from column 2 and column 1
- calculate the absolute value
- sum it all up
R
This was a one liner:
[1] 2057374
Let’s try rewriting it using a pipe so it can be a bit easier to process:
|>
|>
|>
|>
[1] 2057374
There are two things here that may be novel to you. The first is that we
can use lapply() with a data.frame.
To quote myself:
“Data frames are actually just lists masquerading as rectangles.”
Source: Finding and SPSS {haven}
This returns a list where each element is the sorted input vector.
Next, we can compute the different between the two columns by using
do.call() with the function being -. do.call() takes a list of
arguments and splices them into the function call.
Since our funciton, -, has two arguments it works perfectly. Then we
wrap the results in sum(abs()) and voila.
Rust
The hardest part of the rust solution is reading the file to be completely honest. I’m still terrible with using readers in Rust so I used ChatGPTs help. I’m not going to lie about it.
Reading the input
The first thing to note is that we are returning
Result<(Vec<i32>, Vec<i32>), Box<dyn Error>> from the function. We
return a Result<> because there are multiple places where the function
can error. Using a Result<> gives us the ability to unwrap anything
inside of the body of the function that is in a Result<> itself. If
there is an error, it will be returned—thus, “gracefully” handling the
errors.
Typically, if you’re a Rust hardo, you will define your own custom
Error type. That is too much work for me—and I’m not good at knowing
all of the types of errors that I may want. Instead we use
Box<dyn Error>. Box<dyn Error> is a fancy way of saying we can
accept anything that implements the Error trait.
Next it is important to use a BufReader which allows us to read the
file line by line. Always use a BufReader when possible. It will
make your code so much faster.
Next, we are going to instantiate two vectors that we will use to store the results. Then we iterate through the lines of the reader and parse the contents and shove them into the vector. Voila.
use Error;
use File;
use ;
Sorting and summing
Next, we define a little handy wrapper function. We can use
destructuring assignment here to put the results of read_day1() into
two items at once. If you’re an R user, this is like using
{dotty} or
{zeallot}. My preference is for
dotty, personally.
We iterate through x and why by creating a zipped iterator. When
you zip an iterator you get a tuple of elements. We will iterate through
these two items together and calculate the absolute difference and
accumulate it along the way.
We accumulate the results using .fold() which takes two arguments :
- The initial value to accumulate
- A closure that has two arguments:
- The accumulating value
- The current value of the iterator
A closure is like an anonymous function in R that is defined like
\(.x, .y) or using the purrr tilde syntax like ~ .x + .y.
It is also important that the closure must return the same type as the initial value.
In our closure we say that the acc (you can choose any name you’d like
here, it is just a function argument) must be mutable so we can change
its value at each step. We use the shortcut += operator so that we
dont have to write acc = acc + (yy - xx).abs().
All together
Since these are just functions, we need to wrap them all up in our
main.rs file.
main.rs
use Error;
Part 2
Part two was quite fun to do, actually. For it, we want to count the number of times that each value in the second column occurs. Each of these values correspond to a value in the first column. Our sum is now the value in column 1 multiplied by the number of times it occurs in column two. To approach this we will do the following:
- read the input
- count the number of times each value in column 2 occurs
- calculate the “score” for each value in column 1
- sum up the scores
R
The R solution is quite straight forward as well but again, might use techniques you’re not familiar with. Here is the solution in all of its (surprisingly fast) glory.
You may think it is ugly but I assure you, it is very fast.
x <-
[1] 23177084
Let’s break it up. The most important part is the table() call. This
calculates how many times each value in x$V2 occurs. We can use this
table as a lookup vector.
lookup <-
Using a lookup vector is a very efficient approach that people tend to not think about. Since this is a named vector, we can extract it’s elements by name.
lookup
92252
19
Now, all we need to do is do this for every value in x$V1. We have
to cast x$V1 as a character vector otherwise it will attempt to do the
lookup by position.
x_counts <- lookup
<NA> <NA> <NA> <NA> <NA> 61539
11
If there is not any occurrences in x$V2 the value is NA which is
very handy because an NA just like a 0 will propagate in
multiplication. All we need to do now is multiple and sum!
[1] 23177084
Rust
I quite enjoyed writing this rust solution—frankly more than either R or
Rust solution. Any time I get to use a BtreeMap I’m giddy.
Counting unique values in Rust is a little bit different. We typically
use a Map of some variety. Think of these as named lists. Typically
you will hear reference about a HashMap. HashMap are key-value
stores that do not have any sense of order in the key. BTreeMap is
different because the key must be ordered. Since we will be performing
a lookup based on an integer value, I feel BTreeMap may be better
here—though only bench marks can prove it one way or another.
Here is the solution:
We instantiate an empty BTreeMap then we populate it. We do this using
the below code. This will grab the entry with the key yi from the map.
If it doesn’t exist, it will insert the value 0. Then we add the value
1 to it. Notice that *entry. We do this because we are assinging to
a mutable reference. This lets the value inside of the counts
BTreeMap be updated.
for yi in y
The next part is quite like our part 1 solution. We use .fold() to
perform the sum for us. We iterate through each value of x—stored in
the value of next in the closure. We then try and get the lookup value
from our counts map. If there is no associated value, we provide a
value of 0 and store it in our multiplier variable. Then we multiply
xi (or next in the closure) and add it the the accumulator!
let res = x.iter.fold;
That’s it. While it is much more code, it feels much easier to read and a bit cleaner than the R solution.
Bonus: R + Rust via {rextendr}
We can take the part 2 solution and tidy it up into a Rust function that can be called from R using rextendr.
Note
This isn’t optimized to be fast code and we’re not even using R native types so we will incur an overhead cost to go from
integer()vector toVec<i32>.
To do this you will need {rextendr} installed. Do so with
pak::pak("extendr/rextendr").
tally_code <- r"-{
fn tally_day1(x: Vec<i32>, y: Vec<i32>) -> Result<i32> {
let mut counts = std::collections::BTreeMap::new();
for yi in y {
let entry = counts.entry(yi).or_insert(0);
*entry += 1;
}
let res = x.iter().fold(0, |mut acc, next| {
let multiplier = counts.get(next).unwrap_or(&0);
acc += next * multiplier;
acc
});
Ok(res)
}
}-"
rextendr::
Now we can call this code directly from R:
[1] 23177084
Let’s perform a small bench mark between this and the R solution:
bench::
# A tibble: 2 Ă— 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 r 353.1µs 377.9µs 2584. 210.19KB 8.31
2 rust_simple 44.4µs 56.1µs 16948. 4.84KB 2.02
Rust solution code
Below is all of the code I used for the rust solution.
main.rs
main.rs
use Error;
use *;
day1.rs
day1.rs
use BTreeMap;
use Error;
use File;
use ;