🎄dvent Day 1: Rust and R solutions

r rust aoc

December 01, 2024

TL;DR

I give you the solutions to Day 1 of Advent of Code in R, Rust, and Rust in R for part 2 using {extendr}.

I tend to always do just the first day of the advent of code. It is not my cup of tea. I don’t enjoy word problems, or sudoku, or the like. But I do like it when people learn. I find many people use this as a time to learn a new language.

This morning I did the Advent of Code Day 1 in both R and Rust. I’ll discuss my approaches to the challenges.

Important

If you care a lot about the Advent of Code and want to do it yourself, do not read any further. I am giving away the answers.

Part 1

The objective of part one is to calculate the distance between column 1 and column 2 in acending order. Note that the distance is in absolute values. This is not mentioned but I figured it out after my first submission was wrong.

The approach:

read input
sort each column independently
calculate the different from column 2 and column 1
calculate the absolute value
sum it all up

R

This was a one liner:

sum(abs(do.call(`-`, lapply(read.table("day1.txt"), sort))))

[1] 2057374

Let’s try rewriting it using a pipe so it can be a bit easier to process:

read.table("day1.txt") |> 
  lapply(sort) |> 
  do.call(`-`, args = _) |> 
  abs() |> 
  sum()

[1] 2057374

There are two things here that may be novel to you. The first is that we can use lapply() with a data.frame.

To quote myself:

“Data frames are actually just lists masquerading as rectangles.”

Source: Finding and SPSS {haven}

This returns a list where each element is the sorted input vector.

Next, we can compute the different between the two columns by using do.call() with the function being -. do.call() takes a list of arguments and splices them into the function call.

Since our funciton, -, has two arguments it works perfectly. Then we wrap the results in sum(abs()) and voila.

Rust

The hardest part of the rust solution is reading the file to be completely honest. I’m still terrible with using readers in Rust so I used ChatGPTs help. I’m not going to lie about it.

Reading the input

The first thing to note is that we are returning Result<(Vec<i32>, Vec<i32>), Box<dyn Error>> from the function. We return a Result<> because there are multiple places where the function can error. Using a Result<> gives us the ability to unwrap anything inside of the body of the function that is in a Result<> itself. If there is an error, it will be returned—thus, “gracefully” handling the errors.

Typically, if you’re a Rust hardo, you will define your own custom Error type. That is too much work for me—and I’m not good at knowing all of the types of errors that I may want. Instead we use Box<dyn Error>. Box<dyn Error> is a fancy way of saying we can accept anything that implements the Error trait.

Next it is important to use a BufReader which allows us to read the file line by line. Always use a BufReader when possible. It will make your code so much faster.

Next, we are going to instantiate two vectors that we will use to store the results. Then we iterate through the lines of the reader and parse the contents and shove them into the vector. Voila.

use std::error::Error;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn read_day1(path: &str) -> Result<(Vec<i32>, Vec<i32>), Box<dyn Error>> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);

    // Create two vectors to hold the integers
    let mut vec1 = Vec::new();
    let mut vec2 = Vec::new();

    // Read the file line by line
    for line in reader.lines() {
        let line = line?;
        let mut nums = line
            .split_whitespace() // Split by whitespace
            .map(|s| s.parse::<i32>()); // Parse each number into i32

        // Collect the numbers into the two vectors
        if let (Some(Ok(num1)), Some(Ok(num2))) = (nums.next(), nums.next()) {
            vec1.push(num1);
            vec2.push(num2);
        } else {
            eprintln!("Skipping malformed line: {}", line);
        }
    }

    Ok((vec1, vec2))
}

Sorting and summing

Next, we define a little handy wrapper function. We can use destructuring assignment here to put the results of read_day1() into two items at once. If you’re an R user, this is like using {dotty} or {zeallot}. My preference is for dotty, personally.

pub fn day_one_part_one(path: &str) -> Result<i32, Box<dyn Error>> {
    // read the input. needs to be mutable to sort
    let (mut x, mut y) = read_day1(path)?;

    // sort the input
    x.sort();
    y.sort();

    // calculate the sum
    let res = x.iter().zip(y.iter()).fold(0, |mut acc, (xx, yy)| {
        acc += (yy - xx).abs();
        acc
    });

    Ok(res)
}

We iterate through x and why by creating a zipped iterator. When you zip an iterator you get a tuple of elements. We will iterate through these two items together and calculate the absolute difference and accumulate it along the way.

We accumulate the results using .fold() which takes two arguments :

The initial value to accumulate
A closure that has two arguments:
The accumulating value
The current value of the iterator

A closure is like an anonymous function in R that is defined like \(.x, .y) or using the purrr tilde syntax like ~ .x + .y.

It is also important that the closure must return the same type as the initial value.

In our closure we say that the acc (you can choose any name you’d like here, it is just a function argument) must be mutable so we can change its value at each step. We use the shortcut += operator so that we dont have to write acc = acc + (yy - xx).abs().

All together

Since these are just functions, we need to wrap them all up in our main.rs file.

main.rs

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let d1p1 = day_one_part_one("input/day1.txt")?;
    println!("Part 1: {d1p1}");
    Ok(())
}

Part 2

Part two was quite fun to do, actually. For it, we want to count the number of times that each value in the second column occurs. Each of these values correspond to a value in the first column. Our sum is now the value in column 1 multiplied by the number of times it occurs in column two. To approach this we will do the following:

read the input
count the number of times each value in column 2 occurs
calculate the “score” for each value in column 1
sum up the scores

R

The R solution is quite straight forward as well but again, might use techniques you’re not familiar with. Here is the solution in all of its (surprisingly fast) glory.

You may think it is ugly but I assure you, it is very fast.

x <- read.table("day1.txt")
sum(x$V1 * table(x$V2)[as.character(x$V1)], na.rm = TRUE)

[1] 23177084

Let’s break it up. The most important part is the table() call. This calculates how many times each value in x$V2 occurs. We can use this table as a lookup vector.

lookup <- table(x$V2)

Using a lookup vector is a very efficient approach that people tend to not think about. Since this is a named vector, we can extract it’s elements by name.

lookup["92252"]

92252 
   19

Now, all we need to do is do this for every value in x$V1. We have to cast x$V1 as a character vector otherwise it will attempt to do the lookup by position.

x_counts <- lookup[as.character(x$V1)]
tail(x_counts)

 <NA>  <NA>  <NA>  <NA>  <NA> 61539 
                                 11

If there is not any occurrences in x$V2 the value is NA which is very handy because an NA just like a 0 will propagate in multiplication. All we need to do now is multiple and sum!

sum(x$V1 * x_counts, na.rm = TRUE)

[1] 23177084

Rust

I quite enjoyed writing this rust solution—frankly more than either R or Rust solution. Any time I get to use a BtreeMap I’m giddy.

Counting unique values in Rust is a little bit different. We typically use a Map of some variety. Think of these as named lists. Typically you will hear reference about a HashMap. HashMap are key-value stores that do not have any sense of order in the key. BTreeMap is different because the key must be ordered. Since we will be performing a lookup based on an integer value, I feel BTreeMap may be better here—though only bench marks can prove it one way or another.

Here is the solution:

pub fn day_one_part_two(path: &str) -> Result<i32, Box<dyn Error>> {
    // read the inputs
    let (x, y) = read_day1(path)?;

    // Create an empty BTreeMap to count the occurrences
    let mut counts = BTreeMap::new();

    // Iterate through Y to populate the BTreeMap and increment
    // each time we see an entry
    for yi in y {
        let entry = counts.entry(yi).or_insert(0);
        *entry += 1;
    }

    // Iterate through x and get the value from the btreemap.
    // if it doesn't exist we get use the default value of 0
    // that is what the unwrap_or() is for
    let res = x.iter().fold(0, |mut acc, next| {
        let multiplier = counts.get(next).unwrap_or(&0);
        acc += next * multiplier;
        acc
    });

    Ok(res)
}

We instantiate an empty BTreeMap then we populate it. We do this using the below code. This will grab the entry with the key yi from the map. If it doesn’t exist, it will insert the value 0. Then we add the value 1 to it. Notice that *entry. We do this because we are assinging to a mutable reference. This lets the value inside of the counts BTreeMap be updated.

  for yi in y {
      let entry = counts.entry(yi).or_insert(0);
      *entry += 1;
  }

The next part is quite like our part 1 solution. We use .fold() to perform the sum for us. We iterate through each value of x—stored in the value of next in the closure. We then try and get the lookup value from our counts map. If there is no associated value, we provide a value of 0 and store it in our multiplier variable. Then we multiply xi (or next in the closure) and add it the the accumulator!

let res = x.iter().fold(0, |mut acc, next| {
    let multiplier = counts.get(next).unwrap_or(&0);
    acc += next * multiplier;
    acc
});

That’s it. While it is much more code, it feels much easier to read and a bit cleaner than the R solution.

Bonus: R + Rust via `{rextendr}`

We can take the part 2 solution and tidy it up into a Rust function that can be called from R using rextendr.

Note

This isn’t optimized to be fast code and we’re not even using R native types so we will incur an overhead cost to go from integer() vector to Vec<i32>.

To do this you will need {rextendr} installed. Do so with pak::pak("extendr/rextendr").

tally_code <- r"-{
  fn tally_day1(x: Vec<i32>, y: Vec<i32>) -> Result<i32> {
    let mut counts = std::collections::BTreeMap::new();
    for yi in y {
        let entry = counts.entry(yi).or_insert(0);
        *entry += 1;
    }

    let res = x.iter().fold(0, |mut acc, next| {
        let multiplier = counts.get(next).unwrap_or(&0);
        acc += next * multiplier;
        acc
    });

    Ok(res)
}

}-"

rextendr::rust_function(
  tally_code, 
  profile = "release",
  quiet = TRUE
)

Now we can call this code directly from R:

tally_day1(x$V1, x$V2)

[1] 23177084

Let’s perform a small bench mark between this and the R solution:

bench::mark(
  r = sum(x$V1 * table(x$V2)[as.character(x$V1)], na.rm = TRUE),
  rust_simple = tally_day1(x$V1, x$V2)
)

# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 r            353.1µs  377.9µs     2584.  210.19KB     8.31
2 rust_simple   44.4µs   56.1µs    16948.    4.84KB     2.02

Rust solution code

Below is all of the code I used for the rust solution.

`main.rs`

main.rs

use std::error::Error;

mod day1;
use day1::*;
fn main() -> Result<(), Box<dyn Error>> {
    let d1p1 = day_one_part_one("input/day1.txt")?;
    let d1p2 = day_one_part_two("input/day1.txt")?;

    println!("Day 1 results:\n  Part 1: {d1p1}\n  Part 2: {d1p2}");
    Ok(())
}

`day1.rs`

day1.rs

use std::collections::BTreeMap;
use std::error::Error;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn read_day1(path: &str) -> Result<(Vec<i32>, Vec<i32>), Box<dyn Error>> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);

    // Create two vectors to hold the integers
    let mut vec1 = Vec::new();
    let mut vec2 = Vec::new();

    // Read the file line by line
    for line in reader.lines() {
        let line = line?;
        let mut nums = line
            .split_whitespace() // Split by whitespace
            .map(|s| s.parse::<i32>()); // Parse each number into i32

        // Collect the numbers into the two vectors
        if let (Some(Ok(num1)), Some(Ok(num2))) = (nums.next(), nums.next()) {
            vec1.push(num1);
            vec2.push(num2);
        } else {
            eprintln!("Skipping malformed line: {}", line);
        }
    }

    Ok((vec1, vec2))
}

pub fn day_one_part_one(path: &str) -> Result<i32, Box<dyn Error>> {
    // read the input. needs to be mutable to sort
    let (mut x, mut y) = read_day1(path)?;

    // sort the input
    x.sort();
    y.sort();

    // calculate the sum
    let res = x.iter().zip(y.iter()).fold(0, |mut acc, (xx, yy)| {
        acc += (yy - xx).abs();
        acc
    });

    Ok(res)
}

pub fn day_one_part_two(path: &str) -> Result<i32, Box<dyn Error>> {
    // read the inputs
    let (x, y) = read_day1(path)?;

    // Create an empty BTreeMap to count the occurrences
    let mut counts = BTreeMap::new();

    // Iterate through Y to populate the BTreeMap and increment
    // each time we see an entry
    for yi in y {
        let entry = counts.entry(yi).or_insert(0);
        *entry += 1;
    }

    // Iterate through x and get the value from the btreemap.
    // if it doesn't exist we get use the default value of 0
    // that is what the unwrap_or() is for
    let res = x.iter().fold(0, |mut acc, next| {
        let multiplier = counts.get(next).unwrap_or(&0);
        acc += next * multiplier;
        acc
    });

    Ok(res)
}

🎄dvent Day 1: Rust and R solutions

TL;DR

Part 1

R

Rust

Reading the input

Sorting and summing

All together

Part 2

R

Rust

Bonus: R + Rust via {rextendr}

Rust solution code

main.rs

day1.rs

Bonus: R + Rust via `{rextendr}`

`main.rs`

`day1.rs`