I took this (2022) Advent of Code as an opportunity to learn the Rust programming language. This post is about the puzzle on day 13 where I managed to come up with a pretty cleanly implementated solution thanks to some language features and a library crate.

The puzzle gives you a bunch of inputs that look like this:

[1,[2,[3,[4,[5,6,7]]]],8,9]

Lists of numbers and nested lists. You have to do some operations on them to come up with the answers; more on that later. For the full puzzle, see the puzzle on the AoC website.

The challenging part of this puzzle is the fact the lists are composed of numbers and of other lists. This can be annoying to parse and juggle as structured data in memory. Parsing the data with regular expressions will not work so you would need to build a custom tokenizer. The data itself can be modeled as a tree, but manipulating it or performing other operations on it requires a lot of boilerplating in a language like C.

Algebraic Data Types

Algebraic Data Types (or ADTs for short) are basically types composed of several different sub-types. In C, you can achieve this with a pattern such as:

enum SomethingType {
    AnInteger,
    List,
    SomeCoolStruct,
};

struct Something {
    enum SomethingType type;
    union {
        int32_t integer;
        void *list;
        struct Cool some_cool;
    } sub;
};

Before every access to the subtype you have to check the type field to see what kind of data you are actually accessing through sub. When these checks don’t work as they should or are forgotten, you can end up with a type confusion vulnerability.

Rust implements ADTs as part of the enum type. The C code above implemented in Rust simply looks like this:

enum Something {
    AnInteger(i32),
    List(Vec<i32>),
    SomeCoolStruct(Cool),
}

Pretty neat, right? This blog post on Algebraic Data Types was pretty useful to me. Let’s see how we can access the sub-types..

Subtype access through Patterns

Another cool Rust language feature is Patterns (or Pattern Matching). To use a Pattern to access data stored as the ADT we defined above, we can use the following:

match var {
    AnInteger(i) => println!("This is an integer and it contains {}!", i),
    List(_) => println!("This is a list!"),
    SomeCoolStruct(_) => println!("This is one of those extremely cool structs!"),
};

Rust calls this ‘destructuring’ the data. There are also other ways to do this, such as the if let syntax.

The benefit of having these types as part of the language syntax is that the compiler can actually check if you are accessing the ADTs properly, at compile-time. For instance, if you add another value to the enum, the compiler realises you are no longer covering all the values in your match statement and gives an error. It is also syntactically impossible to access the data using the wrong subtype. No more type confusion vulnerabilities!

Nom, a parser combinator library

Nom is a parser combinator library, written in Rust. A parser combinator library is basically a collection of tiny parsers that share a compatible interface to parse simple data types that can be combined to create complex and flexible parsers.

The parsers nom provides all use the IResult return type which itself is an ADT that can indicate failure or success, in which case it returns the parser output as well as any left-over input so it can be fed to another parser.

Solving the puzzle

Parsing and structuring the input

As I mentioned, the input is a nested list of integer values or other nested lists called packets, for example (from the AoC website):

[1,[2,[3,[4,[5,6,7]]]],8,9]

Using an ADT, we can simply model the data as a tree, where one node (one comma separated field in the packet) is as follows:

enum PacketField {
    Num(u32),
    Sub(Vec<PacketField>),
}

A PacketField can either be an integer value (sub-type Num) or a list of itself (Sub).

Pretty neat! Now let’s look at how nom can help us parse the input into data of this type..

fn nomnom_packet(input: &str) -> IResult<&str, PacketField> {
    alt((
        map(u32, PacketField::Num),
        map(
            delimited(
                char('['),
                separated_list0(char(','), nomnom_packet),
                char(']'),
            ),
            PacketField::Sub,
        ),
    ))(input)
}

Let’s break that down a bit into what the individual nom functions do:

  • nom::branch::alt() tries to apply a list of parsers in order and return once one of them succeeds.
  • nom::combinator::map() tries to apply a single parser and map the result to a function (in this case: construct a variable of a PacketField sub-type).
  • nom::character::complete::u32() tries to parse a character string input to an unsigned 32-bit integer. In this case: the literal numbers in the packet.
  • nom::sequence::delimited() tries to parse the data delimited by a specific start and end sequence. In this case, it would match a nested packet delimiter by [ and ]. The delimiters and separators are discarded.
  • nom::complete::char() simply tries to match a single character literal (the start, end and seperation characters here).
  • nom::multi::separated_list0() matches a list, seperated by a given seperation character. Note the 0 at the end of the function name - which means it will match “0 or more” items; these functions also have a 1 variant to match “1 or more”. In this case, we call the root parser again for every list item. That’s right - we’ve built a recursive parser!

Running this parser on the input example above generates the following data structure:

Sub(
    [
        Num(
            1,
        ),
        Sub(
            [
                Num(
                    2,
                ),
                Sub(
                    [
                        Num(
                            3,
                        ),
                        Sub(
                            [
                                Num(
                                    4,
                                ),
                                Sub(
                                    [
                                        Num(
                                            5,
                                        ),
                                        Num(
                                            6,
                                        ),
                                        Num(
                                            7,
                                        ),
                                    ],
                                ),
                            ],
                        ),
                    ],
                ),
            ],
        ),
        Num(
            8,
        ),
        Num(
            9,
        ),
    ],
),

Solving part 1

Part 1 asks us to compare a pair of packets and find the ones that are not equal. We cannot, by default, do comparisons of this custom type but Rust allows us to implement the Ord trait to enable comparison of data stored as our PacketField type. The integer subtype can just use regular integer comparison and for a list (Sub) we can traverse into both lists.

We have to implement all possible combinations of sub-type comparisons. The Vec and i32 already implement the Ord trait, so all we need to do is map the different sub-type combinations to the existing compare functions:

impl Ord for PacketField {
    fn cmp(&self, other: &Self) -> Ordering {
        match (self, other) {
            (PacketField::Sub(s), PacketField::Sub(o)) => s.cmp(o),
            (PacketField::Num(s), PacketField::Num(o)) => s.cmp(o),
            (PacketField::Sub(s), PacketField::Num(o)) => s.cmp(&vec![PacketField::Num(*o)]),
            (PacketField::Num(s), PacketField::Sub(o)) => vec![PacketField::Num(*s)].cmp(o),
        }
    }
}

For ease, if we encounter a comparison of a list with an integer, we just wrap the integer in a Vec to do a Vec-to-Vec comparison.

We also need to implement the PartialOrd trait because the Ord trait requires it, because a total order relation implies a partial order relation. Since for our type all values have a total order relation, we can simply call the total order comparison we defined:

impl PartialOrd for PacketField {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

We also need Eq and PartialEq of course, but we can simply #[derive()] those because the default implementations work for us.

This blog post explains these (and other) built-in traits pretty well.

Now, solving the puzzle is easy! Just compare the pairs of packets to each other and sum the (1-based) indices:

fn part1(packets: &[PacketField]) -> usize {
    packets
        .iter()
        .tuples()
        .enumerate()
        .filter(|(_, (l, r))| l < r)
        .fold(0, |acc, (i, _)| acc + (i + 1))
}

The Itertools::tuples function groups the pairs of packets into tuples. The filter() then takes the pairs that are in the right order and the fold() accumulates the 1-based indices (added through enumerate()) to get to an answer.

Solving part 2

In part 2 we need to insert two markers in between the packets, sort the entire list and multiply the indices of the markers in the sorted list to get the final answer, as a way to prove we managed to sort the list correctly.

Fortunately, the Ord functionality we implemented for part 1 is also what the sort functions use for ordering elements! So part 2 is really simple at this point:

fn part2(origpackets: &[PacketField]) -> usize {
    let mut packets = origpackets.to_vec();
    let dividers = [
        nomnom_packet("[[2]]").unwrap().1,
        nomnom_packet("[[6]]").unwrap().1,
    ];
    packets.extend_from_slice(&dividers);
    packets.sort();

    packets
        .iter()
        .enumerate()
        .filter(|(_, p)| dividers.contains(p))
        .fold(1, |acc, (i, _)| acc * (i + 1))
}

This function uses the nom-based parser we made to parse the two divider packets and add them at the end of the list and then sorts the whole list. Then, it uses enumerate() to add indices, filter()s out every packet except the dividers and uses fold() to multiply the (1-based) indices of the dividers together.

And that is day 13 done! Full solution can be found on my GitHub.