I took this (2022) Advent of Code as an opportunity to learn the Rust programming language. This post is about the puzzle on day 13 where I managed to come up with a pretty cleanly implementated solution thanks to some language features and a library crate.
The puzzle gives you a bunch of inputs that look like this:
[1,[2,[3,[4,[5,6,7]]]],8,9]
Lists of numbers and nested lists. You have to do some operations on them to come up with the answers; more on that later. For the full puzzle, see the puzzle on the AoC website.
The challenging part of this puzzle is the fact the lists are composed of numbers and of other lists. This can be annoying to parse and juggle as structured data in memory. Parsing the data with regular expressions will not work so you would need to build a custom tokenizer. The data itself can be modeled as a tree, but manipulating it or performing other operations on it requires a lot of boilerplating in a language like C.
Algebraic Data Types
Algebraic Data Types (or ADTs for short) are basically types composed of several different sub-types. In C, you can achieve this with a pattern such as:
enum SomethingType {
AnInteger,
List,
SomeCoolStruct,
};
struct Something {
enum SomethingType type;
union {
int32_t integer;
void *list;
struct Cool some_cool;
} sub;
};
Before every access to the subtype you have to check the type
field to see what kind of data you are actually accessing through
sub
. When these checks don’t work as they should or are forgotten,
you can end up with a type confusion vulnerability.
Rust implements ADTs as part of the enum type. The C code above implemented in Rust simply looks like this:
enum Something {
AnInteger(i32),
List(Vec<i32>),
SomeCoolStruct(Cool),
}
Pretty neat, right? This blog post on Algebraic Data Types was pretty useful to me. Let’s see how we can access the sub-types..
Subtype access through Patterns
Another cool Rust language feature is Patterns (or Pattern Matching). To use a Pattern to access data stored as the ADT we defined above, we can use the following:
match var {
AnInteger(i) => println!("This is an integer and it contains {}!", i),
List(_) => println!("This is a list!"),
SomeCoolStruct(_) => println!("This is one of those extremely cool structs!"),
};
Rust calls this ‘destructuring’ the data. There are also other ways to do this, such as the if let syntax.
The benefit of having these types as part of the language syntax is that the
compiler can actually check if you are accessing the ADTs properly,
at compile-time. For instance, if you add another value to the enum, the
compiler realises you are no longer covering all the values in your match
statement and gives an error. It is also syntactically impossible to access
the data using the wrong subtype. No more type confusion vulnerabilities!
Nom, a parser combinator library
Nom is a parser combinator library, written in Rust. A parser combinator library is basically a collection of tiny parsers that share a compatible interface to parse simple data types that can be combined to create complex and flexible parsers.
The parsers nom provides all use the IResult return type which itself is an ADT that can indicate failure or success, in which case it returns the parser output as well as any left-over input so it can be fed to another parser.
Solving the puzzle
Parsing and structuring the input
As I mentioned, the input is a nested list of integer values or other nested lists called packets, for example (from the AoC website):
[1,[2,[3,[4,[5,6,7]]]],8,9]
Using an ADT, we can simply model the data as a tree, where one node (one comma separated field in the packet) is as follows:
enum PacketField {
Num(u32),
Sub(Vec<PacketField>),
}
A PacketField
can either be an integer value (sub-type Num
) or a list of itself (Sub
).
Pretty neat! Now let’s look at how nom can help us parse the input into data of this type..
fn nomnom_packet(input: &str) -> IResult<&str, PacketField> {
alt((
map(u32, PacketField::Num),
map(
delimited(
char('['),
separated_list0(char(','), nomnom_packet),
char(']'),
),
PacketField::Sub,
),
))(input)
}
Let’s break that down a bit into what the individual nom functions do:
nom::branch::alt()
tries to apply a list of parsers in order and return once one of them succeeds.nom::combinator::map()
tries to apply a single parser and map the result to a function (in this case: construct a variable of aPacketField
sub-type).nom::character::complete::u32()
tries to parse a character string input to an unsigned 32-bit integer. In this case: the literal numbers in the packet.nom::sequence::delimited()
tries to parse the data delimited by a specific start and end sequence. In this case, it would match a nested packet delimiter by[
and]
. The delimiters and separators are discarded.nom::complete::char()
simply tries to match a single character literal (the start, end and seperation characters here).nom::multi::separated_list0()
matches a list, seperated by a given seperation character. Note the0
at the end of the function name - which means it will match “0 or more” items; these functions also have a1
variant to match “1 or more”. In this case, we call the root parser again for every list item. That’s right - we’ve built a recursive parser!
Running this parser on the input example above generates the following data structure:
Sub(
[
Num(
1,
),
Sub(
[
Num(
2,
),
Sub(
[
Num(
3,
),
Sub(
[
Num(
4,
),
Sub(
[
Num(
5,
),
Num(
6,
),
Num(
7,
),
],
),
],
),
],
),
],
),
Num(
8,
),
Num(
9,
),
],
),
Solving part 1
Part 1 asks us to compare a pair of packets and find the ones that are not
equal. We cannot, by default, do comparisons of this custom type but Rust
allows us to implement the Ord
trait to enable comparison of data stored as our PacketField
type.
The integer subtype can just use regular integer comparison and for a list (Sub
)
we can traverse into both lists.
We have to implement all possible combinations of sub-type comparisons.
The Vec
and i32
already implement the Ord
trait,
so all we need to do is map the different sub-type combinations to the
existing compare functions:
impl Ord for PacketField {
fn cmp(&self, other: &Self) -> Ordering {
match (self, other) {
(PacketField::Sub(s), PacketField::Sub(o)) => s.cmp(o),
(PacketField::Num(s), PacketField::Num(o)) => s.cmp(o),
(PacketField::Sub(s), PacketField::Num(o)) => s.cmp(&vec![PacketField::Num(*o)]),
(PacketField::Num(s), PacketField::Sub(o)) => vec![PacketField::Num(*s)].cmp(o),
}
}
}
For ease, if we encounter a comparison of a list with an integer, we
just wrap the integer in a Vec
to do a Vec
-to-Vec
comparison.
We also need to implement the PartialOrd
trait because the Ord
trait requires it, because a total order relation implies a partial order
relation. Since for our type all values have a total order
relation, we can simply call the total order comparison we defined:
impl PartialOrd for PacketField {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
We also need Eq
and PartialEq
of course, but we can simply
#[derive()]
those because the default implementations work for us.
This blog post explains these (and other) built-in traits pretty well.
Now, solving the puzzle is easy! Just compare the pairs of packets to each other and sum the (1-based) indices:
fn part1(packets: &[PacketField]) -> usize {
packets
.iter()
.tuples()
.enumerate()
.filter(|(_, (l, r))| l < r)
.fold(0, |acc, (i, _)| acc + (i + 1))
}
The Itertools::tuples
function groups the pairs of packets into tuples. The filter()
then
takes the pairs that are in the right order and the fold()
accumulates the 1-based indices (added through enumerate()
) to get
to an answer.
Solving part 2
In part 2 we need to insert two markers in between the packets, sort the entire list and multiply the indices of the markers in the sorted list to get the final answer, as a way to prove we managed to sort the list correctly.
Fortunately, the Ord
functionality we implemented for part 1 is also
what the sort functions use for ordering elements! So part 2 is really
simple at this point:
fn part2(origpackets: &[PacketField]) -> usize {
let mut packets = origpackets.to_vec();
let dividers = [
nomnom_packet("[[2]]").unwrap().1,
nomnom_packet("[[6]]").unwrap().1,
];
packets.extend_from_slice(÷rs);
packets.sort();
packets
.iter()
.enumerate()
.filter(|(_, p)| dividers.contains(p))
.fold(1, |acc, (i, _)| acc * (i + 1))
}
This function uses the nom-based parser we made to parse the two divider
packets and add them at the end of the list and then sorts the whole list.
Then, it uses enumerate()
to add indices, filter()
s out every
packet except the dividers and uses fold()
to multiply the (1-based)
indices of the dividers together.
And that is day 13 done! Full solution can be found on my GitHub.