Implementing a binary heap in Ocaml
I didnāt come from a ātraditionalā software engineering background. Iāve never received formal training in anything software related.
This becomes particularly apparent when it comes to algorithms and datastructures. A big scary, meaty topic. And Iām a vegetarian!
My typical flow for learning concepts in this crucial area of software engineering goes a little something like this:
- Conor reads about something heās never seen before
- Conor imagines how such a concept could be useful in his own projects
- Conor scratches his head at the black magic one would have to perform to achieve this concept in code
If Iām lucky at this point something will click and its just a case of using the newly discovered knowledge enough times that it doesnāt fade by breakfast the next day.
Algorithms and data structures have historically been a gap in my knowledge
Enter the binary heap
I recently came across the Binary Heap data structure when perusing the code for TinyVector. Its a Rust based project, and it was calling the standard library BinaryHeap
type to efficiently order a set of items by priority.
Seeing this heap in the wild got me to thinking, āwhat the devil is this thing and how might I use it myselfā.
The interweb has some pretty good resources on the what of binary heaps which really helped:
- Wikipedia entry
- Excellent Youtube video describing what a Binary heap is and how it functions
- A great Algoviz visualisation
Letās get implementing
I picked OCaml to implement my own binary heap, as its a language Iāve wanted to sink deeper into for a good while.
Its functional and strongly typed, but the typing feels very implicit and doesnāt require much manual defining.
Letās start with our basic binary_tree
type
type 'a binary_tree =
| Empty
| Node of 'a * 'a binary_tree * 'a binary_tree;;
What is this saying?
Well, a binary_tree
with generic 'a
type can be either
Empty
or a Node
.
Each Node
then contains 3 elements/arguments these are separated by *
characters in the OCaml syntax;
- An
'a
(thatās our generic āanythingā value which the tree wraps) - A left binary tree (of our generic type)
- A right binary tree
So its a type which calls itself recursively. Nice!
Inserting values into the tree
To add values into our heap, we need a way of prioritising them. Since in my head the value should be super generic - able to take any kind of data to be sorted into our heap, I thought a great structure for each item would be a tuple of (priority, value)
. That way priorities can be simple int
types which are easy to compare etc.
Lets write a single insert
function which takes a heap and adds a Node
containing the given value
.
import Note from ā../../components/markdown/Note.astroā
This means that instead of this:
let foo value =
match value with
| Bar -> "this is ok"
| Baz -> "have to repeat the value twice tho!"
you can write this
let foo = function
| Bar -> "wow right?"
| Baz -> "check that brevity!"
This style of function just feels so magical!
Our insert
is going to take a tuple of the priority
and the actual value
, and then immediately pattern match on the provided Heap, since the behaviour of insert is very much dependent on the current shape of the Heap.
Also it will need to be marked as recursive, so that it can call itself!
let rec insert (priority, value) = function
What shapes can the heap take?
Well it could either be Empty
or a Node
, so these are our pattern matched options
| Empty -> Node ((priority, value), Empty, Empty)
| Node ((p, v), left, right) ->
When Empty
, we just create the first Node
in the heap, with Empty
left and right leaf nodes.
If there is a Node
(that will be the head of the heap), then we destructure whats inside giving us access to the contained (priority, value)
tuple, as well as the child leaf nodes.
Its then a case of comparing the priorities of the top node with the arguments, if the new one is less than the current one (ie. we pass in a priority 1 and the current node is priority 2, 1 is more important) we swap them, and recursively call our insert function on one of the leaf nodes. Otherwise we do the opposite, recursively insert using the argument value.
if priority < p then
Node ((priority, value), insert (p, v) right, left)
else
Node ((p, v), insert (priority, value) right, left)
Pulling everything together, hereās what weāve got:
let rec insert (priority, value) = function
| Empty -> Node ((priority, value), Empty, Empty)
| Node ((p, v), left, right) ->
if priority < p then
Node ((priority, value), insert (p, v) right, left)
else
Node ((p, v), insert (priority, value) right, left);;
Peeking the top value
In a binary tree (or priority queue), we are interested only in the value of the top Node.
With a bit of pattern matching magic we can pluck out the value the Node contains
Remember that a
Node
contains 3 arguments
- The value the Node holds
- The left Node child
- The right Node child
let peek = function
| Empty -> None
| Node ((_, value), _, _) -> Some (value);;
Popping off the top
Queue the reshuffle!
To remove items from Binary Heaps you start at the first node, the top of the tree, and pop it off leaving a headless tree of nodes. We then pick the highest value child node and promote it to be the new head of the tree.
This keeps the top item as the highest priority item.
Hereās one way you might implement it in our OCaml module.
let rec pop = function
| Empty -> Empty
| Node (_, left, right) ->
match (left, right) with
| (Empty, _) -> right
| (_, Empty) -> left
| (Node ((pl, vl), _, _), Node((pr, vr), _, _)) ->
if pl > pr then
Node ((pl, vl), pop left, right)
else
Node ((pr, vr), left, pop right);;
Whoa right?
Just break it down line by line and its less scary, and all just about dipping in and destructuring the required values to determine which is the highest priority, and then popping again on the remaining leaf nodes.
Testing in the REPL
OCaml has the utop
tool for running code in a REPL style. You can import your own modules using the #use "myfile.ml"
expression.
Letās fire up the utop
REPL and create a binary heap.
let heap =
Empty
|> insert (2, "second one")
|> insert (1, "should be top")
|> insert (3, "lesser")
|> insert (8, "way down")
|> insert (2, "another 2?!") ;;
This will print out the Node tree looking similar to this
val heap : (int * string) binary_tree =
Node ((1, "should be top"),
Node ((2, "another 2?!"), Node ((3, "lesser"), Empty, Empty), Empty),
Node ((2, "second one"), Node ((8, "way down"), Empty, Empty), Empty))
Now we can test our peek
function
peek heap;;
(*
- : string option = Some "should be top"
*)
We can see that the highest priority is indeed on top!
Wrapping our datastructure in a module
In OCaml, you can be very specific about types and functions a module exports. These definitions are placed in a .mli
interface file (named the same as your module).
(* binary_heap.mli *)
type 'a binary_tree
val insert : 'a * 'b -> ('a * 'b) binary_tree -> ('a * 'b) binary_tree
val peek : ('a * 'b) binary_tree -> 'b option
val pop : ('a * 'b) binary_tree -> ('a * 'b) binary_tree
Wait, no specific type definition for the binary_tree
? Why donāt we use the type we specified earlier?
This is actually a really really cool OCaml feature. You can definite the types in a moduleās interface by name only. This means that from the outside, calling code can only interact with your module via the functions which you provide it, but further than that, you could change the entire implementation of the module, including the main type which it revolves around, without any change to the caller.
It would be non breaking for us to force the priority field to be a vector rather than a number.
How crazy is that?!