Binary Trees
A great way to get more familiar with recursion is to work with binary trees. What are those? A binary tree is either empty or it is a node with a value and two subtrees. Those subtrees are either empty or a node with two subtrees. Etc. Here is a visual representation:
Notice that every node has two subtrees. That is what makes this a binary tree. We can represent this data structure in Elm with a union type:
type Tree a
= Empty
| Node a (Tree a) (Tree a)
So the tree in the image above would be represented like this:
exampleTree : Tree Int
exampleTree =
Node 4 (Node 1 Empty Empty) (Node 9 Empty Empty)
To make it easier to work with these trees, we need to write a bunch of recursive functions!
Member
Say we are curious if 9
is in exampleTree
? As humans it is easy to see that it is, but how can we devise a system to make it easy for computers too? Well, say we maintain some invariants about our trees:
- All values left of a
Node
must hold lower values. - All values right of a
Node
must hold higher values.
If we always follow these rules, we always know whether to go right or left from any node! So if we were looking for 9
we know it must be to the right of 4
. There it is! If we are looking for 7
we know it must be to the right of 4
. From there we know it must be left of 9
. Dead end. It is not in the tree. How do we write this as code though?
Well, we can always start with a case
.
member : comparable -> Tree comparable -> Bool
member target tree =
case tree of
Empty ->
...
Node value left right ->
...
If the tree
is Empty
we know we have not found the target
value.
member : comparable -> Tree comparable -> Bool
member target tree =
case tree of
Empty ->
False
Node value left right ->
...
We also know that if target
is less than value
it will be in the left
subtree. Let’s try to check that.
member : comparable -> Tree comparable -> Bool
member target tree =
case tree of
Empty ->
False
Node value left right ->
if target < value then
...
else
...
At this point we need to search the left
subtree, so let’s pretend we are done with member
and use it.
member : comparable -> Tree comparable -> Bool
member target tree =
case tree of
Empty ->
False
Node value left right ->
if target < value then
member target left
else
...
We can do roughly the same thing if target
is greater than value
.
member : comparable -> Tree comparable -> Bool
member target tree =
case tree of
Empty ->
False
Node value left right ->
if target < value then
member target left
else if target > value then
member target right
else
...
If the target
is not less than value
and it is not greater than value
, it must be equal to value
. That means we found our target!
member : comparable -> Tree comparable -> Bool
member target tree =
case tree of
Empty ->
False
Node value left right ->
if target < value then
member target left
else if target > value then
member target right
else
True
There we go. A function that can find values in any tree.
Note: This means
member
is relatively cheap as you get more and more values. Say you have a binary tree that is ten levels deep. That means it holds about 1023 values. Checking for a value will be ten steps or fewer!
Insert
Now we know how to check if values are in the tree, but how do we add new values? Remember the invariants we described for member
? Lower values go left, higher values go right. If we want to insert 7
to exampleTree
, we need to follow those rules:
Now we want to turn that into code. The logic seems pretty similar to member
so we will start with the same basic skeleton.
insert : comparable -> Tree comparable -> Tree comparable
insert target tree =
case tree of
Empty ->
...
Node value left right ->
if target < value then
...
else if target > value then
...
else
...
If the tree
is Empty
, we want to insert our target
value. The only thing we can really do here is say Node target Empty Empty
to switch from a tree with no values to a tree with one value.
insert : comparable -> Tree comparable -> Tree comparable
insert target tree =
case tree of
Empty ->
Node target Empty Empty
Node value left right ->
if target < value then
...
else if target > value then
...
else
...
Now if target
is less than value
, we know it needs to be inserted into the left
subtree. We can pretend insert
is done and use that.
insert : comparable -> Tree comparable -> Tree comparable
insert target tree =
case tree of
Empty ->
Node target Empty Empty
Node value left right ->
if target < value then
Node value (insert target left) right
else if target > value then
...
else
...
Notice that we do not just call insert
directly! We want to preserve the existing tree, just making a minor change to the left
subtree. We can do the same thing if target
is greater than value
.
insert : comparable -> Tree comparable -> Tree comparable
insert target tree =
case tree of
Empty ->
Node target Empty Empty
Node value left right ->
if target < value then
Node value (insert target left) right
else if target > value then
Node value left (insert target right)
else
...
Same idea but we insert
on the right
subtree. In the final case we know that target == value
which means this value is already in the tree. That means we have nothing to insert.
insert : comparable -> Tree comparable -> Tree comparable
insert target tree =
case tree of
Empty ->
Node target Empty Empty
Node value left right ->
if target < value then
Node value (insert target left) right
else if target > value then
Node value left (insert target right)
else
tree
Great, now our insert
function will work exactly like the picture above if we evaluate insert 7 exampleTree
. So if we wanted to make it easier to define binary trees, we could create a fromList
function that just inserts a bunch of values:
fromList : List comparable -> Tree comparable
fromList list =
List.foldl insert Empty list
-- exampleTree = fromList [4,1,9]
It looks like fromList
may be quite expensive because calling insert
creates a “totally new” binary tree. Thanks to immutability, it is actually pretty cheap!
Imagine we have a binary tree that is ten levels deep and it is totally full. It would hold about 1024 values. We are going to insert
a value higher than all the others, so it should be added as the rightmost node. At the root, we know we need to insert
on the right, so we can reuse the entire left
subtree containing about 500 values. We need to go right again, so we can again reuse that left
subtree which contains about 250 values. Etc.
In the end, we only need to replace the nodes that lead directly to the new entry in the tree. That means ten allocations, or about 1% of nodes. Sharing these massive subtrees between old and new trees does not cause any tricky bugs because all the values are immutable. That means someone from far away cannot sneakily mutate your tree!
Note: The
Dict
andSet
libraries are actually balanced binary trees meaning that we do a bit of extra work to guarantee that the trees never get too lopsided. This is how we provide dictionaries that are fast and immutable.
Depth and Map
Here are two more examples that may be helpful to look through. The depth
function figures out how many levels are in a tree:
depth : Tree a -> Int
depth tree =
case tree of
Empty ->
0
Node v left right ->
1 + max (depth left) (depth right)
And the map
function transforms every value in the tree:
map : (a -> b) -> Tree a -> Tree b
map func tree =
case tree of
Empty ->
Empty
Node v left right ->
Node (func v) (map func left) (map func right)
Both recursive functions were discovered using the “pretend you are done” strategy.
Exercises
Exercise 1: Get the sum of all the values in a tree.
sum : Tree Int -> Int sum tree = ...
Exercise 2: Flatten a tree into a list.
flatten : Tree a -> List a flatten tree = ...
Try flattening it in different ways. Can you make it so the values are in-order in the resulting list? Can you use the
foo
andfooHelp
technique to build up an in-order list without using(++)
? Exercise 3: Write a fold function for trees. The fold function does not need to guarantee a particular order of traversal.fold : (a -> b -> b) -> b -> Tree a -> b fold add answerSoFar tree = ...
Exercise 4: Use
fold
to definemember
,sum
, andflatten
.Exercise 5: Try implementing
fold
with a bunch of different traversal strategies. Good ones to try are pre-order, in-order, and post-order.