---
title: Functional Data Structures
subtitle: Analysis and Implementation
author: Levi Pearson
...

# Intro

### Outline

1. Basics of Functional Data Structures
    * Immutability
    * Persistent vs. Ephemeral
    * Node Structure
    * Data Sharing
2. Analysis Techniques
    * Amortization Overview
    * Amortization with Persistence
3. Design Techniques
    * Lazy Rebuilding and Scheduling
    * Numerical Representations
    * Non-Uniform Structure
4. Real-World Examples
    * Zippers
    * 2-3 Finger Trees
    * Hash Array Mapped Tries

### Motivation

Functional data structures provide:

* Relatively simple implementations
* Good worst-case performance bounds
* Persistence for free
* Ease of use in concurrent programs

## Basics

### Functional Data Structures?

What do we mean by "Functional Data Structure"?

* Built from immutable values
* Can be implemented in a purely functional language
* No in-place updates!

Obviously not the data structures taught in Intro to Algorithms class...

### Persistence

Most imperative structures are *ephemeral*: updates destroy previous versions.

A data structure is *persistent* when old versions remain available after updates.

* Persistence was first developed in the context of imperative structures
* Imperative versions of persistent structures can be quite complex
* Adding persistence can change performance bounds
* Purely functional structures are automatically persistent!

### Value of Persistence

Why would we want persistence anyway?

* Safe data sharing
* No locking required
* Restoration of previous states
* Interesting algorithms (e.g. in computational geometry)

### Node-Based Structure

Functional data structures share structural characteristics:

* They are built from *nodes* and *references*
* Nodes contain references:
    * one or more references to other nodes
    * one or more references to values
* A structure may be built from one or more *kinds* of node

Examples:

~~~ {.haskell}
data List a = Nil | Cons a (List a)

data Tree a = Leaf a | Branch (Tree a) (Tree a)
~~~

### Data Sharing

The fine-grained node structure has some benefits:

* New versions of structures share the bulk of their data with old versions
* Copying on updates is limited to *nodes* rather than *values*
* Only the nodes on the path from the root to the change point are copied

It also has some drawbacks:

* Extra overhead for references vs. in-place storage
* Locality of data can suffer

# Analysis

## Amortization

### Structures Enable Algorithms

* Data structures support *operations* on them
* Operations are implemented via *algorithms*
* Algorithms perform operations

We bottom out at two primitive operations:

* Construction of an ADT value
* Destructuring/pattern matching of an ADT value

### Algorithm Comparison

We judge between algorithms based on analysis:

* How many primitive operations are needed to complete the algorithm?
* How does the number of operations scale as the problem size grows?

*Asymptotic analysis* gives an upper bound on the growth of the operation count as the problem size grows to infinity.

### Worst Case

Typical analysis is *worst-case*:

* Find a function such that some constant multiple of it, applied to the input size, is *always* greater than the actual operation count of any single operation.
* That function is a worst-case upper bound.
* The smallest such upper bound characterizes the asymptotic worst-case behavior.

### Average Case

Sometimes we consider *average-case*:

* Find the upper bounds of all possible single-operation cases.
* Model the statistical likelihood of the cases.
* Find the smallest upper bound of the weighted combination of cases.
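For example, a successful search in an unordered list of length $n$, with the key equally likely to sit at any position, inspects on average

$$ \frac{1}{n}\sum\limits_{i=1}^{n} i = \frac{n+1}{2} $$

elements: half the worst case, but still $O \left({n}\right)$.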
### Amortized

An *amortized* analysis is different:

* Find the upper bounds of all possible single-operation cases.
* Consider the possible *sequences* of operations.
* Show that *expensive* operations always run infrequently enough to distribute their cost among *cheap* operations.

### When To Amortize?

A **worst-case** analysis is necessary when you have hard latency bounds, e.g. real-time requirements.

**Average** or **amortized** analysis can give a better sense of the *expected throughput* of an algorithm.

**Amortized** analysis is useful when most operations are cheap, but occasionally an expensive one must be performed. Examples:

* Array doubling
* Rehashing
* Tree balancing

Amortized analysis was developed in the context of *ephemeral* data structures.

### The Persistence Wrench

An ephemeral structure has a single logical future: once an operation is performed, there's no going back.

The amortization accounting process is straightforward:

* Assign an extra *credit* cost to cheap cases.
* Prove that an expensive case only occurs after sufficient credit has accumulated to pay for it.

Persistent structures may have *multiple* logical futures. Consider a tree very close to needing an expensive rebalancing:

~~~ {.Haskell}
-- tr1 refers to the teetering tree
let tr2 = insert foo tr1 -- causes a rebalance: expensive
let tr3 = insert foo tr1 -- causes another rebalance
let tr4 = insert foo tr1 -- causes yet another rebalance
~~~

How to account for that? You can't spend your savings more than once.

## Functional Amortization

### Tackling Persistence with Laziness

Chris Okasaki discovered a way around the multiple-future problem:

* Delay expensive operations via lazy evaluation
* Account for expensive operations via *debt*
* Savings can only be spent once, but over-paying on debt is ok

This works because:

* A suspended computation places a *thunk* in the new structure
* When a thunk is *forced*, it performs its calculation
* The resulting value replaces the thunk *in place*

The previous example, with lazy evaluation:

~~~ {.Haskell}
-- tr1 refers to the teetering tree
let tr2 = insert foo tr1 -- creates a rebalancing thunk: cheap
let tr3 = insert bar tr2 -- forces the rebalancing: expensive
let tr4 = insert baz tr2 -- sees the previous rebalancing: cheap
~~~

### Outline of Amortized Lazy Analysis

Like a layaway plan:

1. Find a computation you can't afford to execute immediately
2. Create a suspension for it, and assign it a debt proportional to its shared cost
3. Pay the debt off a little at a time
4. When the debt is paid off, the suspension may be executed

Consider each logical future as if it were the *only one*. Each future must fully pay off a suspension's debt before forcing it.

### Restoring Worst-Case Bounds

Sometimes the latency budget requires a tight worst-case bound on all operations.

Worst-case bounds can be restored via *scheduling*:

* Break suspended computations into bounded-size steps
* Execute the steps as the debt is paid off

This adds a small space and complexity overhead, but bounds the latency.

### Using Analysis Results

Sure, that sounds like *fun*, but what's it good for?

Different operations over structures (e.g. head-insert, tail-insert, search) are necessary to implement higher-level algorithms.
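For example, a breadth-first traversal leans on cheap insertion at one end of a queue and cheap removal at the other. Here is a quick sketch of my own (not from the original slides), written against the `empty`/`insert`/`remove`/`length` interface of the `Queue2` module developed in the next section:

~~~ {.Haskell}
import qualified Queue2 as Q        -- the amortized O(1) queue, defined later

data Tree a = Leaf a | Branch (Tree a) (Tree a)

-- Collect the leaves of a Tree in breadth-first order.
bfsLeaves :: Tree a -> [a]
bfsLeaves t = go (Q.insert t Q.empty)
  where
    go q
      | Q.length q == 0 = []
      | otherwise       = case Q.remove q of
          (Leaf x,     q') -> x : go q'
          (Branch l r, q') -> go (Q.insert r (Q.insert l q'))
~~~

The same traversal works with any queue exposing these operations; the analysis of the queue determines the cost of the traversal.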
Choose the data structure that:

* Provides good bounds for the operations your algorithm needs
* Provides the latency guarantees you need

There are other factors as well, of course.

# Implementation Patterns

## Lazy Rebuilding

### Queues, Take 1

The essence of the Queue:

* `insert` adds an element at the end of the queue
* `remove` takes an element from the front of the queue

How to represent it? What about a List?

* Our `remove` operation is $O \left({1}\right)$
* But `insert` is $O \left({n}\right)$!

The standard trick: a pair of lists, `left` and `right`.

* The `left` list holds the head of the queue
* The `right` list holds the tail, stored in reversed order

When `left` becomes empty, reverse `right` and put it in `left`.

### Queue Implementation

~~~ {.Haskell}
module Queue1 where

import Prelude hiding (head, tail, reverse, length)
import StrictList (StrictList(..), head, tail, reverse)

data Queue a = Q { left  :: StrictList a, leftLen  :: !Int
                 , right :: StrictList a, rightLen :: !Int }
             deriving (Show, Eq)

empty :: Queue a
empty = Q Empty 0 Empty 0

length :: Queue a -> Int
length (Q _ ll _ rl) = ll + rl

insert :: a -> Queue a -> Queue a
insert e (Q l ll r rl) = Q l ll (e :$ r) (rl + 1)

-- (removal from an empty queue is left unhandled)
remove :: Queue a -> (a, Queue a)
remove (Q Empty _  r rl) = remove (Q (reverse r) rl Empty 0)
remove (Q l     ll r rl) = (head l, Q (tail l) (ll - 1) r rl)
~~~

### Queue Analysis

Most operations are trivial:

* `empty`: $O \left({1}\right)$
* `insert`: $O \left({1}\right)$
* `length`: $O \left({1}\right)$

`remove` will take some thought:

~~~ {.Haskell}
remove :: Queue a -> (a, Queue a)
remove (Q Empty _  r rl) = remove (Q (reverse r) rl Empty 0)
remove (Q l     ll r rl) = (head l, Q (tail l) (ll - 1) r rl)
~~~

* When `left` is non-empty, `remove` takes $O \left({1}\right)$
* But when `left` is empty, it takes $O \left({n}\right)$ due to the `reverse`

### Amortizing the Queue

* The `reverse` operation takes place only when `left` is empty
* `left` is empty when:
    * the entire queue is empty, or
    * `n` values have been `insert`ed without a `remove`, and
    * all previously reversed values have been `remove`d
* A `remove` after `n` `insert`s will take `n` operations.
* Those `n` values must be `remove`d before another `reverse` can occur.
* So, assigning extra credit to each `insert` should offset the linear cost of the `reverse`...

### Not So Fast

~~~ {.Haskell}
let q1 = foldr insert empty "Abracadabra"
let q2 = remove q1 -- Causes a reverse
let q3 = remove q1 -- Causes the same reverse
let q4 = remove q1 -- And again...
~~~

Oops. Persistence screws up our amortization scheme.

Time to make it lazy.

### Queue Take 2 - Paired Lazy Lists

Goal: reverse `right` *incrementally*, before `left` becomes empty.

Means: periodically append `reverse right` to `left` using *guarded recursion*.

~~~ {.Haskell}
-- rot l r [] = l ++ reverse r
rot [] r a = head r : a
rot l  r a = head l : rot (tail l) (tail r) (head r : a)
~~~

Each call evaluates only as far as the first (`:`); the recursive call is suspended until another `head` or `tail` forces it.

### Queue Take 2 - Paired Lazy Lists (cont.)

Forcing a tail:

* performs one step of the `++` and one step of the `reverse`;
* may also force another tail, but only $\log n$ times.

So the worst case improves from $O\left({n}\right)$ to $O\left({\log n}\right)$.

If we never let `right` grow longer than `left`, then by the time `rot` has stepped through the original `left`, the `reverse` will be complete.

The amortized cost is therefore $O \left({1}\right)$ again, and now safe for persistence.
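Revisiting the problem case from "Not So Fast" (a sketch of my own, using the `Queue2` module shown in full on the next slide): every logical future of `q1` forces the *same* suspended rotation, and once forced it is memoized, so the expensive work happens at most once.

~~~ {.Haskell}
import qualified Queue2 as Q

demo :: (Char, Char, Char)
demo = (x, y, z)
  where
    q1     = foldr Q.insert Q.empty "Abracadabra"
    (x, _) = Q.remove q1   -- may force a suspended rotation step...
    (y, _) = Q.remove q1   -- ...which is now memoized: cheap
    (z, _) = Q.remove q1   -- cheap again
~~~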
### Full Implementation

~~~ {.Haskell}
module Queue2 where

import Prelude hiding (length)

data Queue a = Q { left  :: [a], leftLen  :: !Int
                 , right :: [a], rightLen :: !Int }
             deriving (Show, Eq)

empty :: Queue a
empty = Q [] 0 [] 0

length :: Queue a -> Int
length (Q _ ll _ rl) = ll + rl

insert :: a -> Queue a -> Queue a
insert e (Q l ll r rl) = makeQ (Q l ll (e : r) (rl + 1))

remove :: Queue a -> (a, Queue a)
remove (Q l ll r rl) = (head l, makeQ (Q (tail l) (ll - 1) r rl))

makeQ :: Queue a -> Queue a
-- Preserve the invariant: |right| <= |left|
makeQ q@(Q l ll r rl)
  | rl <= ll  = q
  | otherwise = Q (rot l r []) (ll + rl) [] 0

rot :: [a] -> [a] -> [a] -> [a]
rot [] r a = head r : a
rot l  r a = head l : rot (tail l) (tail r) (head r : a)
~~~

## Numerical Representations

### Arithmetic

> Many people regard arithmetic as a trivial thing that children learn
> and computers do, but we will see that arithmetic is a fascinating
> topic with many interesting facets. It is important to make a
> thorough study of efficient methods for calculating with numbers,
> since arithmetic underlies so many computer applications.

Donald E. Knuth, from "The Art of Computer Programming"

### Nat and List

Inductive definition of Naturals:

~~~ {.Haskell}
data Nat = Zero | Succ Nat
~~~

Inductive definition of Lists:

~~~ {.Haskell}
data List a = Nil | Cons a (List a)
~~~

Functions on `Nat` and `List` are analogous too:

* `inc` on a `Nat` is like `cons` on a `List`
* `dec` on a `Nat` is like removing the `head` of a `List`
* adding two `Nat`s is like appending two `List`s

Lists are *numbers* that contain *values*. `Nat` and `List` implement a *unary* number system.

### Positional Numbers - Definitions

A *positional number* is written as a series of digits:

$$ b_0 \dots b_{m-1} $$

The digit $b_0$ is the *least significant digit*; the digit $b_{m-1}$ is the *most significant digit*.

Each digit $b_i$ has a *weight* $w_i$, so the value is:

$$ \sum\limits_{i=0}^{m-1} b_i w_i $$

A number is *base* $B$ if $w_i=B^i$ and the digit set $D_i=\left\{0,\dots,B-1\right\}$ for every $i$.

Usually the weights form an increasing sequence of powers and $D_i$ is the same for every digit.

### Redundancy

A number system is *redundant* if there is more than one way to represent some numbers.

Take the system where $w_i = 2^i$ and $D_i = \left\{0,1,2\right\}$. Writing the least significant digit first, decimal $13$ can be written:

* $1011$
* $1201$
* $122$

### Dense and Sparse Representations

*Dense* representations are simple lists/sequences of digits, including $0$.
*Sparse* representations exclude $0$ digits, so they must include either:

* *rank* (the index in the sequence), or
* *weight*

### Dense Example

A straightforward list of binary digits, least significant bit first:

~~~ {.Haskell}
data Digit = Zero | One
type DenseNat = [Digit] -- increasing order of significance

inc [] = [One]
inc (Zero : ds) = One : ds
inc (One  : ds) = Zero : inc ds -- carry

dec [One] = []
dec (One  : ds) = Zero : ds
dec (Zero : ds) = One : dec ds -- borrow

add ds [] = ds
add [] ds = ds
add (d    : ds1) (Zero : ds2) = d : add ds1 ds2
add (Zero : ds1) (d    : ds2) = d : add ds1 ds2
add (One  : ds1) (One  : ds2) = Zero : inc (add ds1 ds2) -- carry
~~~

### Sparse Example

Store a list of $d_i w_i$ values, so `Zero` digits won't appear:

~~~ {.Haskell}
type SparseNat = [Int] -- increasing list of powers of 2

carry w [] = [w]
carry w ws@(w' : rest) | w < w'    = w : ws
                       | otherwise = carry (2 * w) rest

borrow w ws@(w' : rest) | w == w'   = rest
                        | otherwise = w : borrow (2 * w) ws

inc ws = carry 1 ws
dec ws = borrow 1 ws

add ws [] = ws
add [] ws = ws
add m@(w1 : ws1) n@(w2 : ws2)
  | w1 < w2   = w1 : add ws1 n
  | w2 < w1   = w2 : add m ws2
  | otherwise = carry (2 * w1) (add ws1 ws2)
~~~

### Binary Numerical Representations

We can transform a positional number system into a data structure:

* Replace the sequence of digits with a sequence of trees
* The number and sizes of the trees are determined by the number's representation

For example:

* The binary representation of $73$ is $1001001$
* A collection of size $73$ would contain three trees:
    * size $1$
    * size $8$
    * size $64$

### Complete Binary Leaf Trees

A binary leaf tree is either:

* a `Leaf`, containing a value; or
* a `Fork`, containing two binary leaf trees

A *complete* binary leaf tree has all `Leaf` nodes at the same depth.

~~~ {.Haskell}
data BLT a = Leaf a | Fork (BLT a) (BLT a)
~~~

### Binary Random-Access List

Let's combine a dense binary representation with a `BLT`.

First, let's annotate the `BLT`s with their size:

~~~ {.Haskell}
data BLT a = Leaf a
           | Fork !Int (BLT a) (BLT a)
           deriving (Show)
~~~

Recall the types we used for `DenseNat`:

~~~ {.Haskell}
data Digit = Zero | One deriving (Show, Ord, Eq)
type DenseNat = [Digit] -- increasing order of significance
~~~

We'll adapt them to store a `BLT` in each `One` digit:

~~~ {.Haskell}
data Digit a = Zero | One (BLT a) deriving (Show)
type RandList a = [Digit a]
~~~

### Some RandList Operations

We had an `inc` operation for `Nat`:

~~~ {.Haskell}
inc :: DenseNat -> DenseNat
inc [] = [One]
inc (Zero : ds) = One : ds
inc (One  : ds) = Zero : inc ds -- carry
~~~

We adapt it to `cons` for `RandList`:

~~~ {.Haskell}
cons :: a -> RandList a -> RandList a
cons x ts = insTree (Leaf x) ts

insTree t  [] = [One t]
insTree t  (Zero   : ts) = One t : ts
insTree t1 (One t2 : ts) = Zero : insTree (link t1 t2) ts
  where link t1 t2 = Fork (size t1 + size t2) t1 t2
        size (Leaf _)     = 1
        size (Fork n _ _) = n
~~~

### Some RandList Operations

We had a `dec` operation for `Nat`:

~~~ {.Haskell}
dec :: DenseNat -> DenseNat
dec [One] = []
dec (One  : ds) = Zero : ds
dec (Zero : ds) = One : dec ds -- borrow
~~~

We adapt it to `tail` for `RandList`:

~~~ {.Haskell}
tail :: RandList a -> RandList a
tail ts = ts' where (_, ts') = borrowTree ts

borrowTree [One t]      = (t,  []         )
borrowTree (One t : ts) = (t,  Zero : ts  )
borrowTree (Zero  : ts) = (t1, One t2 : ts')
  where (Fork _ t1 t2, ts') = borrowTree ts
~~~

### Properties of RandList

We have arranged an $n$-element sequence as a list of about $\log\left(n+1\right)$ trees, each of depth at most $\log n$.
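For reference, a `lookup` along those lines might look like this (a sketch of my own, not from the original deck, using the size-annotated `BLT`/`RandList` types above, with `Prelude`'s `lookup` hidden as in the other slides): walk the digit list to find the right tree, then descend into it.

~~~ {.Haskell}
-- Sketch (my addition): index into a binary random-access list.
lookup :: Int -> RandList a -> a
lookup _ []          = error "index out of range"
lookup i (Zero : ts) = lookup i ts
lookup i (One t : ts)
  | i < size t       = lookupTree i t
  | otherwise        = lookup (i - size t) ts
  where size (Leaf _)     = 1
        size (Fork n _ _) = n

lookupTree :: Int -> BLT a -> a
lookupTree 0 (Leaf x) = x
lookupTree i (Fork n t1 t2)
  | i < half  = lookupTree i t1             -- low indices live in the left half
  | otherwise = lookupTree (i - half) t2
  where half = n `div` 2
lookupTree _ _ = error "index out of range"
~~~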
Operations `cons`, `head`, and `tail` perform $O\left({1}\right)$ work per digit, so their worst case is $O\left({\log n}\right)$.

Operations `lookup` and `update` take at most $O\left({\log n}\right)$ to find the right tree and $O\left({\log n}\right)$ to find the right element, for a total of $O\left({\log n}\right)$ worst-case time.

For mostly-random access, this is a significant improvement over `List`'s $O\left({n}\right)$ worst-case performance.

### Skew Binary Numbers

Our standard binary representation's performance was hindered by *cascading carries*.

In the **skew binary number** representation:

* The weight $w_i$ of the $i$th digit is $2^{i+1}-1$
* The set of digits $D_i = \left\{0,1,2\right\}$
* Only the lowest non-$0$ digit may be $2$ (for uniqueness)

The number $92$ is $002101$ in this representation (least significant digit first):

\begin{align*}
\sum\limits_{i=0}^{5} b_i w_i &= 0+0+(2^{2+1}-1)\times{2}+(2^{3+1}-1)+0+(2^{5+1}-1) \\
                              &= 14+15+63 \\
                              &= 92
\end{align*}

### Skew Binary Number Operations

What makes the skew binary representation useful:

* $w_i$ is $2^{i+1}-1$
* $1+2\left(2^{i+1}-1\right) = 2^{i+2}-1$

This means that when the lowest non-$0$ digit is $2$, the `inc` operation:

* resets the $2$ to $0$
* increments the next digit from $0$ to $1$ *or* $1$ to $2$

If there is no $2$, just increment the lowest digit from $0$ to $1$ or from $1$ to $2$.

These steps take only $O\left({1}\right)$ time!

We must use a *sparse* representation to keep $O\left({1}\right)$ access to the first non-$0$ digit.

### Skew Binary Random Access List

~~~ {.Haskell}
data BTree a = Leaf a | Node a (BTree a) (BTree a)
data Digit a = Digit !Int (BTree a)
type RList a = [Digit a]

cons x (Digit w1 t1 : Digit w2 t2 : ts')
  | w1 == w2 = Digit (1 + w1 + w2) (Node x t1 t2) : ts'
cons x ts    = Digit 1 (Leaf x) : ts

head (Digit 1 (Leaf x)     : _) = x
head (Digit _ (Node x _ _) : _) = x

tail (Digit 1 (Leaf _)       : ts) = ts
tail (Digit w (Node _ t1 t2) : ts) = Digit w' t1 : Digit w' t2 : ts
  where w' = w `div` 2

lookup (Digit w t : ts) i
  | i < w     = lookupTree w t i
  | otherwise = lookup ts (i - w)

lookupTree 1 (Leaf x)       0 = x
lookupTree _ (Node x _ _)   0 = x
lookupTree w (Node _ t1 t2) i
  | i <= w'   = lookupTree w' t1 (i - 1)
  | otherwise = lookupTree w' t2 (i - 1 - w')
  where w' = w `div` 2
~~~

## Non-Uniform Structure

### Random Access Lists

Here is our first Random Access List type:

~~~ {.Haskell}
data BLT a = Leaf a | Fork (BLT a) (BLT a)
data Digit a = Zero | One (BLT a)
type RandList a = [Digit a]
~~~

It is possible to construct invalid members of this type:

~~~ {.Haskell}
[One (Fork (Leaf 1) (Fork (Leaf 2) (Leaf 3)))]
~~~

Constraint violations:

* All trees should be *complete*, but this one is not
* The tree's height doesn't match its position in the digit sequence

### Random Access Lists (cont.)
We can encode the fact that we only want *complete* trees of the correct height by combining the structures and "pairing" the type parameter on each recurrence:

~~~ {.Haskell}
data Fork t = Fork t t -- this is just like (t,t)

data RandList t = Nil
                | Zero (RandList (Fork t))
                | One t (RandList (Fork t))
~~~

This is known as a *non-uniform* data type; functions over it use *polymorphic recursion*.

Some examples:

~~~ {.Haskell}
Nil
One 1 Nil
Zero (One (Fork 1 2) Nil)
Zero (Zero (One (Fork (Fork 1 2) (Fork 3 4)) Nil))
~~~

### Random Access List Implementation

~~~ {.Haskell}
data Fork t = Fork t t
data RandList t = Nil
                | Zero (RandList (Fork t))
                | One t (RandList (Fork t))

cons :: a -> RandList a -> RandList a
cons x  Nil         = One x Nil
cons x1 (Zero ds)   = One x1 ds
cons x1 (One x2 ds) = Zero (cons (Fork x1 x2) ds)

front :: RandList a -> (a, RandList a)
front Nil         = error "Empty RandList"
front (One x Nil) = (x, Nil)
front (One x ds)  = (x, Zero ds)
front (Zero ds)   = (x1, One x2 ds')
  where (Fork x1 x2, ds') = front ds

find :: Int -> RandList a -> a
find _ Nil        = error "Not found"
find 0 (One x _)  = x
find i (One _ ds) = find (i - 1) (Zero ds)
find i (Zero ds)  = if i `mod` 2 == 0 then x else y
  where (Fork x y) = find (i `div` 2) ds
~~~

### Other Numerical Representations

We have only scratched the surface of numerical representations.

* Different tree structures at the digits:
    * complete binary trees
    * complete leaf trees
    * binomial trees
    * pennants
* Different number systems:
    * zeroless binary
    * redundant binary
    * skew binary
    * segmented binary
    * higher bases
    * fibonacci numbers
    * factorials (basis of a random permutation algorithm)

# Real-World Structures

## Zippers

### What is a Zipper?

Imagine you have a large, complex data structure...

A zipper is a *complementary* structure that tracks a navigation path through the primary structure. It contains:

* a *focus point* that refers to the current point in the navigation
* the *path* of the traversal from the root to the focus point

Operations on a zipper:

* navigation: `goUp`, `goDown`, `goLeft`, searching, etc.
* update: `replace` the focus point, `insertLeft`, etc.

Traversing a structure with a zipper turns it *inside-out*, and moving back to the head pulls it back together, like a *zipper*.
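Before looking at a tree zipper, here is a minimal sketch of my own (not from the original deck): a zipper over a plain list, where the path is just the list of elements already passed, nearest first.

~~~ {.Haskell}
-- Sketch (my addition): the simplest possible zipper, over a list.
-- 'lefts' holds the elements already walked past, nearest first;
-- the focus is the head of 'rights'.
data ListZipper a = LZ { lefts :: [a], rights :: [a] }
  deriving Show

fromList :: [a] -> ListZipper a
fromList = LZ []

goRight, goLeft :: ListZipper a -> ListZipper a
goRight (LZ ls (x:rs)) = LZ (x:ls) rs
goRight z              = z                 -- already at the end
goLeft  (LZ (l:ls) rs) = LZ ls (l:rs)
goLeft  z              = z                 -- already at the start

replace :: a -> ListZipper a -> ListZipper a
replace x (LZ ls (_:rs)) = LZ ls (x:rs)
replace _ z              = z               -- nothing in focus

toList :: ListZipper a -> [a]
toList (LZ ls rs) = reverse ls ++ rs
~~~

The rose-tree zipper on the next slides follows the same pattern, but its path must also remember the siblings and the parent context at each level.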
### A Zipper on Rose Trees

First, a tree structure, possibly representing a document:

~~~ {.Haskell}
data Tree a = Item a | Section [Tree a]
~~~

We need a structure to record the path; it must track steps up and down the structure as well as the position within a `Section`'s list of children:

~~~ {.Haskell}
data Path a = Top | Node [Tree a] (Path a) [Tree a]
~~~

Finally, we combine the path with the focus point, which is just a subtree of the tree we are traversing:

~~~ {.Haskell}
data Location a = Loc (Tree a) (Path a)
~~~

To build a zipper, we just combine the root of the tree with the `Top` path constructor in a `Location`.

### A Zipper on Rose Trees - Navigation

Basic navigation (error checking elided):

~~~ {.Haskell}
data Path a = Top | Node [Tree a] (Path a) [Tree a]
data Location a = Loc (Tree a) (Path a)

goDown (Loc (Section (t1:trees)) p) = Loc t1 (Node [] p trees)

goLeft (Loc t (Node (l:left) up right)) = Loc l (Node left up (t:right))

goRight (Loc t (Node left up (r:right))) = Loc r (Node (t:left) up right)

goUp (Loc t (Node left up right)) =
  Loc (Section (reverse left ++ t:right)) up
~~~

### A Zipper on Rose Trees - Update

Modifying at the current position:

~~~ {.Haskell}
data Path a = Top | Node [Tree a] (Path a) [Tree a]
data Location a = Loc (Tree a) (Path a)

replace (Loc _ p) t = Loc t p

insertRight (Loc t (Node left up right)) r = Loc t (Node left up (r:right))

insertLeft (Loc t (Node left up right)) l = Loc t (Node (l:left) up right)

insertDown (Loc (Section sons) p) t1 = Loc t1 (Node [] p sons)
~~~

### Crazy Zipper Facts

The zipper structure for a given data structure can be derived mechanically from its algebraic type signature.

The path component is the *first derivative* (in the sense you learned in Calculus) of the parent type's signature: it is the type of *one-hole contexts* for that type, and the *focus point* fills the hole.

Zippers can also be represented by *delimited continuations* of a traversing process. These can be used with any `Traversable` without knowing its concrete type at all!

Continuation-capturing traversal functions invert the control structure in a similar manner to how taking the derivative of the data type inverts its data structure.

## 2-3 Finger Trees

### Not this time

I lied. There's no time to talk about these. Sorry!

But you should be prepared to read this paper about them now:

http://www.cs.ox.ac.uk/people/ralf.hinze/publications/FingerTrees.pdf

## Hash Array Mapped Tries

### Downsides To This Stuff

Way back at the beginning, I mentioned some downsides:

* Extra overhead for references vs. in-place storage
* Locality of data can suffer

These are constant-factor issues, but with the advances in the cache structure of machines, they are an increasingly *large* constant factor. Bad memory access patterns can make memory fetches take *orders of magnitude* longer.

The solution is to use *bigger* chunks of data: do more copying and less pointer chasing.

### Hash Array Mapped Trie

Key features:

* Instead of a branching factor of 2, use a factor of 32 or so.
* Each node (aside from leaf nodes) contains up to 32 elements.
* $n$ bits of the key index the array at the next level.
* A bit population count is used to represent sparse arrays.

~~~ {.Haskell}
import Data.Vector (Vector) -- from the vector package

type Key = Word
type Bitmap = Word

data HAMT a = Empty
            | BitmapIndexed !Bitmap !(Vector (HAMT a))
            | Leaf !Key a
            | Full !(Vector (HAMT a))
~~~

### Hash Array Mapped Trie - More Explanation

A *trie* is a tree with a branching factor $k$, where $k$ is the size of the alphabet of key elements.
A trie for caseless English words would have a branching factor of 26. The first symbol of the key determines the index into the first tree level, the second symbol indexes the next level, and so on.

A HAMT divides a key (a vector index or a hashed key) into bit fields:

* a branching factor of 32 uses 5-bit fields
* a 32-bit key divides into 6 fields, with 2 bits remaining

A HAMT stores children sparsely: the bitmap records which children are present, and the vector stores only the present children. Clever bit-level math determines a child's vector index from the bitmap.

### Hash Array Mapped Trie - Lookup

~~~ {.Haskell}
import Prelude hiding (lookup)
import Data.Bits ((.&.), shiftL, shiftR, popCount)
import Data.Vector ((!))
-- HAMT, Key, and Bitmap as defined on the previous slide

bitsPerSubkey :: Int
bitsPerSubkey = 5 -- 32-way branching

subkeyMask :: Key
subkeyMask = 1 `shiftL` bitsPerSubkey - 1

maskIdx b m = popCount (b .&. (m - 1))

mask k s = shiftL 1 (subkey k s)

subkey k s = fromIntegral $ shiftR k s .&. subkeyMask

lookup :: Key -> HAMT a -> Maybe a
lookup k t = lookup' k 0 t

lookup' k s t
  = case t of
      Empty -> Nothing
      Leaf kx x
        | k == kx   -> Just x
        | otherwise -> Nothing
      BitmapIndexed b v ->
        let m = mask k s in
        if b .&. m == 0
          then Nothing
          else lookup' k (s + bitsPerSubkey) (v ! maskIdx b m)
      Full v -> lookup' k (s + bitsPerSubkey) (v ! subkey k s)
~~~

# Conclusion

### Functional Data Structure Fundamentals

Functional data structures provide:

* Safety via Immutability
* Efficient Persistence
* Clear & Simple Implementation

Many are based around some core ideas:

* Numerical Representations
* Lazy Amortization
* Non-Uniform Type Structure

The field is still developing!

Studying Functional Data Structures will teach you a lot about functional languages.