---
title: Functional Data Structures
subtitle: Analysis and Implementation
author: Levi Pearson
...

# Intro

### Outline

1. Basics of Functional Data Structures
    * Immutability
    * Persistent vs. Ephemeral
    * Node Structure
    * Data Sharing
2. Analysis Techniques
    * Amortization Overview
    * Amortization with Persistence
3. Design Techniques
    * Lazy Rebuilding and Scheduling
    * Numerical Representations
    * Non-Uniform Structure
4. Real-World Examples
    * Zippers
    * 2-3 Finger Trees
    * Hash Array Mapped Tries

### Motivation

Functional data structures provide:

* Relatively simple implementations
* Good worst-case performance bounds
* Persistence for free
* Ease of use in concurrent programs

## Basics

### Functional Data Structures?

What do we mean by "Functional Data Structure"?

* Built from immutable values
* Can be implemented in a purely functional language
* No in-place updates!

Obviously not the data structures taught in Intro to Algorithms class...

### Persistence

Most imperative structures are *ephemeral*: updates destroy previous versions.

A data structure is *persistent* when old versions remain available after updates.

* Persistence was first developed in the context of imperative structures
* Imperative versions of persistent structures can be quite complex
* Adding persistence can change performance bounds
* Purely functional structures are automatically persistent!

### Value of Persistence

Why would we want persistence anyway?

* Safe data sharing
* No locking required
* Restoration of previous states
* Interesting algorithms (e.g. in computational geometry)

### Node-Based Structure

Functional data structures share structural characteristics:

* They are built from *nodes* and *references*
* Nodes contain references:
    * one or more references to other nodes
    * one or more references to values
* A structure may be built from one or more *kinds* of node

Examples:

~~~ {.haskell}
data List a = Nil | Cons a (List a)

data Tree a = Leaf a | Branch (Tree a) (Tree a)
~~~

### Data Sharing

The fine-grained node structure has some benefits:

* New versions of structures share the bulk of their data with old versions
* Copying on updates is limited to *nodes* rather than *values*
* Only the nodes on the path from the root to the change point are copied

It also has some drawbacks:

* Extra overhead for references vs. in-place storage
* Locality of data can suffer

# Analysis

## Amortization

### Structures Enable Algorithms

* Data structures support *operations* on them
* Operations are implemented via *algorithms*
* Algorithms perform operations

We bottom out at two primitive operations:

* Construction of an ADT value
* Destructuring/pattern matching of an ADT value

### Algorithm Comparison

We judge between algorithms based on analysis:

* How many primitive operations are needed to complete the algorithm?
* How does the number of operations scale as the problem size grows?

*Asymptotic analysis* gives an upper bound on the growth of the operation count as the problem size grows to infinity.

### Worst Case

Typical analysis is *worst-case*:

* Find a function such that some constant multiple of it, applied to the input size, is *always* greater than the actual operation count of any single operation.
* That function is a worst-case upper bound.
* The smallest such upper bound characterizes the asymptotic worst-case behavior.

### Average Case

Sometimes we consider *average-case*:

* Find the upper bounds of all possible single-operation cases.
* Model the statistical likelihood of the cases.
* Find the smallest upper bound of the weighted combination of cases.
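For example, a successful search in an unordered list of length $n$, with the key equally likely to sit at any position, inspects on average

$$ \frac{1}{n}\sum\limits_{i=1}^{n} i = \frac{n+1}{2} $$

elements: half the worst case, but still $O \left({n}\right)$.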
### Amortized

An *amortized* analysis is different:

* Find the upper bounds of all possible single-operation cases.
* Consider the possible *sequences* of operations.
* Show that *expensive* operations always run infrequently enough to distribute their cost among *cheap* operations.

### When To Amortize?

A **worst-case** analysis is necessary when you have hard latency bounds, e.g. real-time requirements.

**Average** or **amortized** analysis can give a better sense of the *expected throughput* of an algorithm.

**Amortized** analysis is useful when most operations are cheap, but occasionally an expensive one must be performed. Examples:

* Array doubling
* Rehashing
* Tree balancing

Amortized analysis was developed in the context of *ephemeral* data structures.

### The Persistence Wrench

An ephemeral structure has a single logical future: once an operation is performed, there's no going back.

The amortization accounting process is straightforward:

* Assign an extra *credit* cost to cheap cases.
* Prove that an expensive case only occurs after sufficient credit has accumulated to pay for it.

Persistent structures may have *multiple* logical futures. Consider a tree very close to needing an expensive rebalancing:

~~~ {.Haskell}
-- tr1 refers to the teetering tree
let tr2 = insert foo tr1 -- causes a rebalance: expensive
let tr3 = insert foo tr1 -- causes another rebalance
let tr4 = insert foo tr1 -- causes yet another rebalance
~~~

How to account for that? You can't spend your savings more than once.

## Functional Amortization

### Tackling Persistence with Laziness

Chris Okasaki discovered a way around the multiple-future problem:

* Delay expensive operations via lazy evaluation
* Account for expensive operations via *debt*
* Savings can only be spent once, but over-paying on debt is ok

This works because:

* A suspended computation places a *thunk* in the new structure
* When a thunk is *forced*, it performs its calculation
* The resulting value replaces the thunk *in place*

The previous example, with lazy evaluation:

~~~ {.Haskell}
-- tr1 refers to the teetering tree
let tr2 = insert foo tr1 -- creates a rebalancing thunk: cheap
let tr3 = insert bar tr2 -- forces the rebalancing: expensive
let tr4 = insert baz tr2 -- sees the previous rebalancing: cheap
~~~

### Outline of Amortized Lazy Analysis

Like a layaway plan:

1. Find a computation you can't afford to execute immediately
2. Create a suspension for it, and assign it a debt proportional to its shared cost
3. Pay the debt off a little at a time
4. When the debt is paid off, the suspension may be executed

Consider each logical future as if it were the *only one*. Each future must fully pay off a suspension's debt before forcing it.

### Restoring Worst-Case Bounds

Sometimes the latency budget requires a tight worst-case bound on all operations.

Worst-case bounds can be restored via *scheduling*:

* Break suspended computations into bounded-size steps
* Execute the steps as the debt is paid off

This adds a small space and complexity overhead, but bounds the latency.

### Using Analysis Results

Sure, that sounds like *fun*, but what's it good for?

Different operations over structures (e.g. head-insert, tail-insert, search) are necessary to implement higher-level algorithms.
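For example, a breadth-first traversal leans on cheap insertion at one end of a queue and cheap removal at the other. Here is a quick sketch of my own (not from the original slides), written against the `empty`/`insert`/`remove`/`length` interface of the `Queue2` module developed in the next section:

~~~ {.Haskell}
import qualified Queue2 as Q        -- the amortized O(1) queue, defined later

data Tree a = Leaf a | Branch (Tree a) (Tree a)

-- Collect the leaves of a Tree in breadth-first order.
bfsLeaves :: Tree a -> [a]
bfsLeaves t = go (Q.insert t Q.empty)
  where
    go q
      | Q.length q == 0 = []
      | otherwise       = case Q.remove q of
          (Leaf x,     q') -> x : go q'
          (Branch l r, q') -> go (Q.insert r (Q.insert l q'))
~~~

The same traversal works with any queue exposing these operations; the analysis of the queue determines the cost of the traversal.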
Choose the data structure that:

* Provides good bounds for the operations your algorithm needs
* Provides the latency guarantees you need

There are other factors as well, of course.

# Implementation Patterns

## Lazy Rebuilding

### Queues, Take 1

The essence of the Queue:

* `insert` adds an element at the end of the queue
* `remove` takes an element from the front of the queue

How to represent it? What about a List?

* Our `remove` operation is $O \left({1}\right)$
* But `insert` is $O \left({n}\right)$!

The standard trick: a pair of lists, `left` and `right`.

* The `left` list holds the head of the queue
* The `right` list holds the tail, stored in reversed order

When `left` becomes empty, reverse `right` and put it in `left`.

### Queue Implementation

~~~ {.Haskell}
module Queue1 where

import Prelude hiding (head, tail, reverse, length)
import StrictList (StrictList(..), head, tail, reverse)

data Queue a = Q { left  :: StrictList a, leftLen  :: !Int
                 , right :: StrictList a, rightLen :: !Int }
             deriving (Show, Eq)

empty :: Queue a
empty = Q Empty 0 Empty 0

length :: Queue a -> Int
length (Q _ ll _ rl) = ll + rl

insert :: a -> Queue a -> Queue a
insert e (Q l ll r rl) = Q l ll (e :$ r) (rl + 1)

-- (removal from an empty queue is left unhandled)
remove :: Queue a -> (a, Queue a)
remove (Q Empty _  r rl) = remove (Q (reverse r) rl Empty 0)
remove (Q l     ll r rl) = (head l, Q (tail l) (ll - 1) r rl)
~~~

### Queue Analysis

Most operations are trivial:

* `empty`: $O \left({1}\right)$
* `insert`: $O \left({1}\right)$
* `length`: $O \left({1}\right)$

`remove` will take some thought:

~~~ {.Haskell}
remove :: Queue a -> (a, Queue a)
remove (Q Empty _  r rl) = remove (Q (reverse r) rl Empty 0)
remove (Q l     ll r rl) = (head l, Q (tail l) (ll - 1) r rl)
~~~

* When `left` is non-empty, `remove` takes $O \left({1}\right)$
* But when `left` is empty, it takes $O \left({n}\right)$ due to the `reverse`

### Amortizing the Queue

* The `reverse` operation takes place only when `left` is empty
* `left` is empty when:
    * the entire queue is empty, or
    * `n` values have been `insert`ed without a `remove`, and
    * all previously reversed values have been `remove`d
* A `remove` after `n` `insert`s will take `n` operations.
* Those `n` values must be `remove`d before another `reverse` can occur.
* So, assigning extra credit to each `insert` should offset the linear cost of the `reverse`...

### Not So Fast

~~~ {.Haskell}
let q1 = foldr insert empty "Abracadabra"
let q2 = remove q1 -- Causes a reverse
let q3 = remove q1 -- Causes the same reverse
let q4 = remove q1 -- And again...
~~~

Oops. Persistence screws up our amortization scheme.

Time to make it lazy.

### Queue Take 2 - Paired Lazy Lists

Goal: reverse `right` *incrementally*, before `left` becomes empty.

Means: periodically append `reverse right` to `left` using *guarded recursion*.

~~~ {.Haskell}
-- rot l r [] = l ++ reverse r
rot [] r a = head r : a
rot l  r a = head l : rot (tail l) (tail r) (head r : a)
~~~

Each call evaluates only as far as the first (`:`); the recursive call is suspended until another `head` or `tail` forces it.

### Queue Take 2 - Paired Lazy Lists (cont.)

Forcing a tail:

* performs one step of the `++` and one step of the `reverse`;
* may also force another tail, but only $\log n$ times.

So the worst case improves from $O\left({n}\right)$ to $O\left({\log n}\right)$.

If we never let `right` grow longer than `left`, then by the time `rot` has stepped through the original `left`, the `reverse` will be complete.

The amortized cost is therefore $O \left({1}\right)$ again, and now safe for persistence.
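Revisiting the problem case from "Not So Fast" (a sketch of my own, using the `Queue2` module shown in full on the next slide): every logical future of `q1` forces the *same* suspended rotation, and once forced it is memoized, so the expensive work happens at most once.

~~~ {.Haskell}
import qualified Queue2 as Q

demo :: (Char, Char, Char)
demo = (x, y, z)
  where
    q1     = foldr Q.insert Q.empty "Abracadabra"
    (x, _) = Q.remove q1   -- may force a suspended rotation step...
    (y, _) = Q.remove q1   -- ...which is now memoized: cheap
    (z, _) = Q.remove q1   -- cheap again
~~~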
### Full Implementation

~~~ {.Haskell}
module Queue2 where

import Prelude hiding (length)

data Queue a = Q { left  :: [a], leftLen  :: !Int
                 , right :: [a], rightLen :: !Int }
             deriving (Show, Eq)

empty :: Queue a
empty = Q [] 0 [] 0

length :: Queue a -> Int
length (Q _ ll _ rl) = ll + rl

insert :: a -> Queue a -> Queue a
insert e (Q l ll r rl) = makeQ (Q l ll (e : r) (rl + 1))

remove :: Queue a -> (a, Queue a)
remove (Q l ll r rl) = (head l, makeQ (Q (tail l) (ll - 1) r rl))

makeQ :: Queue a -> Queue a
-- Preserve the invariant: |right| <= |left|
makeQ q@(Q l ll r rl)
  | rl <= ll  = q
  | otherwise = Q (rot l r []) (ll + rl) [] 0

rot :: [a] -> [a] -> [a] -> [a]
rot [] r a = head r : a
rot l  r a = head l : rot (tail l) (tail r) (head r : a)
~~~

## Numerical Representations

### Arithmetic

> Many people regard arithmetic as a trivial thing that children learn
> and computers do, but we will see that arithmetic is a fascinating
> topic with many interesting facets. It is important to make a
> thorough study of efficient methods for calculating with numbers,
> since arithmetic underlies so many computer applications.

Donald E. Knuth, from "The Art of Computer Programming"

### Nat and List

Inductive definition of Naturals:

~~~ {.Haskell}
data Nat = Zero | Succ Nat
~~~

Inductive definition of Lists:

~~~ {.Haskell}
data List a = Nil | Cons a (List a)
~~~

Functions on `Nat` and `List` are analogous too:

* `inc` on a `Nat` is like `cons` on a `List`
* `dec` on a `Nat` is like removing the `head` of a `List`
* adding two `Nat`s is like appending two `List`s

Lists are *numbers* that contain *values*. `Nat` and `List` implement a *unary* number system.

### Positional Numbers - Definitions

A *positional number* is written as a series of digits:

$$ b_0 \dots b_{m-1} $$

The digit $b_0$ is the *least significant digit*; the digit $b_{m-1}$ is the *most significant digit*.

Each digit $b_i$ has a *weight* $w_i$, so the value is:

$$ \sum\limits_{i=0}^{m-1} b_i w_i $$

A number is *base* $B$ if $w_i=B^i$ and the digit set $D_i=\left\{0,\dots,B-1\right\}$ for every $i$.

Usually the weights form an increasing sequence of powers and $D_i$ is the same for every digit.

### Redundancy

A number system is *redundant* if there is more than one way to represent some numbers.

Take the system where $w_i = 2^i$ and $D_i = \left\{0,1,2\right\}$. Writing the least significant digit first, decimal $13$ can be written:

* $1011$
* $1201$
* $122$

### Dense and Sparse Representations

*Dense* representations are simple lists/sequences of digits, including $0$.
*Sparse* representations exclude $0$ digits, so they must include either:

* *rank* (the index in the sequence), or
* *weight*

### Dense Example

A straightforward list of binary digits, least significant bit first:

~~~ {.Haskell}
data Digit = Zero | One
type DenseNat = [Digit] -- increasing order of significance

inc [] = [One]
inc (Zero : ds) = One : ds
inc (One  : ds) = Zero : inc ds -- carry

dec [One] = []
dec (One  : ds) = Zero : ds
dec (Zero : ds) = One : dec ds -- borrow

add ds [] = ds
add [] ds = ds
add (d    : ds1) (Zero : ds2) = d : add ds1 ds2
add (Zero : ds1) (d    : ds2) = d : add ds1 ds2
add (One  : ds1) (One  : ds2) = Zero : inc (add ds1 ds2) -- carry
~~~

### Sparse Example

Store a list of $d_i w_i$ values, so `Zero` digits won't appear:

~~~ {.Haskell}
type SparseNat = [Int] -- increasing list of powers of 2

carry w [] = [w]
carry w ws@(w' : rest) | w < w'    = w : ws
                       | otherwise = carry (2 * w) rest

borrow w ws@(w' : rest) | w == w'   = rest
                        | otherwise = w : borrow (2 * w) ws

inc ws = carry 1 ws
dec ws = borrow 1 ws

add ws [] = ws
add [] ws = ws
add m@(w1 : ws1) n@(w2 : ws2)
  | w1 < w2   = w1 : add ws1 n
  | w2 < w1   = w2 : add m ws2
  | otherwise = carry (2 * w1) (add ws1 ws2)
~~~

### Binary Numerical Representations

We can transform a positional number system into a data structure:

* Replace the sequence of digits with a sequence of trees
* The number and sizes of the trees are determined by the number's representation

For example:

* The binary representation of $73$ is $1001001$
* A collection of size $73$ would contain three trees:
    * size $1$
    * size $8$
    * size $64$

### Complete Binary Leaf Trees

A binary leaf tree is either:

* a `Leaf`, containing a value; or
* a `Fork`, containing two binary leaf trees

A *complete* binary leaf tree has all `Leaf` nodes at the same depth.

~~~ {.Haskell}
data BLT a = Leaf a | Fork (BLT a) (BLT a)
~~~

### Binary Random-Access List

Let's combine a dense binary representation with a `BLT`.

First, let's annotate the `BLT`s with their size:

~~~ {.Haskell}
data BLT a = Leaf a
           | Fork !Int (BLT a) (BLT a)
           deriving (Show)
~~~

Recall the types we used for `DenseNat`:

~~~ {.Haskell}
data Digit = Zero | One deriving (Show, Ord, Eq)
type DenseNat = [Digit] -- increasing order of significance
~~~

We'll adapt them to store a `BLT` in each `One` digit:

~~~ {.Haskell}
data Digit a = Zero | One (BLT a) deriving (Show)
type RandList a = [Digit a]
~~~

### Some RandList Operations

We had an `inc` operation for `Nat`:

~~~ {.Haskell}
inc :: DenseNat -> DenseNat
inc [] = [One]
inc (Zero : ds) = One : ds
inc (One  : ds) = Zero : inc ds -- carry
~~~

We adapt it to `cons` for `RandList`:

~~~ {.Haskell}
cons :: a -> RandList a -> RandList a
cons x ts = insTree (Leaf x) ts

insTree t  [] = [One t]
insTree t  (Zero   : ts) = One t : ts
insTree t1 (One t2 : ts) = Zero : insTree (link t1 t2) ts
  where link t1 t2 = Fork (size t1 + size t2) t1 t2
        size (Leaf _)     = 1
        size (Fork n _ _) = n
~~~

### Some RandList Operations

We had a `dec` operation for `Nat`:

~~~ {.Haskell}
dec :: DenseNat -> DenseNat
dec [One] = []
dec (One  : ds) = Zero : ds
dec (Zero : ds) = One : dec ds -- borrow
~~~

We adapt it to `tail` for `RandList`:

~~~ {.Haskell}
tail :: RandList a -> RandList a
tail ts = ts' where (_, ts') = borrowTree ts

borrowTree [One t]      = (t,  []         )
borrowTree (One t : ts) = (t,  Zero : ts  )
borrowTree (Zero  : ts) = (t1, One t2 : ts')
  where (Fork _ t1 t2, ts') = borrowTree ts
~~~

### Properties of RandList

We have arranged an $n$-element sequence as a list of about $\log\left(n+1\right)$ trees, each of depth at most $\log n$.
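For reference, a `lookup` along those lines might look like this (a sketch of my own, not from the original deck, using the size-annotated `BLT`/`RandList` types above, with `Prelude`'s `lookup` hidden as in the other slides): walk the digit list to find the right tree, then descend into it.

~~~ {.Haskell}
-- Sketch (my addition): index into a binary random-access list.
lookup :: Int -> RandList a -> a
lookup _ []          = error "index out of range"
lookup i (Zero : ts) = lookup i ts
lookup i (One t : ts)
  | i < size t       = lookupTree i t
  | otherwise        = lookup (i - size t) ts
  where size (Leaf _)     = 1
        size (Fork n _ _) = n

lookupTree :: Int -> BLT a -> a
lookupTree 0 (Leaf x) = x
lookupTree i (Fork n t1 t2)
  | i < half  = lookupTree i t1             -- low indices live in the left half
  | otherwise = lookupTree (i - half) t2
  where half = n `div` 2
lookupTree _ _ = error "index out of range"
~~~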
Operations `cons`, `head`, and `tail` perform $O\left({1}\right)$ work per digit, so their worst case is $O\left({\log n}\right)$.

Operations `lookup` and `update` take at most $O\left({\log n}\right)$ to find the right tree and $O\left({\log n}\right)$ to find the right element, for a total of $O\left({\log n}\right)$ worst-case time.

For mostly-random access, this is a significant improvement over `List`'s $O\left({n}\right)$ worst-case performance.

### Skew Binary Numbers

Our standard binary representation's performance was hindered by *cascading carries*.

In the **skew binary number** representation:

* The weight $w_i$ of the $i$th digit is $2^{i+1}-1$
* The set of digits $D_i = \left\{0,1,2\right\}$
* Only the lowest non-$0$ digit may be $2$ (for uniqueness)

The number $92$ is $002101$ in this representation (least significant digit first):

\begin{align*}
\sum\limits_{i=0}^{5} b_i w_i &= 0+0+(2^{2+1}-1)\times{2}+(2^{3+1}-1)+0+(2^{5+1}-1) \\
                              &= 14+15+63 \\
                              &= 92
\end{align*}

### Skew Binary Number Operations

What makes the skew binary representation useful:

* $w_i$ is $2^{i+1}-1$
* $1+2\left(2^{i+1}-1\right) = 2^{i+2}-1$

This means that when the lowest non-$0$ digit is $2$, the `inc` operation:

* resets the $2$ to $0$
* increments the next digit from $0$ to $1$ *or* $1$ to $2$

If there is no $2$, just increment the lowest digit from $0$ to $1$ or from $1$ to $2$.

These steps take only $O\left({1}\right)$ time!

We must use a *sparse* representation to keep $O\left({1}\right)$ access to the first non-$0$ digit.

### Skew Binary Random Access List

~~~ {.Haskell}
data BTree a = Leaf a | Node a (BTree a) (BTree a)
data Digit a = Digit !Int (BTree a)
type RList a = [Digit a]

cons x (Digit w1 t1 : Digit w2 t2 : ts')
  | w1 == w2 = Digit (1 + w1 + w2) (Node x t1 t2) : ts'
cons x ts    = Digit 1 (Leaf x) : ts

head (Digit 1 (Leaf x)     : _) = x
head (Digit _ (Node x _ _) : _) = x

tail (Digit 1 (Leaf _)       : ts) = ts
tail (Digit w (Node _ t1 t2) : ts) = Digit w' t1 : Digit w' t2 : ts
  where w' = w `div` 2

lookup (Digit w t : ts) i
  | i < w     = lookupTree w t i
  | otherwise = lookup ts (i - w)

lookupTree 1 (Leaf x)       0 = x
lookupTree _ (Node x _ _)   0 = x
lookupTree w (Node _ t1 t2) i
  | i <= w'   = lookupTree w' t1 (i - 1)
  | otherwise = lookupTree w' t2 (i - 1 - w')
  where w' = w `div` 2
~~~

## Non-Uniform Structure

### Random Access Lists

Here is our first Random Access List type:

~~~ {.Haskell}
data BLT a = Leaf a | Fork (BLT a) (BLT a)
data Digit a = Zero | One (BLT a)
type RandList a = [Digit a]
~~~

It is possible to construct invalid members of this type:

~~~ {.Haskell}
[One (Fork (Leaf 1) (Fork (Leaf 2) (Leaf 3)))]
~~~

Constraint violations:

* All trees should be *complete*, but this one is not
* The tree's height doesn't match its position in the digit sequence

### Random Access Lists (cont.)
We can encode the fact that we only want *complete* trees of the correct height by combining the structures and "pairing" the type parameter on each recurrence:

~~~ {.Haskell}
data Fork t = Fork t t -- this is just like (t,t)

data RandList t = Nil
                | Zero (RandList (Fork t))
                | One t (RandList (Fork t))
~~~

This is known as a *non-uniform* data type; functions over it use *polymorphic recursion*.

Some examples:

~~~ {.Haskell}
Nil
One 1 Nil
Zero (One (Fork 1 2) Nil)
Zero (Zero (One (Fork (Fork 1 2) (Fork 3 4)) Nil))
~~~

### Random Access List Implementation

~~~ {.Haskell}
data Fork t = Fork t t
data RandList t = Nil
                | Zero (RandList (Fork t))
                | One t (RandList (Fork t))

cons :: a -> RandList a -> RandList a
cons x  Nil         = One x Nil
cons x1 (Zero ds)   = One x1 ds
cons x1 (One x2 ds) = Zero (cons (Fork x1 x2) ds)

front :: RandList a -> (a, RandList a)
front Nil         = error "Empty RandList"
front (One x Nil) = (x, Nil)
front (One x ds)  = (x, Zero ds)
front (Zero ds)   = (x1, One x2 ds')
  where (Fork x1 x2, ds') = front ds

find :: Int -> RandList a -> a
find _ Nil        = error "Not found"
find 0 (One x _)  = x
find i (One _ ds) = find (i - 1) (Zero ds)
find i (Zero ds)  = if i `mod` 2 == 0 then x else y
  where (Fork x y) = find (i `div` 2) ds
~~~

### Other Numerical Representations

We have only scratched the surface of numerical representations.

* Different tree structures at the digits:
    * complete binary trees
    * complete leaf trees
    * binomial trees
    * pennants
* Different number systems:
    * zeroless binary
    * redundant binary
    * skew binary
    * segmented binary
    * higher bases
    * fibonacci numbers
    * factorials (basis of a random permutation algorithm)

# Real-World Structures

## Zippers

### What is a Zipper?

Imagine you have a large, complex data structure...

A zipper is a *complementary* structure that tracks a navigation path through the primary structure. It contains:

* a *focus point* that refers to the current point in the navigation
* the *path* of the traversal from the root to the focus point

Operations on a zipper:

* navigation: `goUp`, `goDown`, `goLeft`, searching, etc.
* update: `replace` the focus point, `insertLeft`, etc.

Traversing a structure with a zipper turns it *inside-out*, and moving back to the head pulls it back together, like a *zipper*.
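Before looking at a tree zipper, here is a minimal sketch of my own (not from the original deck): a zipper over a plain list, where the path is just the list of elements already passed, nearest first.

~~~ {.Haskell}
-- Sketch (my addition): the simplest possible zipper, over a list.
-- 'lefts' holds the elements already walked past, nearest first;
-- the focus is the head of 'rights'.
data ListZipper a = LZ { lefts :: [a], rights :: [a] }
  deriving Show

fromList :: [a] -> ListZipper a
fromList = LZ []

goRight, goLeft :: ListZipper a -> ListZipper a
goRight (LZ ls (x:rs)) = LZ (x:ls) rs
goRight z              = z                 -- already at the end
goLeft  (LZ (l:ls) rs) = LZ ls (l:rs)
goLeft  z              = z                 -- already at the start

replace :: a -> ListZipper a -> ListZipper a
replace x (LZ ls (_:rs)) = LZ ls (x:rs)
replace _ z              = z               -- nothing in focus

toList :: ListZipper a -> [a]
toList (LZ ls rs) = reverse ls ++ rs
~~~

The rose-tree zipper on the next slides follows the same pattern, but its path must also remember the siblings and the parent context at each level.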
### A Zipper on Rose Trees

First, a tree structure, possibly representing a document:

~~~ {.Haskell}
data Tree a = Item a | Section [Tree a]
~~~

We need a structure to record the path; it must track steps up and down the structure as well as the position within a `Section`'s list of children:

~~~ {.Haskell}
data Path a = Top | Node [Tree a] (Path a) [Tree a]
~~~

Finally, we combine the path with the focus point, which is just a subtree of the tree we are traversing:

~~~ {.Haskell}
data Location a = Loc (Tree a) (Path a)
~~~

To build a zipper, we just combine the root of the tree with the `Top` path constructor in a `Location`.

### A Zipper on Rose Trees - Navigation

Basic navigation (error checking elided):

~~~ {.Haskell}
data Path a = Top | Node [Tree a] (Path a) [Tree a]
data Location a = Loc (Tree a) (Path a)

goDown (Loc (Section (t1:trees)) p) = Loc t1 (Node [] p trees)

goLeft (Loc t (Node (l:left) up right)) = Loc l (Node left up (t:right))

goRight (Loc t (Node left up (r:right))) = Loc r (Node (t:left) up right)

goUp (Loc t (Node left up right)) =
  Loc (Section (reverse left ++ t:right)) up
~~~

### A Zipper on Rose Trees - Update

Modifying at the current position:

~~~ {.Haskell}
data Path a = Top | Node [Tree a] (Path a) [Tree a]
data Location a = Loc (Tree a) (Path a)

replace (Loc _ p) t = Loc t p

insertRight (Loc t (Node left up right)) r = Loc t (Node left up (r:right))

insertLeft (Loc t (Node left up right)) l = Loc t (Node (l:left) up right)

insertDown (Loc (Section sons) p) t1 = Loc t1 (Node [] p sons)
~~~

### Crazy Zipper Facts

The zipper structure for a given data structure can be derived mechanically from its algebraic type signature.

The path component is the *first derivative* (in the sense you learned in Calculus) of the parent type's signature: it is the type of *one-hole contexts* for that type, and the *focus point* fills the hole.

Zippers can also be represented by *delimited continuations* of a traversing process. These can be used with any `Traversable` without knowing its concrete type at all!

Continuation-capturing traversal functions invert the control structure in a similar manner to how taking the derivative of the data type inverts its data structure.

## 2-3 Finger Trees

### Not this time

I lied. There's no time to talk about these. Sorry!

But you should be prepared to read this paper about them now:

http://www.cs.ox.ac.uk/people/ralf.hinze/publications/FingerTrees.pdf

## Hash Array Mapped Tries

### Downsides To This Stuff

Way back at the beginning, I mentioned some downsides:

* Extra overhead for references vs. in-place storage
* Locality of data can suffer

These are constant-factor issues, but with the advances in the cache structure of machines, they are an increasingly *large* constant factor. Bad memory access patterns can make memory fetches take *orders of magnitude* longer.

The solution is to use *bigger* chunks of data: do more copying and less pointer chasing.

### Hash Array Mapped Trie

Key features:

* Instead of a branching factor of 2, use a factor of 32 or so.
* Each node (aside from leaf nodes) contains up to 32 elements.
* $n$ bits of the key index the array at the next level.
* A bit population count is used to represent sparse arrays.

~~~ {.Haskell}
import Data.Vector (Vector) -- from the vector package

type Key = Word
type Bitmap = Word

data HAMT a = Empty
            | BitmapIndexed !Bitmap !(Vector (HAMT a))
            | Leaf !Key a
            | Full !(Vector (HAMT a))
~~~

### Hash Array Mapped Trie - More Explanation

A *trie* is a tree with a branching factor $k$, where $k$ is the size of the alphabet of key elements.
A trie for caseless English words would have a branching factor of 26. The first symbol of the key determines the index into the first tree level, the second symbol indexes the next level, and so on.

A HAMT divides a key (a vector index or a hashed key) into bit fields:

* a branching factor of 32 uses 5-bit fields
* a 32-bit key divides into 6 fields, with 2 bits remaining

A HAMT stores children sparsely: the bitmap records which children are present, and the vector stores only the present children. Clever bit-level math determines a child's vector index from the bitmap.

### Hash Array Mapped Trie - Lookup

~~~ {.Haskell}
import Prelude hiding (lookup)
import Data.Bits ((.&.), shiftL, shiftR, popCount)
import Data.Vector ((!))
-- HAMT, Key, and Bitmap as defined on the previous slide

bitsPerSubkey :: Int
bitsPerSubkey = 5 -- 32-way branching

subkeyMask :: Key
subkeyMask = 1 `shiftL` bitsPerSubkey - 1

maskIdx b m = popCount (b .&. (m - 1))

mask k s = shiftL 1 (subkey k s)

subkey k s = fromIntegral $ shiftR k s .&. subkeyMask

lookup :: Key -> HAMT a -> Maybe a
lookup k t = lookup' k 0 t

lookup' k s t
  = case t of
      Empty -> Nothing
      Leaf kx x
        | k == kx   -> Just x
        | otherwise -> Nothing
      BitmapIndexed b v ->
        let m = mask k s in
        if b .&. m == 0
          then Nothing
          else lookup' k (s + bitsPerSubkey) (v ! maskIdx b m)
      Full v -> lookup' k (s + bitsPerSubkey) (v ! subkey k s)
~~~

# Conclusion

### Functional Data Structure Fundamentals

Functional data structures provide:

* Safety via Immutability
* Efficient Persistence
* Clear & Simple Implementation

Many are based around some core ideas:

* Numerical Representations
* Lazy Amortization
* Non-Uniform Type Structure

The field is still developing!

Studying Functional Data Structures will teach you a lot about functional languages.