ALEX is a learned index that replaces B-tree pages with linear regression models. Each internal node predicts which child contains the target key. Each leaf holds a gapped array, a sorted array with empty slots reserved for future inserts. The model at each leaf predicts where within the array a key should go.
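To make the leaf structure concrete, here is a minimal sketch, with illustrative names and parameters (ALEX's actual implementation differs in many details): a linear model maps a key to a slot in a gapped array, and a short outward scan corrects any prediction error.

```python
class GappedLeaf:
    """Sketch of a learned-index leaf: a linear model predicts a slot in a
    gapped array (a sorted array with empty slots left for future inserts)."""

    def __init__(self, keys, density=0.7):
        # `density` (illustrative) controls how many gaps are reserved:
        # n keys are spread over roughly n / density slots.
        n = len(keys)
        self.capacity = max(n + 1, int(n / density))
        lo, hi = keys[0], keys[-1]
        # Linear model: key -> array position, scaled to the capacity.
        self.slope = (self.capacity - 1) / (hi - lo) if hi > lo else 0.0
        self.intercept = -self.slope * lo
        self.slots = [None] * self.capacity
        for k in keys:  # model-based placement naturally leaves gaps
            self.slots[self._free_slot(self.predict(k))] = k

    def predict(self, key):
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), self.capacity - 1)

    def _free_slot(self, pos):
        # Probe right, then left, for the nearest empty slot.
        for i in list(range(pos, self.capacity)) + list(range(pos - 1, -1, -1)):
            if self.slots[i] is None:
                return i
        raise RuntimeError("leaf full")

    def contains(self, key):
        # Scan outward from the predicted slot; in practice the key sits
        # within a few slots of the prediction.
        pos = self.predict(key)
        for d in range(self.capacity):
            for i in (pos - d, pos + d):
                if 0 <= i < self.capacity and self.slots[i] == key:
                    return True
        return False
```

For example, `GappedLeaf([1, 5, 9, 12, 20]).contains(9)` predicts a slot near index 2 and finds the key there without scanning the whole array.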
The key insight: sorted data forms a cumulative distribution function (CDF), and a CDF can be approximated by piecewise linear functions. Instead of navigating a tree of pivot keys (as in a B-tree), you evaluate a linear function at each level. Prediction errors are bounded, so a short local scan corrects any model inaccuracy.
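The error-bounded lookup can be sketched as follows. This fits a single least-squares line mapping keys to ranks (a one-segment CDF approximation; the function names are illustrative), records the maximum prediction error seen at training time, and then searches only within that error window:

```python
import bisect

def train_cdf_model(keys):
    """Fit rank ~ a*key + b by least squares over a sorted key array,
    and record the maximum absolute prediction error."""
    n = len(keys)
    xm = sum(keys) / n
    ym = (n - 1) / 2
    sxx = sum((k - xm) ** 2 for k in keys)
    sxy = sum((k - xm) * (i - ym) for i, k in enumerate(keys))
    a = sxy / sxx if sxx else 0.0
    b = ym - a * xm
    err = max(abs(a * k + b - i) for i, k in enumerate(keys))
    return a, b, int(err) + 1  # round the bound up to whole slots

def lookup(keys, model, key):
    """Predict a position, then binary-search only within the error bound."""
    a, b, err = model
    pred = int(a * key + b)
    lo = max(0, pred - err)
    hi = min(len(keys), pred + err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else -1

keys = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
model = train_cdf_model(keys)
```

Because every training key lands within `err` slots of its prediction, the narrowed binary search is guaranteed to find any key that is present.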
ALEX adapts to the data distribution. Dense key regions get more model segments (finer approximation), sparse regions get fewer. When a gapped array fills up, the leaf splits and both children train new models. This is analogous to a B-tree page split but driven by model error rather than fixed capacity.
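The split-and-retrain step might look like this, heavily simplified (the density threshold, the median split point, and the function names are assumptions for illustration; ALEX's real split policy is cost-model driven):

```python
def train_linear(keys, capacity):
    """Fit key -> slot position over [0, capacity) by min-max scaling."""
    lo, hi = keys[0], keys[-1]
    slope = (capacity - 1) / (hi - lo) if hi > lo else 0.0
    return slope, -slope * lo

def maybe_split(keys, capacity, max_density=0.8):
    """If the gapped array is too full, split the keys at the median and
    train a fresh model for each half. Returns a list of (keys, model)
    pairs: one entry if no split happened, two if it did."""
    if len(keys) / capacity <= max_density:
        return [(keys, train_linear(keys, capacity))]
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return [(left, train_linear(left, capacity)),
            (right, train_linear(right, capacity))]
```

Each child's model is fit only to its own half of the keys, so after a split the piecewise approximation of the CDF becomes finer exactly where the data forced the split.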
Ding et al. introduced ALEX at SIGMOD 2020. Unlike the original learned index (Kraska et al., 2018), which was read-only, ALEX is fully updatable with O(log n) inserts and lookups. The FITing-Tree (Galakatos et al., SIGMOD 2019) is a related structure that builds piecewise linear segments with a guaranteed maximum prediction error. RadixSpline (Kipf et al., SIGMOD 2020) takes a simpler hybrid approach with a radix table on top of a spline approximation.
Try different distributions. Uniform data produces evenly-spaced model segments. Skewed data forces the tree to allocate more segments where keys cluster, fewer where they are sparse.
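The effect can be reproduced with a small experiment. The sketch below uses a greedy segmentation (illustrative, not ALEX's actual partitioning algorithm): each segment is extended as long as the line through its endpoints predicts every covered rank to within a fixed error bound. Evenly spaced keys fit one line exactly, while skewed spacing forces more segments:

```python
def count_segments(keys, eps=2.0):
    """Greedy piecewise-linear fit of the empirical CDF: grow each segment
    while the line through its endpoints keeps every rank error <= eps."""
    keys = sorted(keys)
    n = len(keys)
    segments = 0
    i = 0
    while i < n:
        segments += 1
        j = i + 1
        while j < n:
            if keys[j] == keys[i]:  # duplicate key: extend for free
                j += 1
                continue
            # Candidate line through (keys[i], i) and (keys[j], j).
            slope = (j - i) / (keys[j] - keys[i])
            ok = all(abs(i + slope * (keys[t] - keys[i]) - t) <= eps
                     for t in range(i, j + 1))
            if not ok:
                break
            j += 1
        i = j
    return segments

uniform = [i / 400 for i in range(400)]  # perfectly even spacing: linear CDF
skewed = [i * i for i in range(400)]     # keys cluster near zero: curved CDF
```

With these inputs, `count_segments(uniform)` needs exactly one segment, while `count_segments(skewed)` needs many, concentrated where the key spacing changes fastest.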
References: Ding et al., "ALEX: An Updatable Adaptive Learned Index," SIGMOD 2020; Kraska et al., "The Case for Learned Index Structures," SIGMOD 2018.