Wavelet Tree · kerkstra.dev

char—

pos0

rank—

wavelet treebitvector rank · pick a char, query rank

querychar

animspd1x

run

how it works

A wavelet tree represents a string over an alphabet of size sigma, supporting rank,select, and access queries in O(log sigma) time. Each internal node stores a bitvector that records which half of the alphabet each character belongs to. The left child gets the subsequence of left-half characters, the right child gets the rest.

To answer rank(c, i) — how many times does c appear up to position i — walk from root to leaf. At each node, count how many bits match c's side of the alphabet split up to the current position. That count becomes the new position in the child. When you reach the leaf for c, the position is the answer.

Grossi, Gupta, and Vitter introduced wavelet trees at SODA 2003 as the backbone for compressed full-text indexes. The wavelet matrix variant (Claude and Navarro, 2015) rearranges the layout for better cache performance. Modern implementations in the SDSL library power compressed suffix arrays and FM-indexes used in bioinformatics tools like bowtie2 for genome alignment.

The structure uses n log(sigma) bits total, matching the zero-order entropy of the string. Every query touches exactly one bitvector per level, so the depth of the tree directly determines the query cost. Pick a character and position, then watch the rank query propagate down.

Grossi, Gupta & Vitter (SODA 2003) / Wikipedia