// Tunstall Coding - Variable-to-fixed length optimal compression
Maps variable-length sequences to fixed-length codes.
Dictionary built using symbol probability distribution.
Fixed-length codes enable simple, fast decoding.
Tunstall coding builds a dictionary of variable-length sequences mapped to fixed-length binary codes. Starting with single symbols, it iteratively expands the highest-probability sequence by appending each alphabet symbol. This creates an optimal variable-to-fixed length code where more probable sequences get their own codewords.
Text: 'aabaa' with P(a)=0.8, P(b)=0.2 Dictionary building (3-bit codes): Initial: a, b Expand 'a': aa, ab Expand 'aa': aaa, aab Expand 'ab': aba, abb Final dictionary (8 codes): aaa → 000 aab → 001 aba → 010 abb → 011 aa → 100 ab → 101 a → 110 b → 111 Encoding: 'aabaa' → 001 100 (aab + aa)
Tunstall coding is a variable-to-fixed length entropy encoding method. Unlike Huffman (fixed-to-variable), Tunstall maps variable-length source sequences to fixed-length codewords, making it ideal for applications requiring constant-rate output.
Huffman: fixed input → variable output (e.g., 'a' → '0', 'b' → '10'). Tunstall: variable input → fixed output (e.g., 'aa' → '000', 'ab' → '001'). Tunstall is better for hardware and synchronization but typically achieves lower compression ratios.
Start with single symbols. Repeatedly: 1) Find the sequence with highest probability, 2) Remove it and add all possible one-symbol extensions, 3) Continue until reaching desired dictionary size (2^n for n-bit codes).
Tunstall coding is used in speech compression, image coding, and situations requiring fixed-rate output. It's particularly valuable in hardware implementations and real-time systems where variable-length codes would complicate timing.