> tunstall | variable | fixed <
// Tunstall Coding - Variable-to-fixed length optimal compression
V2F Encoding
Maps variable-length sequences to fixed-length codes.
Probability-Based
Dictionary built using symbol probability distribution.
Easy Decoding
Fixed-length codes enable simple, fast decoding.
>> technical info
How Tunstall Coding Works:
Tunstall coding builds a dictionary of variable-length sequences mapped to fixed-length binary codes. Starting with single symbols, it iteratively expands the highest-probability sequence by appending each alphabet symbol. This creates an optimal variable-to-fixed length code where more probable sequences get their own codewords.
Tunstall Example:
Text: 'aabaa' with P(a)=0.8, P(b)=0.2 Dictionary building (3-bit codes): Initial: a, b Expand 'a': aa, ab Expand 'aa': aaa, aab Expand 'ab': aba, abb Final dictionary (8 codes): aaa → 000 aab → 001 aba → 010 abb → 011 aa → 100 ab → 101 a → 110 b → 111 Encoding: 'aabaa' → 001 100 (aab + aa)
Why Use Tunstall Coding:
- ▸Fixed-length output codes
- ▸Simple decoder implementation
- ▸No synchronization issues
- ▸Good for hardware implementation
- ▸Optimal for memoryless sources
>> frequently asked questions
What is Tunstall coding?
Tunstall coding is a variable-to-fixed length entropy encoding method. Unlike Huffman (fixed-to-variable), Tunstall maps variable-length source sequences to fixed-length codewords, making it ideal for applications requiring constant-rate output.
Tunstall vs Huffman coding?
Huffman: fixed input → variable output (e.g., 'a' → '0', 'b' → '10'). Tunstall: variable input → fixed output (e.g., 'aa' → '000', 'ab' → '001'). Tunstall is better for hardware and synchronization but typically achieves lower compression ratios.
How is the dictionary built?
Start with single symbols. Repeatedly: 1) Find the sequence with highest probability, 2) Remove it and add all possible one-symbol extensions, 3) Continue until reaching desired dictionary size (2^n for n-bit codes).
Where is Tunstall coding used?
Tunstall coding is used in speech compression, image coding, and situations requiring fixed-rate output. It's particularly valuable in hardware implementations and real-time systems where variable-length codes would complicate timing.