> phonex | encoder <

// Phonex - Enhanced phonetic encoding for name matching

[EXTENDED]

Extended Codes

Up to 8 characters for better differentiation.

[CONSONANT]

Smart Grouping

Groups similar consonants phonetically.

[FLEXIBLE]

Flexible Length

Variable length codes with zero padding.

>> technical info

How Phonex Works

Phonex is a phonetic algorithm designed for improved name matching. It processes a name by applying special combination rules (like PH→F, KN→N), preserving the first letter of the result, mapping each remaining consonant to its group letter, and dropping vowels after the first letter. The algorithm produces codes of 4-8 characters that capture the phonetic essence of names while allowing for spelling variations.

Why Use Phonex

  • Better handling of silent letters
  • Improved consonant grouping
  • Effective for English names
  • Handles common spelling variations
  • Extended codes for better accuracy

Phonex Encoding Examples

Consonant mappings:
B,P,V,F → B
C,K,Q,G,J → C
S,Z,X → S
D,T → D
L → L
M,N → M
R → R

Special combinations:
PH → F, KN → N
GH → removed
WR → R
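
The two tables above translate naturally into lookup structures. The following is a minimal Python sketch, not a reference implementation: the names COMBINATIONS, GROUP_OF, apply_combinations and same_group are made up for illustration, and applying the combinations everywhere in the word, in the listed order, is an assumption the tables do not settle.

```python
# Special combinations, applied in the listed order (the order is an assumption).
COMBINATIONS = (("PH", "F"), ("KN", "N"), ("GH", ""), ("WR", "R"))

# Consonant groups from the mapping table: group letter -> member letters.
GROUPS = {"B": "BPVF", "C": "CKQGJ", "S": "SZX", "D": "DT", "L": "L", "M": "MN", "R": "R"}
GROUP_OF = {member: code for code, members in GROUPS.items() for member in members}


def apply_combinations(word: str) -> str:
    """Rewrite the special letter combinations anywhere in the word."""
    word = word.upper()
    for pattern, replacement in COMBINATIONS:
        word = word.replace(pattern, replacement)
    return word


def same_group(a: str, b: str) -> bool:
    """True when both letters are consonants that map to the same group letter."""
    group_a, group_b = GROUP_OF.get(a.upper()), GROUP_OF.get(b.upper())
    return group_a is not None and group_a == group_b


print(apply_combinations("Stephen"))  # STEFEN
print(same_group("F", "V"))           # True
```

Vowels, H, W and Y have no entry in GROUP_OF, which is how this sketch represents letters that are dropped rather than mapped.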

Examples:
STEPHEN → SDBM0
  S-T[D]-[e]-PH[F→B]-[e]-N[M]

ASHCRAFT → ASCRBD0
  A-S[S]-H[removed]-C[C]-R[R]-A[removed]-F[B]-T[D]

KNIGHT → ND00
  KN[N]-I[removed]-GH[removed]-T[D], zero-padded to the 4-character minimum

>> frequently asked questions

What is Phonex?

Phonex is a phonetic encoding algorithm designed to improve upon earlier systems like Soundex. It provides better handling of consonant clusters, silent letters, and common spelling variations in English names.

How does Phonex differ from Soundex?

Phonex uses more sophisticated consonant groupings, handles special letter combinations (like PH, KN, GH), produces longer codes (4-8 characters vs 4), and better preserves the phonetic structure of names.

When should I use Phonex?

Phonex is ideal for matching English names with spelling variations, genealogy research, customer database deduplication, and any application where phonetic name matching is important.

What are Phonex's limitations?

Phonex is optimized for English names and may not work as well with names from other languages. For non-English names, consider algorithms like Double Metaphone or Daitch-Mokotoff.

Phonex vs Phonix vs Soundex vs Metaphone — which phonetic algorithm wins?

Phonex was proposed by A.J. Lait and B. Randell (1996) as a combination of Soundex and Phonix, an evolutionary step beyond Soundex. It is often confused with Phonix, a distinct algorithm by T.N. Gadd (1990) that uses 160+ letter-transformation rules and is more complex. Neither should be confused with Metaphone, which is Lawrence Philips's work (1990). Don't mix them up.

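Approximate accuracy figures for English surname matching (indicative only; results vary widely by dataset, and the frequently cited Pfeifer et al. 1996 retrieval study predates Double Metaphone, published in 2000, so it cannot be the source for every row):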
Soundex: ~85% recall, ~30% precision — too many collisions.
NYSIIS: ~88% recall, ~45% precision — better consonant handling.
Phonex: ~90% recall, ~55% precision — preserves more phonetic structure.
Phonix: ~92% recall, ~60% precision — highest precision among classical algorithms.
Double Metaphone: ~94% recall, ~70% precision — current gold standard for English.
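
Recall and precision in figures like these are usually computed over name pairs: recall is the share of true matches the algorithm finds, and precision is the share of reported matches that are true. A minimal Python sketch of that arithmetic, using a hypothetical helper name and made-up sample pairs rather than data from any study:

```python
def precision_recall(predicted: set, relevant: set) -> tuple:
    """Return (precision, recall) for a set of predicted pairs against the true pairs."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: pairs that truly refer to the same name ...
relevant = {("Stephen", "Steven"), ("Smith", "Smyth")}
# ... and pairs an encoder reported as matches because their codes collided.
predicted = {("Stephen", "Steven"), ("Stephen", "Stebbin")}

print(precision_recall(predicted, relevant))  # (0.5, 0.5)
```

A high-recall, low-precision encoder returns many colliding pairs, which is why low precision is described as "too many collisions".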

When to choose Phonex specifically:
• You need a simpler algorithm than Double Metaphone but better than Soundex.
• You're working on a legacy system that specifies Phonex (e.g., UK NHS patient matching historically used Phonex-family algorithms).
• Your dataset is dominated by English/Celtic/Welsh names (where Phonex was tuned and tested).

When NOT to use Phonex:
• Multilingual name matching (immigrant records, global customer data) → use Double Metaphone or Beider-Morse.
• Jewish/Eastern European surnames → use Daitch-Mokotoff.
• German-only data → use Kölner Phonetik (Cologne Phonetic).

Phonex code example — Stephen / Steven / Stefan all match?

Yes — that's the whole point. All three encode to the same Phonex output SDBM0:

Step-by-step for "Stephen":
1. Preprocess: normalize case → STEPHEN.
2. First letter: S (retained literally).
3. Next consonant T: D,T group → D.
4. Vowel E: dropped (non-initial).
5. PH combination: special rule, replaced with F; F is in the B,P,V,F group → B.
6. Vowel E: dropped.
7. Final N: M,N group → M.
8. End of word: pad/terminate with 0.
Result: S + D + B + M + 0 = SDBM0.

Steven: S-T(D)-E-V(B)-E-N(M)-0 → SDBM0. Match.
Stefan: S-T(D)-E-F(B)-A-N(M)-0 → SDBM0. Match.
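
The walkthrough above can be sketched in a few lines of Python. This is an illustrative reading of the steps on this page, not a reference implementation. The function name phonex_sketch, the min_len and max_len parameters, and the choices to append a single terminating 0, pad short codes with 0 up to four characters, and leave repeated group letters uncollapsed are all assumptions.

```python
import re

# Special combinations and consonant groups as listed on this page.
COMBINATIONS = (("PH", "F"), ("KN", "N"), ("GH", ""), ("WR", "R"))
GROUPS = {"B": "BPVF", "C": "CKQGJ", "S": "SZX", "D": "DT", "L": "L", "M": "MN", "R": "R"}
GROUP_OF = {member: code for code, members in GROUPS.items() for member in members}


def phonex_sketch(name: str, min_len: int = 4, max_len: int = 8) -> str:
    # Step 1: normalize case and keep A-Z only.
    word = re.sub(r"[^A-Z]", "", name.upper())
    # Special combinations are rewritten before anything else (assumed order).
    for pattern, replacement in COMBINATIONS:
        word = word.replace(pattern, replacement)
    if not word:
        return ""
    # Step 2: the first letter is kept literally.
    # Steps 3-7: later consonants map to their group letter; vowels, H, W, Y drop out.
    body = "".join(GROUP_OF[ch] for ch in word[1:] if ch in GROUP_OF)
    # Step 8: terminate with 0, staying within max_len (assumed truncation rule).
    code = (word[0] + body)[: max_len - 1] + "0"
    # Assumed padding rule: short codes are filled with 0 up to min_len.
    return code.ljust(min_len, "0")


for name in ("Stephen", "Steven", "Stefan"):
    print(name, phonex_sketch(name))  # each line ends in SDBM0
```

Because the sketch always appends the terminating 0 from step 8, a name with an extra trailing consonant, such as Stevens, yields a longer code and no longer matches Steven.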

But note the near-misses:
Stevens (with S) → SDBMS: not the same as Steven. Trailing sounds matter.
Stevenson → SDBMSM: distinct from Stevens.
Stephanie → SDBM (truncated depending on implementation): may or may not match.

This is why Phonex is better than Soundex (which only outputs 4 characters total, losing the distinguishing suffix) but still loses some information compared to Double Metaphone. With its maximum code length raised above the common default of 4, Double Metaphone encodes Stephen as STFN and Stephens as STFNS, so the two stay distinct but can still be related through their shared prefix; at the default length both come out as STFN.
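
To make the Soundex side of that comparison concrete, here is a compact Python sketch of standard American Soundex (one letter plus three digits). It is written for illustration, with the function and table names chosen here, and shows how the fixed four-character limit drops the final S of Stevens.

```python
# Standard Soundex digit classes; vowels, H, W and Y carry no digit.
SOUNDEX_DIGITS = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        # Emit a digit only when it differs from the previous coded sound.
        if digit and digit != previous:
            code += digit
        # H and W do not break a run of equal digits; vowels do.
        if ch not in "HW":
            previous = digit
    # Pad with zeros and cut to exactly four characters.
    return (code + "000")[:4]


print(soundex("Steven"), soundex("Stevens"))  # S315 S315
```

Both names collapse to S315 because the digit for the trailing S would be the fifth character, which Soundex discards.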