D-M Soundex: Enhanced phonetic encoding for Jewish and Eastern European names
Generates multiple codes for ambiguous pronunciations.
Consistent 6-digit numeric codes for all names.
Optimized for Yiddish and Hebrew name patterns.
Daitch-Mokotoff Soundex, created in 1985 by Gary Mokotoff and Randy Daitch, is an enhancement of American Soundex designed specifically for Jewish and Eastern European surnames. Unlike traditional Soundex, which produces a single code, D-M can generate multiple codes to account for different possible pronunciations. This matters most for names transliterated from Hebrew, Yiddish, Polish, Russian, and German.
Jewish surname variations:
Cohen variants:
Cohen → 556000 (H before a vowel is coded 5, so Cohen matches Kagan)
Cohn → 560000
Kohn → 560000
Kahn → 560000
Kagan → 556000
Moskowitz variants:
Moskowitz → 645740
Moscowitz → 645740
Moskovitz → 645740
Moskovich → 645740 (plus 645750, because the final CH can be coded 4 or 5)
Multiple codes example:
Auerbach → [097500, 097400]
AU at the start → 0; CH → 5 or 4
Results in two codes
Key features:
- CH → 5 or 4 (varies)
- CK → 5 or 45
- Initial vowels → 0
- DZ, DZH, DZS → 4
- TSH, TZH → 4
Daitch-Mokotoff Soundex is a phonetic encoding system created in 1985 specifically for Jewish and Eastern European surnames. It improves upon American Soundex by better handling the spelling variations common in names transliterated from Hebrew, Yiddish, Polish, Russian, and German.
D-M Soundex generates multiple codes because many letter combinations can be pronounced differently depending on the language of origin. For example, 'CH' might be pronounced as in German 'Bach' or English 'Chair'. Multiple codes ensure matches regardless of the original pronunciation.
D-M Soundex uses 6-digit numeric codes (vs 4-character alphanumeric), handles many more letter combinations, generates multiple codes for ambiguous cases, and is specifically tuned for Eastern European and Jewish name patterns that American Soundex handles poorly.
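To make the contrast concrete, here is a minimal sketch of American Soundex in Python. The function name american_soundex is an illustrative choice, and the sketch follows the common rule set (H and W do not separate equal codes, vowels do). It shows two spellings of the same surname landing on different American Soundex codes, even though D-M gives both of them 645740.

```python
# Letter groups of American Soundex; vowels, H, W and Y carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def american_soundex(name):
    """Return the 4-character American Soundex code (letter + 3 digits)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped without breaking a run of equal codes
        code = CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets the run, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]


print(american_soundex("Moskowitz"))  # M232
print(american_soundex("Moskovich"))  # M212
```

A search keyed on M232 would miss the Moskovich spelling, which is the kind of fragmentation D-M was designed to avoid.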
It's widely used in Jewish genealogy databases, Holocaust memorial projects, immigration records, cemetery records, and any system dealing with Jewish or Eastern European names where spelling variations are common due to transliteration.
The D-M Soundex was explicitly created for the Avotaynu Jewish genealogy journal (1985, Randy Daitch and Gary Mokotoff) and became the backbone of JewishGen, the world's largest Jewish genealogy research site.
Typical workflow:
1. Calculate D-M code for your ancestor's name — e.g., for Moszkowicz (Polish spelling), the tool outputs 645740.
2. Search JewishGen's JGFF (Family Finder) or JRI-Poland by Soundex code rather than exact spelling.
3. Result: all variants encoding to 645740 surface — Moskowitz, Moscowitz, Moszkowicz, Moskovich, Moskowicz, Moshkowitz, Moskvitz.
4. Cross-reference with primary sources: ship manifests (Ellis Island), census, shtetl records, Yizkor books.
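The search step in this workflow amounts to a lookup from code to surnames. The sketch below is only an illustration: the record list, build_index, and search are hypothetical names, and the codes are copied from the examples in this article rather than computed. In practice they would come from a D-M library.

```python
from collections import defaultdict

# Hypothetical records: surname as written by the clerk, plus its D-M code set.
RECORDS = [
    ("Moskowitz", {"645740"}),
    ("Moszkowicz", {"645740"}),
    ("Moskovitz", {"645740"}),
    ("Auerbach", {"097500", "097400"}),
]


def build_index(records):
    """Map every D-M code to the set of surnames that produce it."""
    index = defaultdict(set)
    for surname, codes in records:
        for code in codes:
            index[code].add(surname)
    return index


def search(index, query_codes):
    """Return all surnames sharing at least one code with the query."""
    found = set()
    for code in query_codes:
        found |= index.get(code, set())
    return found


index = build_index(RECORDS)
print(sorted(search(index, {"645740"})))
# ['Moskovitz', 'Moskowitz', 'Moszkowicz']
```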
Databases that use D-M Soundex as their search backbone:
• JewishGen.org — All Poland Database, All Romania Database, Hungary SIG, etc.
• JRI-Poland (Jewish Records Indexing – Poland) — 6+ million records.
• Yad Vashem Central Database of Shoah Victims' Names — 4.8+ million entries, D-M fuzzy matching.
• Avotaynu Consolidated Jewish Surname Index — Mokotoff's own compilation.
• Ancestry.com and MyHeritage support D-M searches in their European records.
Why 6 digits? Fixed length makes codes sort consistently and supports range queries and efficient B-tree indexing in databases. American Soundex's letter-plus-three-digits format allows at most about 9,000 distinct codes (26 × 7³ = 8,918), while six decimal digits allow 1,000,000, on the order of 100× more. That extra resolution matters when you have 50,000 Cohens in a database.
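As a rough illustration of the indexing point, the sketch below stores one row per (surname, code) pair in an in-memory SQLite database and queries it by exact code and by code range. The table and column names (surname_codes, dm_code) and the helper functions are assumptions made for this example, and the codes are the ones quoted in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE surname_codes (surname TEXT NOT NULL, dm_code TEXT NOT NULL)"
)
# Fixed-length digit strings sort the same way as the numbers they spell,
# so an ordinary B-tree index serves both equality and range lookups.
conn.execute("CREATE INDEX idx_surname_codes_dm ON surname_codes (dm_code)")

rows = [
    ("Moskowitz", "645740"),
    ("Moskovitz", "645740"),
    ("Auerbach", "097500"),  # a multi-code name gets one row per code
    ("Auerbach", "097400"),
]
conn.executemany("INSERT INTO surname_codes VALUES (?, ?)", rows)


def surnames_for(code):
    """Surnames stored under exactly this D-M code."""
    cur = conn.execute(
        "SELECT DISTINCT surname FROM surname_codes "
        "WHERE dm_code = ? ORDER BY surname",
        (code,),
    )
    return [row[0] for row in cur]


def surnames_in_range(low, high):
    """Surnames whose code falls between two 6-digit bounds (inclusive)."""
    cur = conn.execute(
        "SELECT DISTINCT surname FROM surname_codes "
        "WHERE dm_code BETWEEN ? AND ? ORDER BY surname",
        (low, high),
    )
    return [row[0] for row in cur]


print(surnames_for("645740"))                 # ['Moskovitz', 'Moskowitz']
print(surnames_in_range("097000", "097999"))  # ['Auerbach']
```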
Unlike American Soundex, D-M outputs multiple codes per name when a letter combination can be pronounced in more than one way depending on the name's language of origin. This is the algorithm's most distinctive feature.
Example — Auerbach:
• AU at the start of a name is always coded 0, so both codes begin with 0.
• CH at the end can be pronounced:
  - "KH" (as in German "Bach") → D-M code digit 5.
  - "TCH" (as in English "chair") → D-M code digit 4.
• D-M outputs BOTH codes: 097500 and 097400.
• Searching either code retrieves records where the clerk wrote the name based on how they heard it.
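In code, "searching either code" comes down to a set intersection: two names match when their code sets share at least one code. The helper name codes_match and the single-code record below are hypothetical and only illustrate the idea.

```python
def codes_match(query_codes, record_codes):
    """Two names match when their D-M code sets share at least one code."""
    return not set(query_codes).isdisjoint(record_codes)


auerbach = {"097500", "097400"}   # both codes quoted above
one_code_record = {"097400"}      # hypothetical record that yields a single code
moskowitz = {"645740"}

print(codes_match(auerbach, one_code_record))  # True
print(codes_match(auerbach, moskowitz))        # False
```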
Other classic multi-code cases:
• CH → 5 ("KH" as in German "Bach" / Yiddish "Chaim") or 4 ("TCH" as in English "chair").
• CK → 5 (single K sound) or 45 (TS-K sound in some dialects).
• J → 1 (Y sound, as in German/Yiddish "Jakob/Jankel") or 4 (as in English "John").
• SCH → 4 (single "SH" sound) always — unambiguous.
• TSCH / TZSCH → 4.
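When a name contains more than one of these ambiguous groups, the alternatives multiply. The sketch below expands pre-split letter groups with itertools.product. The name Jackson is an illustrative example, not one taken from the list above. The S → 4 and N → 6 codes are assumed from the standard D-M table, and the sketch deliberately ignores the rule that collapses adjacent identical codes.

```python
from itertools import product


def expand(alternatives_per_group):
    """Combine every choice of code per letter group into 6-digit codes."""
    codes = set()
    for choice in product(*alternatives_per_group):
        digits = "".join(choice)
        codes.add((digits + "000000")[:6])  # pad with zeros, cut to 6 digits
    return codes


# "Jackson" with vowels already dropped: J, CK, S, N
jackson = [["1", "4"], ["5", "45"], ["4"], ["6"]]
print(sorted(expand(jackson)))
# ['145460', '154600', '445460', '454600']
```

Two groups with two alternatives each give four codes, so the size of the code set grows quickly for names with several ambiguous groups.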
Concrete search impact: An ancestor whose actual surname was Yiddish Yakobson might appear in records as Jacobson, Yakobson, Jakobson, Jacobsohn, or Yakobsohn, all due to clerks in NYC, Vienna, and Warsaw writing what they heard. D-M encodes Yakobson as 157460. Jakobson forks on the initial J into 157460 and 457460, and Jacobson forks again on the C, but every variant's code set includes 157460. Net result: a single D-M search retrieves the entire surname cluster, where American Soundex splits it by first letter (J212 vs. Y212).
D-M is more intricate than Soundex (a table of positional letter-group rules plus forking for multi-code output), so use a library:
Python:
# shell: pip install abydos  (includes D-M plus 30+ other algorithms)
from abydos.phonetic import DaitchMokotoff
dm = DaitchMokotoff()
dm.encode('Moskowitz')
# Returns {'645740'} (a set, so a code never appears twice)
dm.encode('Auerbach')
# Returns {'097500', '097400'}: two distinct codes
JavaScript / Node.js:
npm install daitch-mokotoff
const dm = require('daitch-mokotoff');
dm('Moskowitz');
// ['645740']
dm('Auerbach');
// ['097500', '097400']
PHP: Pear's Text_Soundex_Daitch_Mokotoff (legacy) or roll your own using the published rule tables.
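PostgreSQL: Version 16 and later ship a daitch_mokotoff() function in the fuzzystrmatch extension; it returns the codes as a text array. On older versions there is no native D-M function, so either precompute and store the codes as a text array, or use a PL/Python function wrapping the library.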
Algorithm key points for implementers:
1. Apply the longest-match rule: try the longest letter groups first (TZSCH, TSCH, SCH), then 2-letter groups (CH, CK, TS), then single letters. Always greedy-longest.
2. Each letter/combo has three possible codes based on position: word-start, before-vowel, any-other.
3. Consecutive identical codes collapse: S-S → one 4.
4. Initial vowels → code 0; non-initial vowels dropped entirely (except in letter-pair contexts).
5. Pad right with zeros to exactly 6 digits. Truncate if longer.
6. Fork: if a letter/combo has an alternate code, duplicate the output-so-far and continue both branches. The result is a set of one or more final codes; names with several ambiguous groups can produce more than four.
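The steps above can be sketched in a few dozen lines of Python. This is a teaching sketch under stated assumptions, not a complete implementation: RULES is a small illustrative subset of the rule table (check every entry against the published table before relying on it), the names dm_encode, same, and RULES are hypothetical, unknown letters are simply skipped, and the collapse rule is simplified to "skip a code equal to the one the previous group produced".

```python
VOWELS = set("AEIOU")


def same(code):
    """Rule whose code is identical in all three positions."""
    return (code, code, code)


# group -> alternatives; each alternative is
# (code at word start, code before a vowel, code elsewhere); "" = not coded.
RULES = {
    "AU": [("0", "7", "")],
    "CH": [same("5"), same("4")],
    "CK": [same("5"), same("45")],
    "SCH": [same("4")], "SZ": [same("4")], "CZ": [same("4")], "TZ": [same("4")],
    "C": [same("5"), same("4")],
    "J": [("1", "", ""), same("4")],
    "H": [("5", "5", "")],
    "Y": [("1", "", "")],
}
for vowel in "AEIOU":
    RULES[vowel] = [("0", "", "")]
for single_letters, digit in (("BFPVW", "7"), ("DT", "3"), ("GKQ", "5"),
                              ("L", "8"), ("MN", "6"), ("R", "9"), ("SZ", "4")):
    for letter in single_letters:
        RULES[letter] = [same(digit)]
MAX_GROUP = max(len(group) for group in RULES)


def dm_encode(name):
    """Return the set of 6-digit codes for name under the partial RULES."""
    word = "".join(ch for ch in name.upper() if ch.isalpha())
    branches = {("", "")}  # (digits so far, code from the previous group)
    i = 0
    while i < len(word):
        # Step 1: greedy longest match.
        group = next((word[i:i + n] for n in range(MAX_GROUP, 0, -1)
                      if word[i:i + n] in RULES), None)
        if group is None:  # letter missing from this partial table
            i += 1
            continue
        end = i + len(group)
        # Step 2: pick the column by position.
        if i == 0:
            position = 0
        elif end < len(word) and word[end] in VOWELS:
            position = 1
        else:
            position = 2
        # Steps 3, 4 and 6: collapse repeats, drop uncoded groups, fork.
        next_branches = set()
        for digits, previous in branches:
            for alternative in RULES[group]:
                code = alternative[position]
                if code and code != previous:
                    next_branches.add((digits + code, code))
                else:
                    next_branches.add((digits, code))
        branches = next_branches
        i = end
    # Step 5: pad with zeros and truncate to exactly 6 digits.
    return {(digits + "000000")[:6] for digits, _ in branches}


print(sorted(dm_encode("Moskowitz")))  # ['645740']
print(sorted(dm_encode("Auerbach")))   # ['097400', '097500']
```

Keeping each branch as a (digits, previous code) pair lets the collapse rule work independently per branch, which matters once different forks emit different codes for the same letter group.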
Test corpus: Avotaynu publishes reference D-M codes for ~700,000 Jewish surnames — compare your implementation against that gold standard.
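A comparison against reference codes can be as simple as the sketch below. The function name find_mismatches, the two-entry reference sample (codes quoted in this article), and the deliberately faulty stand-in encoder are all hypothetical. A real check would load the full reference list and pass in the implementation under test.

```python
def find_mismatches(encode, reference):
    """Compare an encoder with reference codes; return the disagreements."""
    mismatches = []
    for name, expected in reference.items():
        actual = set(encode(name))
        if actual != set(expected):
            mismatches.append((name, sorted(expected), sorted(actual)))
    return mismatches


# Tiny sample using codes quoted in this article.
REFERENCE_SAMPLE = {
    "Moskowitz": {"645740"},
    "Auerbach": {"097500", "097400"},
}


def broken_encoder(name):
    """Deliberately faulty stand-in that never forks."""
    return {"645740"} if name == "Moskowitz" else {"097500"}


for name, expected, actual in find_mismatches(broken_encoder, REFERENCE_SAMPLE):
    print(name, "expected", expected, "got", actual)
# Auerbach expected ['097400', '097500'] got ['097500']
```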