// Statistics Canada Census Name Coding for record linkage
Used by Statistics Canada for census data.
Consistent 4-character codes for all names.
Handles French accented characters properly.
The Statistics Canada name coding algorithm is used for record linkage in census data and vital statistics. It creates a 4-character code from surnames and optionally given names. The algorithm takes the first letter of the surname plus the next two consonants (vowels and Y are removed). If the surname provides fewer than 3 characters, the first letter of the given name is used. The code handles French accented characters by converting them to their base forms.
Algorithm steps:
1. First letter of surname
2. Next 2 consonants from surname
3. Given name initial if needed
4. Pad with spaces to 4 chars

Examples:
SMITH → SMTH: S + M + T + H
MacDONALD → MCDL: M + C + D + (N→L)
Tremblay, Marie → TRMB: T + R + M + B
Lee, David → LEED: L + (no consonants) + D

French handling:
Côté → COTE → CT
Lépine → LEPINE → LPN

Short names:
Lo → LO (padded)
Kim, Su → KMS
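The four steps above can be sketched in Python as follows. This is a minimal illustration that follows the written steps literally, not an official Statistics Canada implementation. The function name `name_code` and the helper `strip_accents` are hypothetical. It reproduces the listed results for Côté, Lépine, and Kim, Su; some of the other listed examples (such as SMITH → SMTH) appear to keep a fourth surname character, which this sketch does not attempt.

```python
import unicodedata

VOWELS = set("AEIOUY")  # the description treats Y as a vowel


def strip_accents(text: str) -> str:
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def letters_only(text: str) -> list[str]:
    """Upper-case A-Z letters of the name, accents folded, punctuation dropped."""
    return [ch for ch in strip_accents(text).upper() if "A" <= ch <= "Z"]


def name_code(surname: str, given_name: str = "") -> str:
    """Hypothetical sketch of the 4-character code described above."""
    letters = letters_only(surname)
    if not letters:
        return " " * 4
    # Step 1: first letter. Step 2: next two consonants after it.
    consonants = [ch for ch in letters[1:] if ch not in VOWELS]
    code = letters[0] + "".join(consonants[:2])
    # Step 3: fall back to the given-name initial if the surname is short.
    if len(code) < 3:
        given = letters_only(given_name)
        if given:
            code += given[0]
    # Step 4: pad with spaces to 4 characters.
    return code.ljust(4)


print(repr(name_code("Côté")))       # 'CT  '
print(repr(name_code("Lépine")))     # 'LPN '
print(repr(name_code("Kim", "Su")))  # 'KMS '
```

Returning a fixed-width string keeps the codes directly comparable as keys, at the cost of trailing spaces that callers may want to preserve rather than trim.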
Statistics Canada Name Coding is an algorithm used by the Canadian government for record linkage in census data and vital statistics. It creates a standardized 4-character code from names to facilitate matching records across different databases.
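To illustrate how such codes can support matching across databases, the sketch below pairs records from two hypothetical datasets whenever their codes are equal. The record layout, the field names `id`, `name`, and `code`, and the function `candidate_pairs` are assumptions made for this example; the codes are taken as already computed.

```python
from collections import defaultdict


def candidate_pairs(left, right):
    """Return (left, right) record pairs that share the same name code."""
    by_code = defaultdict(list)
    for rec in right:
        by_code[rec["code"]].append(rec)
    return [(l, r) for l in left for r in by_code.get(l["code"], [])]


# Hypothetical records with precomputed codes.
census = [
    {"id": "c1", "name": "Tremblay, Marie", "code": "TRMB"},
    {"id": "c2", "name": "Kim, Su", "code": "KMS "},
]
vital = [
    {"id": "v1", "name": "TREMBLAY MARIE", "code": "TRMB"},
    {"id": "v2", "name": "Lépine", "code": "LPN "},
]

pairs = candidate_pairs(census, vital)
print([(l["id"], r["id"]) for l, r in pairs])  # [('c1', 'v1')]
```

Pairs produced this way are only candidates: records sharing a code would still need a closer comparison before being treated as the same person.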
The algorithm automatically converts French accented characters (à, é, è, ç, etc.) to their base forms (a, e, e, c) before processing. This ensures consistent coding regardless of whether accents are used in the original data.
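One common way to perform this kind of accent folding in Python is Unicode decomposition followed by removal of combining marks. The sketch below is an assumption about how the conversion could be done, not a description of the actual implementation; `fold_accents` is a hypothetical name.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Split each accented letter into base letter + mark, then drop the marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Côté").upper())    # COTE
print(fold_accents("Lépine").upper())  # LEPINE
print(fold_accents("Côté") == fold_accents("Cote"))  # True
```

This approach covers letters that decompose into a base letter plus a diacritic (such as é, è, ô, ç). Ligatures such as œ have no such decomposition and would need an explicit mapping.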
The given name is only used when the surname doesn't provide enough characters for the code. If the surname has fewer than 3 usable characters after removing vowels, the first letter of the given name is added to the code.
Unlike phonetic algorithms such as Soundex or Metaphone, Statistics Canada coding is not strictly phonetic. It is a simple character-extraction algorithm designed for consistency and ease of implementation in government systems, and it is particularly suited to Canadian naming patterns.