> inspect | decode | analyze <
// Inspect Unicode code points, encodings, and character details
Character Analysis
Analyze any text character by character. View code points, decimal values, and Unicode block information.
UTF-8/UTF-16 View
See the exact byte representation in UTF-8 and UTF-16 encoding for every character.
Block Detection
Automatically detect and display the Unicode block for each character, from Basic Latin to CJK and emoji.
// ABOUT UNICODE
How Unicode Works:
Unicode is a universal character encoding standard that assigns a unique code point (U+0000 to U+10FFFF) to every character. The Basic Multilingual Plane (BMP, U+0000 to U+FFFF) covers most common characters. Supplementary planes (U+10000 to U+10FFFF) include emoji, historic scripts, and rare CJK ideographs. UTF-8 uses 1-4 bytes per code point, while UTF-16 uses 2 bytes for BMP code points and 4 bytes (a surrogate pair) for supplementary ones.
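The encoding rules above can be checked with a few lines of Python. This is a minimal sketch, not how this tool is implemented; the function name `inspect_char` and the output field names are hypothetical, chosen only for illustration.

```python
def inspect_char(ch: str) -> dict:
    """Return code point and encoding details for a single code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return {
        "code_point": f"U+{cp:04X}",
        "decimal": cp,
        # UTF-8 shown byte by byte, UTF-16 shown as 16-bit code units
        "utf8": utf8.hex(" ").upper(),
        "utf16": " ".join(
            utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)
        ),
    }


print(inspect_char("A"))
```

A BMP character such as "A" yields one UTF-16 code unit; a supplementary character such as U+1F600 yields two (a surrogate pair) and four UTF-8 bytes.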
Example:
"A" → U+0041, Decimal 65, UTF-8: 41, UTF-16: 0041
Common Use Cases:
- Debug encoding issues in multilingual text
- Inspect invisible or confusable characters
- Verify correct UTF-8/UTF-16 byte sequences
- Look up Unicode code points and blocks
- Analyze emoji and special symbol encoding
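As an illustration of the invisible-character use case, the sketch below flags format and control characters using Python's standard `unicodedata` module. The helper name `find_invisible` is hypothetical, and filtering on the general categories `Cf` and `Cc` is one simple heuristic among several, not a complete detector.

```python
import unicodedata


def find_invisible(text: str) -> list:
    """List (index, code point, name) for format/control characters."""
    hits = []
    for i, ch in enumerate(text):
        # Cf = format characters (e.g. zero-width space), Cc = control characters
        if unicodedata.category(ch) in ("Cf", "Cc"):
            name = unicodedata.name(ch, "<unnamed>")  # control chars have no name
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


print(find_invisible("pass\u200bword"))
# [(4, 'U+200B', 'ZERO WIDTH SPACE')]
```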
>> frequently asked questions
Q: What is Unicode?
A: Unicode is a universal character encoding standard maintained by the Unicode Consortium. It assigns a unique number (code point) to each character across the world's writing systems, including Latin, Chinese, Arabic, emoji, and symbols. As of Unicode 15.1, it defines over 149,000 characters.
Q: What is a code point vs a character?
A: A code point is the numeric value assigned to a character in Unicode (e.g., U+0041 for 'A'). A character is the abstract concept (letter, digit, symbol). Some visible characters are composed of multiple code points (e.g., flag emoji use two regional indicator code points).
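The flag example can be seen directly in Python, where a string is a sequence of code points. This is a small sketch; the helper name `code_points` is hypothetical.

```python
import unicodedata


def code_points(text: str) -> list:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Two regional indicator code points that render together as one flag (US)
flag = "\U0001F1FA\U0001F1F8"

print(len(flag))          # 2: one visible symbol, two code points
print(code_points(flag))  # ['U+1F1FA', 'U+1F1F8']
for ch in flag:
    print(unicodedata.name(ch))
```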
Q: What is the difference between UTF-8 and UTF-16?
A: UTF-8 uses 1-4 bytes per code point and is backward compatible with ASCII. UTF-16 uses 2 or 4 bytes per code point and is used internally by JavaScript and Windows. UTF-8 is more compact for Latin text (1 byte per ASCII character versus 2 in UTF-16), while UTF-16 is more compact for CJK text (2 bytes per BMP ideograph versus 3 in UTF-8).
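The size trade-off between the two encodings is easy to measure. The following is a rough sketch; `encoded_sizes` is a hypothetical helper, and it uses the `utf-16-le` codec so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


print(encoded_sizes("hello"))     # (5, 10): ASCII is half the size in UTF-8
print(encoded_sizes("你好世界"))  # (12, 8): BMP CJK is smaller in UTF-16
```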
Q: What is the Basic Multilingual Plane (BMP)?
A: The BMP is the first Unicode plane (U+0000 to U+FFFF) containing the most commonly used characters including Latin, Greek, Cyrillic, CJK ideographs, and common symbols. Characters outside the BMP (U+10000+) require surrogate pairs in UTF-16.
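For reference, the surrogate pair for a supplementary code point can be computed by hand: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A minimal sketch, with the hypothetical function name `to_surrogates`:

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

The result matches what a UTF-16 encoder produces for the same code point.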
Q: How many Unicode characters are there?
A: Unicode's code space holds 1,114,112 code points (U+0000 to U+10FFFF, that is, 17 planes of 65,536 code points each). As of Unicode 15.1, over 149,000 characters are assigned, covering 161 scripts. New characters are added with each version.