> inspect | decode | analyze <

// Inspect Unicode code points, encodings, and character details

[INSPECT]

Character Analysis

Analyze any text character by character. View code points, decimal values, and Unicode block information.

[ENCODE]

UTF-8/UTF-16 View

See the exact byte representation in UTF-8 and UTF-16 encoding for every character.

[FREE]

Block Detection

Automatically detect and display the Unicode block for each character, from Basic Latin to CJK and emoji.

// ABOUT UNICODE

How Unicode Works:

Unicode is a universal character encoding standard that assigns a unique code point in the range U+0000 to U+10FFFF to every character. The Basic Multilingual Plane (BMP, U+0000 to U+FFFF) covers most characters in common use. The supplementary planes (U+10000 and above) include most emoji, historic scripts, and rare CJK ideographs. UTF-8 uses 1 to 4 bytes per code point, while UTF-16 uses 2 bytes for BMP code points and 4 bytes (a surrogate pair) for supplementary ones.

Example:

"A" → U+0041, Decimal 65, UTF-8: 41, UTF-16: 0041

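A minimal Python sketch of how the example line could be produced for any single code point. The function name `describe` is hypothetical and not part of this tool; it relies only on Python's built-in `ord` and `str.encode`.

```python
def describe(ch):
    """Format one code point in the same layout as the example line."""
    cp = ord(ch)  # numeric code point; ch must be a single code point
    # UTF-8 bytes as two-digit hex values
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    # UTF-16 (big-endian, no BOM) grouped into 16-bit code units
    raw = ch.encode("utf-16-be")
    utf16 = " ".join(raw[i:i + 2].hex().upper() for i in range(0, len(raw), 2))
    return f'"{ch}" → U+{cp:04X}, Decimal {cp}, UTF-8: {utf8}, UTF-16: {utf16}'


print(describe("A"))
# "A" → U+0041, Decimal 65, UTF-8: 41, UTF-16: 0041
print(describe("\U0001F600"))
# "😀" → U+1F600, Decimal 128512, UTF-8: F0 9F 98 80, UTF-16: D83D DE00
```
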
Common Use Cases:

  • Debug encoding issues in multilingual text
  • Inspect invisible or confusable characters
  • Verify correct UTF-8/UTF-16 byte sequences
  • Look up Unicode code points and blocks
  • Analyze emoji and special symbol encoding

>> frequently asked questions

Q: What is Unicode?

A: Unicode is a universal character encoding standard maintained by the Unicode Consortium. It assigns a unique number (code point) to characters from every major writing system, such as Latin, Chinese, and Arabic, as well as to symbols and emoji. As of Unicode 15.1, it defines over 149,000 characters.

Q: What is a code point vs a character?

A: A code point is the numeric value assigned to a character in Unicode (e.g., U+0041 for 'A'). A character is the abstract concept (letter, digit, symbol). Some visible characters are composed of multiple code points (e.g., flag emoji use two regional indicator code points).
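
The flag case can be checked directly in Python, where `len` counts code points rather than visible characters. This is only an illustrative sketch: the helper name `code_points` is hypothetical, and the names come from the standard `unicodedata` module.

```python
import unicodedata


def code_points(text):
    """Return a (U+XXXX label, Unicode name) pair for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Flag of France: regional indicators F and R, displayed as one visible symbol
flag_fr = "\U0001F1EB\U0001F1F7"

print(len(flag_fr))  # 2
for label, name in code_points(flag_fr):
    print(label, name)
# U+1F1EB REGIONAL INDICATOR SYMBOL LETTER F
# U+1F1F7 REGIONAL INDICATOR SYMBOL LETTER R
```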

Q: What is the difference between UTF-8 and UTF-16?

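A: UTF-8 uses 1-4 bytes per code point and is backward compatible with ASCII. UTF-16 uses 2 or 4 bytes per code point and is the internal string representation in JavaScript and the Windows API. UTF-8 is more compact for Latin text (1 byte per ASCII character versus 2), while UTF-16 is more compact for most CJK text (2 bytes per BMP ideograph versus 3).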
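
The size trade-off can be measured with a short Python sketch. The function name `encoded_sizes` is hypothetical; `utf-16-le` is used so that no byte order mark is added to the count.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


print(encoded_sizes("hello"))       # (5, 10)  Latin: UTF-8 is smaller
print(encoded_sizes("你好"))         # (6, 4)   CJK in the BMP: UTF-16 is smaller
print(encoded_sizes("\U0001F600"))  # (4, 4)   supplementary plane: same size
```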

Q: What is the Basic Multilingual Plane (BMP)?

A: The BMP is the first Unicode plane (U+0000 to U+FFFF) containing the most commonly used characters including Latin, Greek, Cyrillic, CJK ideographs, and common symbols. Characters outside the BMP (U+10000+) require surrogate pairs in UTF-16.
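
As a sketch of the surrogate pair arithmetic, the function below splits a supplementary code point into its high and low surrogates using the standard UTF-16 scheme. The name `to_surrogates` is hypothetical and only for illustration.

```python
def to_surrogates(cp):
    """Split a supplementary code point into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need surrogates")
    offset = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

The result matches what Python's own encoder produces for `"\U0001F600".encode("utf-16-be")`.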

Q: How many Unicode characters are there?

A: Unicode can theoretically represent 1,114,112 code points (U+0000 to U+10FFFF). As of Unicode 15.1, over 149,000 characters are assigned, covering 161 scripts. New characters are added with each version.
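
The 1,114,112 figure follows from 17 planes of 65,536 code points each. A small Python check of that arithmetic is below; the constant names are illustrative, and the surrogate range U+D800 to U+DFFF is subtracted to show how many code points can actually encode characters.

```python
import sys

PLANES = 17                      # plane 0 (the BMP) plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE
surrogates = 0xDFFF - 0xD800 + 1   # reserved for UTF-16, never assigned to characters
scalar_values = total - surrogates

print(total)           # 1114112
print(surrogates)      # 2048
print(scalar_values)   # 1112064
print(sys.maxunicode == 0x10FFFF)  # True
```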
