[GUIDE] What Is Base64 and How Does It Work?
A practical, from-first-principles explainer. Learn the 6-bit grouping trick behind Base64, why the output is 33% larger, where the = padding comes from, and when to use it.
// THE ONE-SENTENCE DEFINITION
Base64 is a text representation of binary data that uses only 64 printable ASCII characters: A–Z, a–z, 0–9, and two symbols (typically + and /, with = as a padding marker). That is the entire idea. Every three bytes of binary are re-grouped into four Base64 characters, and nothing else changes.
It is an encoding, not encryption: anyone can reverse it in milliseconds. The purpose is not privacy; it is safe transport. Base64 lets you put binary payloads (images, certificates, PDF files, cryptographic keys) into places that only accept text: HTML attributes, JSON strings, URLs (using the URL-safe variant), email bodies, YAML configs, database TEXT columns, environment variables.
// WHY WE NEED IT AT ALL
Text-only channels strip or mutate bytes they don't recognise. An email gateway that understands only 7-bit ASCII may clear the high bit of every byte, corrupting any non-ASCII character. A JSON parser will reject raw control characters such as an embedded null byte. A URL that contains a literal space, quote, or ampersand is invalid or ambiguous until those characters are percent-encoded. The original SMTP standards of the early 1980s assumed mail bodies would be 7-bit ASCII text, but people wanted to send photos and spreadsheets too.
Base64 solves this by restricting the output to 64 characters that pass through virtually every text channel unchanged: 26 uppercase letters, 26 lowercase letters, 10 digits, and two symbols that are printable and part of plain 7-bit ASCII. A padding character, =, completes the set. The notable exception is URLs, where +, /, and = carry special meaning, which is why a URL-safe variant exists.
// THE 6-BIT GROUPING TRICK
Here is the mechanical heart of Base64 in one line: take the input as a bitstream, chop it into 6-bit pieces, and look up each piece in a 64-character alphabet.
Why 6 bits? Because 2^6 = 64, so each 6-bit chunk maps exactly to one character of the alphabet. Why not 7 or 8 bits? 7 bits gives 128 values (too many — not all are printable), and 8 bits is just raw binary again. 6 bits is the sweet spot where every possible value has a human-readable character.
Three input bytes = 24 bits = exactly four 6-bit groups = four Base64 characters. That 3-in-4-out ratio is fixed; it is the reason Base64 output is ~4/3 the size of the input, i.e. about 33% larger.
// Worked example: encoding the 3 bytes 'Man' (77 97 110)
// ASCII: M a n
// Binary: 01001101 01100001 01101110
// Re-group into 6-bit chunks:
// 010011 010110 000101 101110
// Decimal: 19 22 5 46
// Base64: T W F u
// Result: 'TWFu' (stored as 4 ASCII bytes: 84 87 70 117)
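The worked example above can be reproduced in a few lines of Python. This is a minimal sketch that handles only a full 3-byte group; the names ALPHABET and encode_group are illustrative, not part of any library.

```python
# Standard Base64 alphabet: index 0-63 maps to one printable character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Pack the three bytes into a single 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Peel off four 6-bit indices, most significant group first.
    indices = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))  # TWFu
```

The shifts 18, 12, 6, and 0 correspond to the four 6-bit chunks in the example, read left to right.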
// THE BASE64 ALPHABET
RFC 4648 §4 defines the standard alphabet. Positions 0–25 are A–Z, 26–51 are a–z, 52–61 are 0–9, position 62 is +, and position 63 is /.
The URL-safe variant (RFC 4648 §5, also known as base64url) swaps + for - and / for _, so the output can be safely dropped into URLs and filenames without further escaping. Both variants decode identically if you apply the corresponding alphabet.
// Standard Base64 alphabet (index → character)
// 0–25: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
// 26–51: a b c d e f g h i j k l m n o p q r s t u v w x y z
// 52–61: 0 1 2 3 4 5 6 7 8 9
// 62: + (or - in URL-safe)
// 63: / (or _ in URL-safe)
// pad: =
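As a rough illustration of how the two alphabets relate, the sketch below builds both tables and encodes three bytes chosen so that every output character lands on index 62 or 63. The constant names (STANDARD, URL_SAFE, TO_URL_SAFE) are assumptions made for this example; the base64 functions are from Python's standard library.

```python
import base64
import string

# Both alphabets share the first 62 characters and differ only in the last two.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
URL_SAFE = STANDARD[:62] + "-_"

# Translation table mapping '+' to '-' and '/' to '_'.
TO_URL_SAFE = str.maketrans("+/", "-_")

# 0xFB 0xFF 0xBF regroups into the 6-bit values 62, 63, 62, 63.
raw = bytes([0xFB, 0xFF, 0xBF])

standard_text = base64.b64encode(raw).decode("ascii")
url_safe_text = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard_text)                         # +/+/
print(url_safe_text)                         # -_-_
print(standard_text.translate(TO_URL_SAFE))  # -_-_
```

Converting between the variants is a plain character substitution; the underlying 6-bit values are the same.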
// WHERE THE = PADDING COMES FROM
The algorithm expects input sizes that are multiples of 3 bytes (so they group evenly into 4 Base64 characters). When the input length is not a multiple of 3, one or two bytes are left over, which is not enough to fill the final 4-character output group. Padding with = indicates how many bytes were missing.
• Input length mod 3 == 0 → no padding (e.g., 'Man' → 'TWFu').
• Input length mod 3 == 1 → two pad characters (e.g., 'M' → 'TQ==').
• Input length mod 3 == 2 → one pad character (e.g., 'Ma' → 'TWE=').
// Why 'M' becomes 'TQ==':
// Byte: M = 01001101 (8 bits)
// Padded to 12 bits with zeros: 010011 010000
// Decimal: 19, 16 → 'T', 'Q'
// Output length must be multiple of 4 → add '==' padding
// Result: 'TQ=='
//
// When decoding, the decoder strips '==' and recovers the first 8 bits,
// discarding the trailing zero bits.
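The padding rules above can be sketched as a small hand-rolled encoder. This is an illustration under the assumption that the whole input fits in memory; b64encode_manual is a hypothetical name, and in real code Python's base64.b64encode does the same job.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64encode_manual(data: bytes) -> str:
    """Encode bytes as standard Base64, adding '=' padding when needed."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Right-pad the chunk with zero bytes so it fills 24 bits.
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that came purely from the zero padding become '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


for sample in (b"M", b"Ma", b"Man"):
    print(sample, b64encode_manual(sample))
# b'M' TQ==
# b'Ma' TWE=
# b'Man' TWFu
```

Only the final group can ever be short, so padding appears only at the end of the output.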
// IS THE 33% OVERHEAD EXACT?
Close, but not quite. The exact formula is ceil(n / 3) × 4 characters for a byte input of length n. For a 100 KB input, that's ceil(102400 / 3) × 4 = 34,134 × 4 = 136,536 characters, 33.3% more than the 102,400 input bytes. In practice, once you add HTTP-level gzip or Brotli compression, the over-the-wire cost can drop to roughly 10–15%, depending on the payload, because Base64 text compresses reasonably well (though not as well as raw binary).
For small payloads the padding matters: encoding 1 byte costs 4 characters (four times the size), and encoding 2 bytes also costs 4 characters. So tiny Base64 strings look surprisingly long compared to their inputs.
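A quick way to check the size formula is to compute it and compare against a real encoder. The helper name encoded_length is an assumption for this sketch; the integer expression 4 * ((n + 2) // 3) is just ceil(n / 3) × 4 without floating point.

```python
import base64


def encoded_length(n: int) -> int:
    """Number of Base64 characters (padding included) for n input bytes."""
    return 4 * ((n + 2) // 3)


for n in (1, 2, 3, 100 * 1024):
    size = encoded_length(n)
    print(f"{n} bytes -> {size} chars ({size / n:.2f}x)")

# The formula agrees with the standard library for a real payload.
payload = bytes(100 * 1024)
assert len(base64.b64encode(payload)) == encoded_length(len(payload))
```

For 1 and 2 input bytes the ratio is 4x and 2x respectively, which is why very short strings look so inflated.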
// WHERE BASE64 IS USED IN THE REAL WORLD
- > Data URIs in HTML/CSS: embed icons and logos inline, e.g. <img src="data:image/png;base64,…">
- > JSON and GraphQL payloads: APIs that need to ship binary (file uploads, thumbnails) inside a text protocol
- > JWT tokens — the three dot-separated parts of a JWT are base64url strings
- > HTTP Basic Auth — username:password is Base64-encoded in the Authorization header (still not secure without TLS!)
- > Email (MIME) — attachments are Base64-encoded to survive legacy 7-bit SMTP servers
- > PEM-format certificates and keys — the -----BEGIN CERTIFICATE----- block wraps a Base64-encoded DER blob
- > SSH keys — id_rsa.pub and authorized_keys lines are Base64-encoded public keys
- > Database TEXT columns — when you can't use a BLOB type, Base64 lets you store binary as text
- > Environment variables — Kubernetes Secrets values are Base64-encoded (though NOT encrypted)
- > QR codes and magic links: short base64url tokens that are safe to embed in URLs
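Two of the uses listed above, data URIs and HTTP Basic Auth, are simple enough to sketch with Python's standard library. The function names and the credentials below are hypothetical, and the 8-byte payload is only the PNG file signature, not a complete image.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a data: URI using standard Base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic Auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# First 8 bytes of every PNG file; a real image would follow.
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(png_signature, "image/png"))

# Hypothetical credentials, for illustration only.
print(basic_auth_header("alice", "s3cret"))
```

In both cases the Base64 step only makes the bytes text-safe; the header value is trivially decodable, so it still depends on TLS for confidentiality.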
// BASE64 IS NOT ENCRYPTION
This is the single most common misconception. Base64 does not hide the contents of your data — it rewrites binary as text using a public, well-documented mapping. A password Base64-encoded as cGFzc3dvcmQ= is exactly as vulnerable as the plaintext password; anyone can decode it in one function call. Treat Base64 the way you would treat hexadecimal: it is a representation, not protection.
If you need privacy, apply a real cryptographic primitive (AES-GCM, ChaCha20-Poly1305, age, PGP) to the plaintext first, and then Base64-encode the ciphertext if you need to transport it through a text channel. The Base64 step is the last mile, not the security layer.
Kubernetes Secrets are the canonical pitfall: Kubernetes stores secret values Base64-encoded, which makes them look vaguely protected. They are not: anyone with read access to the Secrets in a namespace can decode them. True secret protection requires tools like Sealed Secrets, Vault, SOPS, or cloud-native KMS integrations.
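To make the point concrete, here is a short sketch of how little effort decoding takes. The secret_data dictionary mimics the shape of the data field in a Secret manifest; its key name is hypothetical.

```python
import base64

# The encoded password from the example above.
print(base64.b64decode("cGFzc3dvcmQ="))  # b'password'

# Hypothetical contents of a Secret's data field: every value is plain Base64.
secret_data = {"db-password": "cGFzc3dvcmQ="}

decoded = {
    key: base64.b64decode(value).decode("utf-8")
    for key, value in secret_data.items()
}
print(decoded)  # {'db-password': 'password'}
```

No key, salt, or passphrase is involved, which is the difference between an encoding and a cipher.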
// A QUICK ENCODE / DECODE IN EVERY LANGUAGE
// JavaScript (browser):
// btoa('Hello') → 'SGVsbG8='
// atob('SGVsbG8=') → 'Hello'
// (btoa/atob only handle Latin-1; use TextEncoder for Unicode)
//
// Node.js:
// Buffer.from('Hello').toString('base64') → 'SGVsbG8='
// Buffer.from('SGVsbG8=', 'base64').toString() → 'Hello'
//
// Python:
// import base64
// base64.b64encode(b'Hello') → b'SGVsbG8='
// base64.b64decode('SGVsbG8=') → b'Hello'
//
// Go:
// import "encoding/base64"
// base64.StdEncoding.EncodeToString([]byte("Hello")) → "SGVsbG8="
//
// Ruby:
// require 'base64'
// Base64.strict_encode64('Hello') → 'SGVsbG8='
//
// Shell:
// printf '%s' 'Hello' | base64 → 'SGVsbG8='
// echo 'SGVsbG8=' | base64 -d → 'Hello' (some BSD/macOS versions use -D)
// COMMON MISTAKES TO AVOID
- > Using the standard alphabet where base64url is expected: + and / have special meaning in URLs, and a strict base64url decoder will reject them. Use base64url when the output goes into a URL, cookie, or filename.
- > Double-encoding: passing already-Base64 text through the encoder again. Keep track of whether a value is already Base64 before encoding it.
- > Forgetting UTF-8: btoa('h€llo') throws in the browser because € is outside Latin-1, and btoa('héllo') succeeds but encodes the single Latin-1 byte for é rather than its UTF-8 bytes. Use TextEncoder first to turn the string into bytes.
- > Padding-stripping mismatches: base64url often omits the = padding. If your decoder is strict, re-add padding: input + '==='.slice((input.length + 3) % 4).
- > Assuming Base64 is secure: it is not. Encryption is separate. Always.
- > Using Base64 for huge binaries: encoding a 500 MB file into a single string can exhaust memory. Stream it instead.
- > Line-wrapped vs unwrapped: PEM wraps Base64 at 64 columns and MIME at 76 columns, using line breaks (\n or \r\n). Most other contexts expect a single line. Strip or insert newlines as appropriate.
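Three of the mistakes above (missing padding, skipping UTF-8, and loading huge inputs at once) can be avoided with small helpers. The following is a sketch rather than a definitive implementation: the function names are made up for this example, and encode_stream assumes the source returns full chunks until end of file, as regular files and in-memory buffers do.

```python
import base64
import io


def decode_base64url(text: str) -> bytes:
    """Decode base64url input whether or not its '=' padding was stripped."""
    padding = -len(text) % 4  # characters needed to reach a multiple of 4
    return base64.urlsafe_b64decode(text + "=" * padding)


def encode_text(text: str) -> str:
    """Turn a Unicode string into UTF-8 bytes first, then Base64 those bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encode_stream(src, dst, chunk_size: int = 3 * 4096) -> None:
    """Base64-encode a binary stream chunk by chunk.

    The chunk size must be a multiple of 3 so that only the final chunk
    can produce '=' padding.
    """
    if chunk_size % 3:
        raise ValueError("chunk_size must be a multiple of 3")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(base64.b64encode(chunk))


print(decode_base64url("TQ"))  # b'M'

source = io.BytesIO(b"Man" * 10_000)
target = io.BytesIO()
encode_stream(source, target)
print(len(target.getvalue()))  # 40000
```

Reading in multiples of 3 bytes matters because any shorter chunk in the middle would insert padding there and corrupt the concatenated output.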
// BASE64 IN 30 SECONDS — THE CHEAT SHEET
- > 64 printable ASCII characters (A–Z, a–z, 0–9, +, /) plus = padding
- > 3 bytes in → 4 characters out (fixed ratio)
- > Output is ~33% larger than input
- > Reversible: it's encoding, not encryption
- > Use base64url (replace + → -, / → _) for URLs and filenames
- > Pad short inputs with 1 or 2 = characters so the output length is a multiple of 4
- > For Unicode text: encode the string to UTF-8 bytes first, then Base64 those bytes
- > Decodable by every mainstream language's standard library
- > Safe in HTML, JSON, emails, and PEM files; use base64url for URLs
- > Not safe as a security layer — always pair with real encryption for sensitive data
// NEXT STEPS
Now that you know the mechanics, try our Base64 encoder or Base64 decoder and inspect the output character-by-character. For URL-safe payloads, switch to base64url. For images, see image → Base64.
Related deep-dives:
• Base64 URL-safe vs standard — when to use each alphabet
• Base64 for UTF-8 and Unicode — avoiding the btoa() pitfall
• Base64 in JavaScript & Node.js — atob, btoa, Buffer
• Base64 vs Base64URL vs Base32 — comparison table