What Is Base64 and How Does It Work?

// THE ONE-SENTENCE DEFINITION

Base64 is a text representation of binary data that uses only 64 printable ASCII characters: A–Z, a–z, 0–9, and two symbols (typically + and /, with = as a padding marker). That is the entire idea. Every three bytes of binary are re-grouped into four Base64 characters, and nothing else changes.

It is encoding, not encryption: anyone can reverse it in milliseconds. The purpose is not privacy; it is safe transport. Base64 lets you put binary payloads (images, certificates, PDF files, cryptographic keys) into places that only accept text: HTML attributes, JSON strings, URLs, email bodies, YAML configs, database TEXT columns, environment variables.

// WHY WE NEED IT AT ALL

Text-only channels strip or mutate bytes they don't recognise. An email gateway that understands only 7-bit ASCII will clear the high bit of every byte above 127, corrupting non-ASCII characters and binary data alike. A JSON string cannot carry raw null bytes or other unescaped control characters. A URL that contains a literal space, quote, or ampersand is invalid or ambiguous until those characters are percent-encoded. The original SMTP standards of the early 1980s assumed mail bodies would always be 7-bit ASCII text, but people wanted to send photos and spreadsheets too.

Base64 solves this by restricting the output to 64 characters that survive nearly every text channel intact: 26 uppercase letters, 26 lowercase letters, 10 digits, and two symbols that are printable and pass unchanged through ASCII-based systems. The main exception is URLs, where + and / carry special meaning, which is why a URL-safe alphabet also exists. The padding character = is similarly safe in most text contexts.

// THE 6-BIT GROUPING TRICK

Here is the mechanical heart of Base64 in one line: take the input as a bitstream, chop it into 6-bit pieces, and look up each piece in a 64-character alphabet.

Why 6 bits? Because 2^6 = 64, so each 6-bit chunk maps exactly to one character of the alphabet. Why not 7 or 8 bits? 7 bits gives 128 values (too many — not all are printable), and 8 bits is just raw binary again. 6 bits is the sweet spot where every possible value has a human-readable character.

Three input bytes = 24 bits = exactly four 6-bit groups = four Base64 characters. That 3-in-4-out ratio is fixed; it is the reason Base64 output is ~4/3 the size of the input, i.e. about 33% larger.

// Worked example: encoding the 3 bytes 'Man' (77 97 110)
// ASCII:  M        a        n
// Binary: 01001101 01100001 01101110
// Re-group into 6-bit chunks:
//         010011 010110 000101 101110
// Decimal: 19     22     5      46
// Base64:  T      W      F      u
// Result:  'TWFu'  (stored as 4 ASCII bytes: 84 87 70 117)
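
The grouping step above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the name `encode_full_groups` is hypothetical, and the sketch assumes the input length is a multiple of 3 so that no padding is needed.

```python
import base64
import string

# Standard alphabet: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_full_groups(data: bytes) -> str:
    """Encode input whose length is a multiple of 3 (hypothetical helper)."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles whole 3-byte groups")
    out = []
    for i in range(0, len(data), 3):
        # Pack three bytes into one 24-bit integer.
        block = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        # Cut the 24 bits into four 6-bit indices, most significant first.
        for shift in (18, 12, 6, 0):
            out.append(ALPHABET[(block >> shift) & 0x3F])
    return "".join(out)


print(encode_full_groups(b"Man"))  # TWFu
```

In real code the standard library's `base64.b64encode` does the same job; the manual version only exists to make the bit shuffling visible.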

// THE BASE64 ALPHABET

RFC 4648 §4 defines the standard alphabet. Positions 0–25 are A–Z, 26–51 are a–z, 52–61 are 0–9, position 62 is +, and position 63 is /.

The URL-safe variant (RFC 4648 §5, also known as base64url) swaps + for - and / for _, so the output can be safely dropped into URLs and filenames without further escaping. Both variants decode identically if you apply the corresponding alphabet.

// Standard Base64 alphabet (index → character)
// 0–25:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
// 26–51: a b c d e f g h i j k l m n o p q r s t u v w x y z
// 52–61: 0 1 2 3 4 5 6 7 8 9
// 62:    +       (or  -  in URL-safe)
// 63:    /       (or  _  in URL-safe)
// pad:   =
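
As a rough illustration of the two alphabets, the Python sketch below builds both tables and encodes three bytes chosen so that only indices 62 and 63 appear. The constant names `STANDARD` and `URL_SAFE` are illustrative; `b64encode` and `urlsafe_b64encode` come from Python's standard `base64` module.

```python
import base64
import string

# Index -> character tables; the two variants differ only at positions 62 and 63.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
URL_SAFE = STANDARD[:62] + "-_"

# These three bytes split into the 6-bit values 62, 63, 63, 62.
raw = bytes([0xFB, 0xFF, 0xFE])

std = base64.b64encode(raw).decode("ascii")
url = base64.urlsafe_b64encode(raw).decode("ascii")

print(std)  # +//+
print(url)  # -__-

# Converting between the variants is a plain character substitution.
assert std.translate(str.maketrans("+/", "-_")) == url
```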

// WHERE THE = PADDING COMES FROM

The algorithm expects input sizes that are multiples of 3 bytes (so they group evenly into 4 Base64 characters). When the input length is not a multiple of 3, one or two bytes are left over, which is not enough to fill the final 4-character output group. Padding with = indicates how many bytes were missing.

• Input length mod 3 == 0 → no padding (e.g., 'Man' → 'TWFu').
• Input length mod 3 == 1 → two pad characters (e.g., 'M' → 'TQ==').
• Input length mod 3 == 2 → one pad character (e.g., 'Ma' → 'TWE=').

// Why 'M' becomes 'TQ==':
// Byte: M = 01001101                   (8 bits)
// Padded to 12 bits with zeros: 010011 010000
// Decimal: 19, 16  →  'T', 'Q'
// Output length must be multiple of 4 → add '==' padding
// Result: 'TQ=='
//
// When decoding, the decoder strips '==' and recovers the first 8 bits,
// discarding the trailing zero bits.
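
The padding rule can be sketched the same way. The helper below, with the hypothetical name `encode_tail`, handles only the final 1 or 2 leftover bytes: it fills the missing bytes with zero bits, keeps the characters that contain real data, and replaces the rest with =.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_tail(tail: bytes) -> str:
    """Encode a final group of 1 or 2 leftover bytes (hypothetical helper)."""
    if len(tail) not in (1, 2):
        raise ValueError("tail must be 1 or 2 bytes")
    missing = 3 - len(tail)
    # Left-align the real bytes in a 24-bit block; missing bytes become zeros.
    block = int.from_bytes(tail + b"\x00" * missing, "big")
    chars = [ALPHABET[(block >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
    # Characters made entirely of filler bits are replaced by '='.
    return "".join(chars[: 4 - missing]) + "=" * missing


print(encode_tail(b"M"))   # TQ==
print(encode_tail(b"Ma"))  # TWE=
```

Decoding reverses the process: `base64.b64decode("TQ==")` returns `b"M"`, with the four filler bits dropped.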

// IS THE 33% OVERHEAD EXACT?

Close, but not quite. The exact formula is ceil(n / 3) × 4 characters for a byte input of length n. For a 100 KB input, that's ceil(102400 / 3) × 4 = 136,536 characters, about 33.3% more than the 102,400 input bytes. In practice, once you add HTTP-level gzip or Brotli compression, the over-the-wire cost is often closer to 10–15%, because Base64 text compresses reasonably well, though the exact figure depends on the payload and compressing the raw binary directly would still be smaller.

For small payloads the padding matters: encoding 1 byte costs 4 characters (four times the size), and encoding 2 bytes also costs 4 characters. So tiny Base64 strings look surprisingly long compared to their inputs.
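
The size formula is easy to check against a real encoder. The sketch below uses integer arithmetic for the ceiling; the function name `encoded_length` is an illustrative choice, and the count assumes padded, unwrapped output.

```python
import base64


def encoded_length(n: int) -> int:
    """Characters of padded Base64 output for n input bytes: ceil(n / 3) * 4."""
    return 4 * ((n + 2) // 3)


print(encoded_length(1))       # 4
print(encoded_length(2))       # 4
print(encoded_length(3))       # 4
print(encoded_length(102400))  # 136536

# Cross-check the formula against the standard library for small inputs.
for n in range(64):
    assert len(base64.b64encode(bytes(n))) == encoded_length(n)
```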

// WHERE BASE64 IS USED IN THE REAL WORLD

> Data URIs in HTML/CSS — embed icons and logos inline: <img src="data:image/png;base64,…">
> JSON and GraphQL payloads — APIs that need to ship binary (file uploads, thumbnails) inside a text protocol
> JWT tokens — the three dot-separated parts of a JWT are base64url strings
> HTTP Basic Auth — username:password is Base64-encoded in the Authorization header (still not secure without TLS!)
> Email (MIME) — attachments are Base64-encoded to survive legacy 7-bit SMTP servers
> PEM-format certificates and keys — the -----BEGIN CERTIFICATE----- block wraps a Base64-encoded DER blob
> SSH keys — id_rsa.pub and authorized_keys lines are Base64-encoded public keys
> Database TEXT columns — when you can't use a BLOB type, Base64 lets you store binary as text
> Environment variables and Kubernetes Secrets — binary values are carried as Base64 text, and the data field of a Kubernetes Secret is Base64-encoded (though NOT encrypted)
> QR codes and magic links — short Base64 tokens that are safe to embed in URLs

// BASE64 IS NOT ENCRYPTION

This is the single most common misconception. Base64 does not hide the contents of your data — it rewrites binary as text using a public, well-documented mapping. A password Base64-encoded as cGFzc3dvcmQ= is exactly as vulnerable as the plaintext password; anyone can decode it in one function call. Treat Base64 the way you would treat hexadecimal: it is a representation, not protection.

If you need privacy, apply a real cryptographic primitive (AES-GCM, ChaCha20-Poly1305, age, PGP) to the plaintext first, and then Base64-encode the ciphertext if you need to transport it through a text channel. The Base64 step is the last mile, not the security layer.

Kubernetes Secrets are the canonical pitfall: Kubernetes represents secret values as Base64 in manifests and API output, which makes them look vaguely protected. They are not: anyone who can read Secrets in the namespace can decode them. True secret protection requires tools like SealedSecrets, Vault, SOPS, or cloud-native KMS integrations.
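
To make the point concrete, here is the one-call reversal of the cGFzc3dvcmQ= example from this section, next to the equivalent hexadecimal round trip. It is only a demonstration that no key or secret is involved; the variable names are illustrative.

```python
import base64

encoded = "cGFzc3dvcmQ="

# No key, no secret: decoding is a single standard-library call.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # password

# Hexadecimal is the same kind of thing: a representation, not protection.
as_hex = decoded.encode("utf-8").hex()
print(as_hex)                          # 70617373776f7264
print(bytes.fromhex(as_hex).decode())  # password
```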

// A QUICK ENCODE / DECODE IN EVERY LANGUAGE

// JavaScript (browser):
//   btoa('Hello')            → 'SGVsbG8='
//   atob('SGVsbG8=')         → 'Hello'
//   (btoa/atob only handle Latin-1; use TextEncoder for Unicode)
//
// Node.js:
//   Buffer.from('Hello').toString('base64')     → 'SGVsbG8='
//   Buffer.from('SGVsbG8=', 'base64').toString() → 'Hello'
//
// Python:
//   import base64
//   base64.b64encode(b'Hello')       → b'SGVsbG8='
//   base64.b64decode('SGVsbG8=')     → b'Hello'
//
// Go:
//   base64.StdEncoding.EncodeToString([]byte("Hello")) → "SGVsbG8="
//
// Ruby:
//   Base64.strict_encode64('Hello')   → 'SGVsbG8='
//
// Shell:
//   printf '%s' 'Hello' | base64      → 'SGVsbG8='
//   echo 'SGVsbG8=' | base64 -d        → 'Hello'

// COMMON MISTAKES TO AVOID

> Using the standard alphabet where URL-safe output is needed — in a URL, + can be read as a space and / as a path separator, and a strict base64url decoder rejects both. Use base64url when the output goes into a URL, cookie, or filename.
> Double-encoding — passing already-Base64 text through the encoder again. Track whether a value has already been encoded rather than guessing from its contents, because many ordinary strings are also valid Base64.
> Forgetting UTF-8 — btoa('héllo') silently encodes é as the single Latin-1 byte 0xE9 instead of its two UTF-8 bytes, and btoa('€') throws because € is outside Latin-1. Use TextEncoder first to turn the string into bytes.
> Padding-stripping mismatches — base64url often omits the = padding. If your decoder is strict, re-add padding: input + '==='.slice((input.length + 3) % 4).
> Assuming Base64 is secure — it is not. Encryption is separate. Always.
> Using Base64 for huge binaries — encoding a 500 MB file into a single string will exhaust memory. Stream it instead.
> Line-wrapped vs unwrapped — PEM wraps Base64 at 64 columns and MIME at 76 columns (with CRLF line endings). Most other contexts expect a single line. Strip or insert newlines as appropriate.
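
Three of the items above (missing padding, Unicode text, and very large inputs) can be sketched in Python as follows. The helper names `decode_base64url`, `encode_text`, and `encode_stream` are hypothetical, and the streaming helper assumes a buffered binary file-like object whose read(n) returns fewer than n bytes only at end of input.

```python
import base64
import io


def decode_base64url(text: str) -> bytes:
    """Decode base64url input whose '=' padding may have been stripped."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


def encode_text(text: str) -> str:
    """Turn Unicode text into UTF-8 bytes first, then Base64 those bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encode_stream(src, dst, chunk_size: int = 3 * 1024) -> None:
    """Encode a binary stream chunk by chunk instead of loading it whole."""
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3")
    # With chunks that are multiples of 3 bytes, only the last one can be padded,
    # so the concatenated pieces equal the encoding of the whole input.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(base64.b64encode(chunk))


print(decode_base64url("TQ"))  # b'M'

source = io.BytesIO(b"example payload " * 1000)
target = io.BytesIO()
encode_stream(source, target)
assert target.getvalue() == base64.b64encode(source.getvalue())
```

The chunk size matters: if a chunk were not a multiple of 3 bytes, padding characters would appear in the middle of the output and the result would no longer decode as one Base64 string.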

// BASE64 IN 30 SECONDS — THE CHEAT SHEET

> 64 printable ASCII characters (A–Z, a–z, 0–9, +, /) plus = padding
> 3 bytes in → 4 characters out (fixed ratio)
> Output is ~33% larger than input
> Reversible: it's encoding, not encryption
> Use base64url (replace + → -, / → _) for URLs and filenames
> When the input length is not a multiple of 3, pad the output with 1 or 2 = characters so its length is a multiple of 4
> For Unicode text: encode the string to UTF-8 bytes first, then Base64 those bytes
> Decodable by every mainstream language's standard library
> Safe in HTML, JSON, emails, and PEM files; in URLs, use the base64url alphabet
> Not safe as a security layer — always pair with real encryption for sensitive data

// NEXT STEPS

Now that you know the mechanics, try our Base64 encoder or Base64 decoder and inspect the output character-by-character. For URL-safe payloads, switch to base64url. For images, see image → Base64.

Related deep-dives:
• Base64 URL-safe vs standard — when to use each alphabet
• Base64 for UTF-8 and Unicode — avoiding the btoa() pitfall
• Base64 in JavaScript & Node.js — atob, btoa, Buffer
• Base64 vs Base64URL vs Base32 — comparison table
