Why Low Bitrate Video Gets Blocky

2026-03-02 · video compression

You're watching a video on shaky WiFi. The image suddenly turns into a grid of blurry squares. Faces become unrecognizable. Movement creates trails of blocky artifacts.

Then the connection improves. The blocks vanish. Everything is sharp again.

What just happened? The video codec ran out of bits. But what does that actually mean? And why do the artifacts look like blocks specifically?

The Scale of the Problem

Uncompressed 1080p video at 60fps requires about 3 gigabits per second. That's 375 megabytes every second. A 2-hour movie would be 2.7 terabytes.
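Those figures fall straight out of the arithmetic. A quick back-of-the-envelope check, assuming 8-bit RGB (3 bytes per pixel):

```python
# Uncompressed 1080p at 60 fps, 8-bit RGB (3 bytes per pixel).
width, height, fps, bytes_per_px = 1920, 1080, 60, 3

bytes_per_sec = width * height * bytes_per_px * fps
bits_per_sec = bytes_per_sec * 8

print(bits_per_sec / 1e9)               # ~3 Gbit/s
print(bytes_per_sec / 1e6)              # ~373 MB/s
print(bytes_per_sec * 2 * 3600 / 1e12)  # 2-hour movie: ~2.7 TB
```

(Real pipelines often use 4:2:0 chroma subsampling, which cuts the raw rate further, but the order of magnitude is the same.)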

Netflix streams 4K video at about 15 megabits per second. Uncompressed 4K would need several gigabits per second, so that's a compression ratio in the hundreds to one. They're throwing away over 99% of the data, and the result still looks good.

How? By being incredibly smart about what to throw away.

Two Kinds of Redundancy

Video compression exploits two facts about video:

Spatial redundancy: Most pixels are similar to their neighbors. A blue sky isn't random noise — it's mostly the same blue, with gradual variations. We can compress within a single frame.

Temporal redundancy: Most pixels are similar between frames. If frame 1 shows a person at position X, and frame 2 shows the same person slightly to the right, we don't need to store the person twice — just the movement.

Let's see how codecs exploit each one.

Part 1: Spatial Compression (Within a Frame)

The first tool is the same one JPEG uses: block-based compression.

Each frame is divided into blocks — typically 16×16 pixels, called macroblocks. Each block is compressed independently using a mathematical transform (Discrete Cosine Transform, or DCT) that converts pixel values into frequency components.

Here's the key insight: human vision is more sensitive to low-frequency changes (broad shapes, gradients) than high-frequency changes (fine details, sharp edges). So we keep the low frequencies precisely and approximate (or discard) the high frequencies.

Interactive: Bitrate vs Quality

[Interactive demo: a slider from high to low bitrate. Lowering the bitrate makes block artifacts emerge, first in complex areas; high-frequency details disappear first.]

When bitrate drops, the codec has to quantize more aggressively. Each block's frequency components get rounded to fewer distinct values. Eventually, entire blocks become single colors — the average of what was there.

This is why the artifacts are blocks. Not random pixels, not stripes, but squares. The compression itself works on square units, so the errors appear as squares.
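The transform-and-quantize step above can be sketched in a few lines. This is a minimal illustration, not a real encoder: an orthonormal 2-D DCT applied to one 8×8 block, with coefficients rounded to multiples of a quantization step. The coarser the step, the more coefficients round to zero:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (the transform JPEG and most codecs use)."""
    k = np.arange(n)
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2.0)
    return M

def quantize_block(block, step):
    """Forward 2-D DCT -> round coefficients to multiples of `step` -> inverse DCT."""
    D = dct_matrix(block.shape[0])
    coeffs = D @ block @ D.T
    kept = np.round(coeffs / step)      # coarser step => more coefficients become 0
    return kept, D.T @ (kept * step) @ D

rng = np.random.default_rng(0)
# A "textured" block: a smooth gradient plus fine detail (noise).
block = np.outer(np.linspace(0, 255, 8), np.ones(8)) + rng.normal(0, 20, (8, 8))

mild_c, mild = quantize_block(block, step=8)
harsh_c, harsh = quantize_block(block, step=120)

# Harsh quantization zeroes most coefficients: the fine detail vanishes,
# and only the broad low-frequency gradient survives.
print(np.count_nonzero(mild_c), np.count_nonzero(harsh_c))
```

At extreme quantization only a handful of low-frequency coefficients survive per block, which is exactly why a starved block degrades to a flat or gently shaded square.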

Why Some Scenes Get Blocky Faster

Not all video compresses equally. A talking head against a wall? Easy. An explosion with debris flying everywhere? Hard.

The difference is motion complexity. More movement = more information to encode = more bits needed. If you force a low bitrate on complex motion, the codec has no choice but to discard more data, creating more visible blocks.

Part 2: Temporal Compression (Between Frames)

Here's where video compression gets clever. If frame 1 shows a red ball at position (100, 100), and frame 2 shows the same ball at position (105, 100), we don't need to store the ball twice.

We just store: "the ball moved 5 pixels right."

This is motion estimation, and it's the heart of modern video compression.

The Three Frame Types

Video codecs use three types of frames, each with different compression levels:

I-Frame Intra-frame — A complete image, compressed like a JPEG. No references to other frames. Large (maybe 100KB), but can be decoded independently.

P-Frame Predicted frame — Stores only the difference from a previous frame. Uses motion vectors to say "this block came from over there." Much smaller (maybe 20KB).

B-Frame Bi-directional frame — References both previous AND future frames. Can say "this block is halfway between that block in frame A and that block in frame B." Smallest of all (maybe 5KB).

The GOP: Group of Pictures

A typical GOP structure (percentages indicate relative data size):

  I(100%) B(5%) B(5%) P(20%) B(5%) B(5%) P(20%) B(5%) B(5%) P(20%) B(5%) B(5%)

A 12-frame GOP: 1 I-frame + 3 P-frames + 8 B-frames.

This pattern — I B B P B B P B B P B B — repeats. Each group starts with an I-frame and ends just before the next one. This is called a GOP (Group of Pictures).
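Using the rough per-frame sizes given earlier (I ≈ 100 KB, P ≈ 20 KB, B ≈ 5 KB; illustrative numbers, not from any real encoder), the savings from this pattern are easy to total up:

```python
# Illustrative per-frame sizes from the text, in KB.
sizes_kb = {"I": 100, "P": 20, "B": 5}
gop = "IBBPBBPBBPBB"  # the 12-frame pattern described above

total = sum(sizes_kb[f] for f in gop)
print(total, total / len(gop))  # 200 KB total, ~16.7 KB per frame
```

Compare that ~16.7 KB average against 100 KB if every frame were an I-frame: the GOP structure alone buys roughly a 6× reduction before any other trick is applied.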

Motion Vectors: Tracking Movement

When encoding a P-frame or B-frame, the codec searches for each block in the reference frame(s). If it finds a match nearby, it records a motion vector: "this 16×16 block moved from (x, y) to (x+5, y+2)."

The magic is that we don't transmit the block's pixels — just the vector and a tiny "residual" (the difference between predicted and actual).
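The search itself can be sketched as an exhaustive block match: for each block in the current frame, scan a window of the reference frame for the position with the minimum sum of absolute differences (SAD). Real encoders use much faster search patterns, but this toy version (frame sizes and the moving square are made up for illustration) shows the idea:

```python
import numpy as np

def best_motion_vector(ref, cur, bx, by, bs=16, search=8):
    """Exhaustive block search: find where the current block at (bx, by)
    came from in the reference frame, by minimum SAD."""
    block = cur[by:by + bs, bx:bx + bs].astype(int)
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate block falls outside the frame
            sad = np.abs(ref[y:y + bs, x:x + bs].astype(int) - block).sum()
            if sad < best[2]:
                best = (dx, dy, sad)
    return best

# Toy frames: a bright 16x16 square moves 5 px right between frames.
ref = np.zeros((64, 64), dtype=np.uint8)
ref[20:36, 20:36] = 200
cur = np.zeros_like(ref)
cur[20:36, 25:41] = 200

dx, dy, sad = best_motion_vector(ref, cur, bx=25, by=20)
print(dx, dy, sad)  # -> -5 0 0: the block came from 5 px to the left
```

A zero residual (SAD = 0) means the vector alone reconstructs the block perfectly; in real footage the residual is small but nonzero, and that difference is what gets DCT-coded and quantized.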

Interactive: Motion Vectors

[Interactive demo: motion vectors drawn as arrows over a frame. Each arrow shows where a block moved from the previous frame. Long arrows = fast motion. No arrow = static.]

Static backgrounds? Zero vectors. Smooth pans? Uniform vectors. Chaotic motion? Vectors everywhere, and more bits needed.

When It All Goes Wrong

Now we can understand exactly what happens when bitrate is too low:

  1. Quantization increases — Each block's DCT coefficients get rounded more aggressively. Fine details disappear first.
  2. Blocks become uniform — Eventually each block is a single color. Edges between blocks become visible.
  3. Motion vectors fail — With coarse blocks, the codec can't find good matches. Motion prediction breaks down.
  4. Error propagation — P-frames reference corrupted I-frames. B-frames reference corrupted P-frames. Errors compound across the GOP.

When you see that blocky mess on bad WiFi, you're seeing the codec's emergency fallback: "I don't have enough bits to represent this accurately, so here's my best approximation using large, coarse blocks."

The Decoding Order Puzzle

Here's a detail that trips people up: B-frames reference future frames. But you can't decode a frame you haven't received yet.

Solution: transmit frames in a different order than they're displayed.

Display Order vs Decode Order

  Display order:  I(1) B(2) B(3) P(4) B(5) B(6) P(7)
  Decode order:   I(1) P(4) B(2) B(3) P(7) B(5) B(6)

Each B-frame is transmitted after the reference frames on both sides of it, so the decoder always has what it needs before decoding a B-frame.

This is why video players buffer — they're reordering frames behind the scenes.
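The reordering rule can be sketched in a few lines, under the simplifying assumption that every B-frame references exactly the nearest I/P frames on either side (real codecs allow more flexible references):

```python
def decode_order(display):
    """Reorder frames from display order to a valid decode/transmission
    order: each B-frame must follow the reference frames on both sides
    of it. Simplified model: Bs reference only their nearest I/P frames.
    """
    out, pending_b = [], []
    for frame in display:
        if frame[0] in "IP":        # reference frame: emit immediately...
            out.append(frame)
            out.extend(pending_b)   # ...then the Bs that needed it
            pending_b = []
        else:                       # B-frame: hold until the next reference
            pending_b.append(frame)
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(decode_order(display))  # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

Note that each B-frame is delayed until the later reference it depends on has been sent; that delay is a chunk of the latency you experience as buffering.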

Real-World Impact

Understanding this helps explain practical behaviors:

Why fast-forward is sometimes blocky: Decoding can only start cleanly from an I-frame. If playback resumes mid-GOP, the reference frames are missing, and the decoder does its best with partial information.

Why scene cuts cause momentary quality drops: A scene change breaks temporal continuity. Motion vectors become useless. The encoder must suddenly send a lot more data — effectively an I-frame mid-GOP — or accept artifacts.

Why sports need higher bitrate than news: Fast, unpredictable motion = hard to predict = more data needed. A static news anchor compresses easily. A football game does not.

The Numbers

Typical Bitrates (1080p)

  Content            Bitrate        Quality
  Video call         1-2 Mbps       Acceptable, with artifacts
  YouTube            4-8 Mbps       Good
  Netflix            5-8 Mbps       Very good
  Blu-ray            20-40 Mbps     Excellent
  Uncompressed       ~3000 Mbps     Perfect

Modern Improvements

Newer codecs (H.265/HEVC, AV1) use the same principles but get better results through:

  • Larger blocks: Up to 64×64 pixels, allowing better prediction for smooth areas
  • More reference frames: Not just previous/next, but any of 8+ nearby frames
  • Better motion prediction: Sub-pixel accuracy, more search patterns
  • Smarter block division: Blocks can split into smaller units where needed

The result: same quality at half the bitrate. Or twice the quality at the same bitrate.

The Elegance

What I find beautiful about video compression is how it mirrors how we perceive. We don't notice every pixel — we notice shapes, movement, changes. Video codecs exploit exactly this.

They're not storing reality. They're storing a perceptual shortcut — just enough information to reconstruct an image that looks right to human eyes. The blocks aren't a bug; they're the limit of that approximation.

When your video breaks into artifacts, you're seeing the codec's assumptions break down. The model it built — "this block moved here, that block stayed put" — no longer has enough bits to hide its approximations.

The Core Insight

Video compression doesn't store images.
It stores predictions and corrections.

Most of the time, the predictions are good enough
that the corrections can be tiny.

When they're not — that's when you see the blocks.

Further reading: AWS: Video Compression Mechanisms · DCT on Wikipedia · Bitmovin: Compression Fundamentals