HTTP/1.1 processes requests and responses sequentially. One connection handles one request at a time. When multiple resources are needed simultaneously, multiple connections must be opened. HTTP/2 solved this limitation by processing multiple requests in parallel over a single connection.

HTTP/1.1 Limitations

HOL Blocking

HTTP/1.1 processes request-response pairs sequentially on a single TCP connection. The second request waits until the first response arrives. This is HOL blocking — Head-of-Line Blocking.

sequenceDiagram
    participant C as Client
    participant S as Server

    Note over C,S: HTTP/1.1 — Sequential
    C->>S: GET /style.css
    S-->>C: style.css response
    C->>S: GET /script.js
    S-->>C: script.js response
    C->>S: GET /image.png
    S-->>C: image.png response

Rendering a single web page requires dozens of resources. Sequential processing is slow.

HTTP/1.1 introduced pipelining as a workaround — sending multiple requests without waiting for responses. But responses must still arrive in request order. A slow first response delays everything behind it. Implementation issues led most browsers to disable pipelining.

In practice, concurrent TCP connections are the real workaround. Browsers open up to 6 simultaneous TCP connections per domain for parallel requests. But each connection incurs TCP handshake and TLS handshake costs.
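A rough back-of-the-envelope model makes the trade-off concrete. The sketch below assumes every request costs one round trip, ignores transfer time, and treats connection setup as a fixed number of round trips. The function name and all figures (30 resources, 50 ms RTT, 2 handshake round trips) are illustrative assumptions, not measurements.

```python
import math

def page_load_ms(resources: int, connections: int, rtt_ms: int, handshake_rtts: int = 2) -> int:
    """Estimate load time when each connection serves one request per round trip."""
    setup = handshake_rtts * rtt_ms              # connections are opened in parallel
    rounds = math.ceil(resources / connections)  # requests queue up on each connection
    return setup + rounds * rtt_ms

one_conn = page_load_ms(resources=30, connections=1, rtt_ms=50)
six_conns = page_load_ms(resources=30, connections=6, rtt_ms=50)
print(one_conn, six_conns)
```

Under these assumptions the wall-clock time drops sharply with six connections, but the client and server now perform six handshakes instead of one.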

Header Redundancy

HTTP/1.1 transmits headers as text. Headers like User-Agent, Cookie, and Accept repeat with every request. With cookies included, request headers alone can reach several KB. HTTP/1.1 has no header compression mechanism, so the same information is retransmitted in full every time.

HTTP/2

HTTP/2 was standardized in 2015, based on Google’s SPDY protocol. It maintains the same HTTP semantics while fundamentally changing the transport mechanism.

Binary Framing

HTTP/1.1 is text-based — requests and responses are human-readable strings. HTTP/2 encodes all messages as binary frames.

block-beta
    columns 3
    block:http1["HTTP/1.1"]:3
        h1["GET /index.html HTTP/1.1\nHost: example.com\nAccept: text/html"]
    end
    space:3
    block:http2["HTTP/2"]:3
        h2a["HEADERS frame\n(stream 1)"]
        h2b["DATA frame\n(stream 1)"]
        h2c["HEADERS frame\n(stream 3)"]
    end

    style http1 fill:#FFCDD2
    style http2 fill:#C8E6C9

A single HTTP message splits into HEADERS frames and DATA frames. Each frame carries a stream ID tag, allowing the receiver to reassemble frames into the correct messages.
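To make the framing concrete, here is a minimal sketch of the 9-byte frame header that precedes every HTTP/2 frame: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream ID. The helper names `pack_frame` and `parse_frame` are made up for this example, and only two frame types are listed.

```python
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}  # subset of the defined frame types
END_STREAM = 0x1                              # flag: last frame of this stream

def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit length split into 8 + 16 bits
        frame_type,
        flags,
        stream_id & 0x7FFFFFFF,         # top bit is reserved
    )
    return header + payload

def parse_frame(data: bytes):
    """Read one frame and return (type name, flags, stream id, payload)."""
    hi, lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    length = (hi << 16) | lo
    stream_id &= 0x7FFFFFFF
    return FRAME_TYPES.get(frame_type, "OTHER"), flags, stream_id, data[9:9 + length]

frame = pack_frame(0x0, END_STREAM, 1, b"body { margin: 0 }")
parsed = parse_frame(frame)
print(parsed)
```

The stream ID in the header is what lets the receiver route each frame back to the message it belongs to.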

Multiplexing

Multiplexing is the core improvement in HTTP/2. Multiple streams operate simultaneously over a single TCP connection. Each stream carries an independent request-response pair. Frames from different streams interleave on the connection, so a slow response does not block the others at the HTTP level.

sequenceDiagram
    participant C as Client
    participant S as Server

    Note over C,S: HTTP/2 — Multiplexing
    C->>S: Stream 1: GET /style.css
    C->>S: Stream 3: GET /script.js
    C->>S: Stream 5: GET /image.png
    S-->>C: Stream 3: script.js (partial)
    S-->>C: Stream 1: style.css (complete)
    S-->>C: Stream 5: image.png (partial)
    S-->>C: Stream 3: script.js (complete)
    S-->>C: Stream 5: image.png (complete)

No need to open multiple TCP connections as in HTTP/1.1. A single connection handles all requests. TCP handshake and TLS handshake costs occur just once.
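The receiving side of multiplexing can be sketched as a small demultiplexer that groups interleaved frames by stream ID. This is a simplified illustration, not how a real HTTP/2 library is structured. The frame tuples and payload bytes are placeholders chosen to mirror the arrival order in the diagram above.

```python
from collections import defaultdict

# (stream_id, chunk, end_stream) in arrival order
frames = [
    (3, b"console.", False),  # script.js (partial)
    (1, b"body{}", True),     # style.css (complete)
    (5, b"\x89PNG", False),   # image.png (partial)
    (3, b"log(1)", True),     # script.js (complete)
    (5, b"...", True),        # image.png (complete)
]

def demultiplex(frames):
    """Reassemble each stream's body and record the order in which streams finish."""
    buffers = defaultdict(bytearray)
    completed = []
    for stream_id, chunk, end_stream in frames:
        buffers[stream_id] += chunk
        if end_stream:
            completed.append(stream_id)
    return {sid: bytes(buf) for sid, buf in buffers.items()}, completed

bodies, completed = demultiplex(frames)
print(completed, bodies[3])
```

Stream 1 finishes first even though stream 3 started sending earlier, which is exactly the behavior sequential HTTP/1.1 cannot provide on one connection.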

HPACK Header Compression

HTTP/2 compresses headers with HPACK, a dedicated compression algorithm combining two techniques.

Static/dynamic tables: Frequently used header fields are replaced with index numbers. Because :method: GET maps to index 2 in the static table, a single byte suffices to send it. Headers that appear during the connection are added to the dynamic table and referenced by index in subsequent requests.

Huffman encoding: Values not in the tables are compressed using Huffman coding. High-frequency characters receive shorter bit representations, reducing overall size.

Headers that repeated at several KB per request in HTTP/1.1 can shrink to tens of bytes on later requests over the same connection.
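The table lookup can be illustrated with a toy encoder. This is a deliberately simplified sketch, not a compliant HPACK implementation: it lists only a few static table entries, skips Huffman coding, name-only index matches, and dynamic table eviction, and assumes all indices and string lengths fit in a single byte. The class name `ToyHpackEncoder` and the `user-agent` value are invented for the example.

```python
# A few entries from the HPACK static table (61 entries in total)
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":scheme", "https"): 7,
}
STATIC_TABLE_SIZE = 61

class ToyHpackEncoder:
    def __init__(self):
        self.dynamic = []  # newest entry first; the first dynamic index is 62

    def _index_of(self, field):
        if field in STATIC_TABLE:
            return STATIC_TABLE[field]
        if field in self.dynamic:
            return STATIC_TABLE_SIZE + 1 + self.dynamic.index(field)
        return None

    def encode(self, headers):
        out = bytearray()
        for name, value in headers:
            idx = self._index_of((name, value))
            if idx is not None:
                assert idx < 127           # toy limit: index fits in 7 bits
                out.append(0x80 | idx)     # indexed header field: one byte
            else:
                n, v = name.encode(), value.encode()
                assert len(n) < 127 and len(v) < 127
                out.append(0x40)           # literal with incremental indexing, new name
                out.append(len(n)); out += n
                out.append(len(v)); out += v
                self.dynamic.insert(0, (name, value))
        return bytes(out)

enc = ToyHpackEncoder()
headers = [(":method", "GET"), (":path", "/"), ("user-agent", "demo-client/1.0")]
first = enc.encode(headers)
second = enc.encode(headers)
print(len(first), len(second))
```

The first request pays for the literal `user-agent` field once. The second request sends the same three headers as three index bytes.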

Server Push

When a client requests HTML, the server can proactively send CSS and JavaScript that the HTML needs — without waiting for additional client requests.

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: GET /index.html
    S-->>C: PUSH_PROMISE: /style.css
    S-->>C: PUSH_PROMISE: /script.js
    S-->>C: /index.html response
    S-->>C: /style.css response (pushed)
    S-->>C: /script.js response (pushed)

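This eliminates the round trip the client would otherwise spend parsing the HTML and requesting the additional resources. In practice its usefulness is limited: the server cannot tell what the client already has cached, so it often pushes resources the client does not need.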

Stream Prioritization

Clients can assign weights and dependencies to each stream. Giving a CSS file a higher priority than images asks the server to transmit the CSS first; the priority is a hint, and the server decides how to honor it. Prioritizing render-critical resources this way improves initial load time.
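One way to picture weights is as proportional shares of the available bandwidth among sibling streams. The sketch below assumes a server that honors weights exactly, which real servers are not required to do. The function name, file names, weights, and the 1000 kbps figure are all illustrative.

```python
def share_bandwidth(weights: dict[str, int], total_kbps: int) -> dict[str, float]:
    """Split bandwidth among sibling streams in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_kbps * w / total_weight for name, w in weights.items()}

shares = share_bandwidth({"style.css": 200, "hero.png": 50}, total_kbps=1000)
print(shares)
```

With a 4:1 weight ratio, the stylesheet receives four times the bandwidth of the image while both are in flight.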

HTTP/2 Limitations

HTTP/2 solved HTTP-level HOL blocking but TCP-level HOL blocking remains. All streams operate over a single TCP connection, so when a TCP packet is lost, all streams wait until that packet is retransmitted.

HTTP/3 addresses this by using the UDP-based QUIC protocol instead of TCP. Each stream becomes an independent transport unit, so packet loss in one stream does not affect others.
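The difference can be sketched with a toy delivery model. It assumes each packet carries data for exactly one stream and that packet 2 is lost and not yet retransmitted. Neither function models real TCP or QUIC internals; they only show which data the receiver can hand to the application.

```python
# (sequence number, stream) in send order; packet 2 is lost in transit
packets = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B")]
lost = {2}

def tcp_deliverable(packets, lost):
    """One ordered byte stream: delivery stops at the first gap."""
    delivered = []
    for seq, stream in packets:
        if seq in lost:
            break
        delivered.append((seq, stream))
    return delivered

def quic_deliverable(packets, lost):
    """Ordering is per stream: only the stream with the gap stalls."""
    blocked = set()
    delivered = []
    for seq, stream in packets:
        if seq in lost:
            blocked.add(stream)
        elif stream not in blocked:
            delivered.append((seq, stream))
    return delivered

tcp_result = tcp_deliverable(packets, lost)
quic_result = quic_deliverable(packets, lost)
print(tcp_result, quic_result)
```

In the TCP-style model, streams A and C stall behind a loss that only affected stream B. In the QUIC-style model, only stream B waits for the retransmission.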

gRPC

gRPC is an RPC framework developed by Google. It uses HTTP/2 as the transport protocol and Protocol Buffers as the serialization format.

HTTP/2 Utilization

gRPC leverages HTTP/2 multiplexing to process multiple RPC calls in parallel over a single connection. HPACK header compression applies as well. Extending HTTP/2’s streaming capability, gRPC supports four communication patterns.

block-beta
    columns 2

    block:unary["Unary"]:1
        columns 3
        u1["Client"]
        space
        u2["Server"]
        u1 -- "1 req / 1 res" --> u2
    end

    block:server["Server Streaming"]:1
        columns 3
        s1["Client"]
        space
        s2["Server"]
        s1 -- "1 req / N res" --> s2
    end

    block:client["Client Streaming"]:1
        columns 3
        c1["Client"]
        space
        c2["Server"]
        c1 -- "N req / 1 res" --> c2
    end

    block:bidi["Bidirectional Streaming"]:1
        columns 3
        b1["Client"]
        space
        b2["Server"]
        b1 -- "N req / N res" --> b2
    end

    style unary fill:#E3F2FD
    style server fill:#E8F5E9
    style client fill:#FFF3E0
    style bidi fill:#F3E5F5
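The four patterns differ only in whether each side sends one message or a stream of messages. The sketch below models that shape with plain Python functions and iterators. It does not use the gRPC library, and the handler names and message strings are hypothetical.

```python
from typing import Iterable, Iterator

def unary(request: str) -> str:
    """1 request, 1 response."""
    return f"ack:{request}"

def server_streaming(request: str) -> Iterator[str]:
    """1 request, N responses."""
    for i in range(3):
        yield f"{request}#{i}"

def client_streaming(requests: Iterable[str]) -> str:
    """N requests, 1 response."""
    return f"received {sum(1 for _ in requests)} messages"

def bidirectional(requests: Iterable[str]) -> Iterator[str]:
    """N requests, N responses, produced as requests arrive."""
    for request in requests:
        yield f"echo:{request}"

print(unary("hi"))
print(list(server_streaming("room")))
print(client_streaming(iter(["a", "b"])))
print(list(bidirectional(["a", "b"])))
```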

Protocol Buffers

Unlike REST, which typically uses JSON, gRPC uses Protocol Buffers (Protobuf). Message structures and service interfaces are defined in a .proto file, and code for each target language is generated from it.

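syntax = "proto3";

service ChatService {
  rpc SendMessage (MessageRequest) returns (MessageResponse);
  rpc StreamMessages (RoomRequest) returns (stream Message);
}

message MessageRequest {
  string room_id = 1;
  string user_id = 2;
  string content = 3;
}

// Placeholder definitions so the file compiles; the fields are illustrative assumptions.
message MessageResponse { bool ok = 1; }
message RoomRequest { string room_id = 1; }
message Message { string user_id = 1; string content = 2; }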

Protobuf uses binary serialization. Messages are often cited as roughly 3–5x smaller than the equivalent JSON, and serialization and deserialization as 5–10x faster, though the actual gains depend on the payload and the implementation. The schema in the .proto file enforces the interface contract between client and server at the code level.
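The size difference comes from the wire format: each field is sent as a small numeric tag plus its value, and field names never appear on the wire. The sketch below hand-encodes a MessageRequest using only the standard library to show this. It handles string fields only, and the helper names and sample values are assumptions for illustration. Real code would use the classes generated from the .proto file.

```python
import json

def varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def string_field(field_number: int, value: str) -> bytes:
    """Encode one string field: tag, length, then the UTF-8 bytes."""
    data = value.encode("utf-8")
    tag = (field_number << 3) | 2    # wire type 2 = length-delimited
    return varint(tag) + varint(len(data)) + data

msg = {"room_id": "r1", "user_id": "u1", "content": "hi"}
binary = (
    string_field(1, msg["room_id"])
    + string_field(2, msg["user_id"])
    + string_field(3, msg["content"])
)
text = json.dumps(msg, separators=(",", ":")).encode("utf-8")
print(len(binary), len(text))
```

For this tiny message most of the JSON size is field names and punctuation. With long string values the ratio narrows, which is why the figures above vary by payload.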

REST vs gRPC

gRPC fits when:

  • Internal microservice communication: low latency, high throughput needed
  • Bidirectional streaming: real-time chat, event subscriptions
  • Polyglot environments: one .proto file generates client/server code for multiple languages

REST fits when:

  • External APIs: browsers do not support gRPC directly — gRPC-Web or a proxy is required
  • Debugging convenience: JSON is human-readable, inspectable via curl or browser dev tools
  • Simple CRUD: resource-based APIs that do not need RPC-style interfaces

In practice, combining gRPC for internal service communication and REST for external APIs is a common pattern.

HTTP/1.1 processes requests and responses sequentially. HTTP/2 removed this constraint with multiplexing. gRPC adds binary serialization and streaming on top of HTTP/2’s advantages. Each protocol addresses different problems, and the workload determines the choice.