HTTP/1.1 processes requests and responses sequentially. One connection handles one request at a time. When multiple resources are needed simultaneously, multiple connections must be opened. HTTP/2 solved this limitation by processing multiple requests in parallel over a single connection.
HTTP/1.1 Limitations
HOL Blocking
HTTP/1.1 processes request-response pairs sequentially on a single TCP connection. The second request waits until the first response arrives. This is HOL blocking — Head-of-Line Blocking.
sequenceDiagram
participant C as Client
participant S as Server
Note over C,S: HTTP/1.1 — Sequential
C->>S: GET /style.css
S-->>C: style.css response
C->>S: GET /script.js
S-->>C: script.js response
C->>S: GET /image.png
S-->>C: image.png response
Rendering a single web page requires dozens of resources. Sequential processing is slow.
HTTP/1.1 introduced pipelining as a workaround — sending multiple requests without waiting for responses. But responses must still arrive in request order. A slow first response delays everything behind it. Implementation issues led most browsers to disable pipelining.
In practice, concurrent TCP connections are the real workaround. Browsers typically open up to 6 simultaneous TCP connections per domain for parallel requests. But each connection incurs its own TCP handshake and TLS handshake costs.
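To make the trade-off concrete, here is a rough back-of-the-envelope model of the two approaches. It is only a sketch under stated assumptions: the function names are made up for this example, every request is assumed to cost exactly one round trip, the handshake is assumed to cost 3 round trips (1 for TCP plus 2 for a full TLS 1.2 handshake), and transfer time and server processing are ignored.

```python
import math

def sequential_time(n_requests: int, rtt_ms: float, setup_rtts: int = 3) -> float:
    """One connection, one request in flight at a time."""
    return setup_rtts * rtt_ms + n_requests * rtt_ms

def parallel_connections_time(n_requests: int, rtt_ms: float,
                              connections: int = 6, setup_rtts: int = 3) -> float:
    """Requests spread over several connections opened at the same time.

    Each connection still pays its own handshake; the handshakes overlap
    in wall-clock time but cost the server `connections` times the work.
    """
    rounds = math.ceil(n_requests / connections)
    return setup_rtts * rtt_ms + rounds * rtt_ms

# Hypothetical page: 30 resources, 50 ms round-trip time.
print(sequential_time(30, 50.0))            # single connection
print(parallel_connections_time(30, 50.0))  # six connections
```

With these assumed numbers the six-connection case finishes far sooner, but it performs six handshakes instead of one, which is the cost the paragraph above refers to.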
Header Redundancy
HTTP/1.1 transmits headers as text. Headers like User-Agent, Cookie, and Accept repeat with every request. With cookies included, request headers alone can reach several KB. HTTP/1.1 can compress bodies but has no compression mechanism for headers, so identical information is transmitted every time.
HTTP/2
HTTP/2 was standardized in 2015, based on Google’s SPDY protocol. It maintains the same HTTP semantics while fundamentally changing the transport mechanism.
Binary Framing
HTTP/1.1 is text-based — requests and responses are human-readable strings. HTTP/2 encodes all messages as binary frames.
block-beta
columns 3
block:http1["HTTP/1.1"]:3
h1["GET /index.html HTTP/1.1\nHost: example.com\nAccept: text/html"]
end
space:3
block:http2["HTTP/2"]:3
h2a["HEADERS frame\n(stream 1)"]
h2b["DATA frame\n(stream 1)"]
h2c["HEADERS frame\n(stream 3)"]
end
style http1 fill:#FFCDD2
style http2 fill:#C8E6C9
A single HTTP message splits into HEADERS frames and DATA frames. Each frame carries a stream ID tag, allowing the receiver to reassemble frames into the correct messages.
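The framing described above can be sketched in a few lines. Every HTTP/2 frame begins with a 9-byte header: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream ID. The helper names `pack_frame` and `unpack_frame` are made up for this illustration, and the sketch ignores padding, frame size limits negotiated through SETTINGS, and HPACK encoding of the HEADERS payload.

```python
import struct

FRAME_TYPE_DATA = 0x0
FRAME_TYPE_HEADERS = 0x1
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4

def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the 9-byte HTTP/2 frame header."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit in a 24-bit length field")
    length = len(payload).to_bytes(3, "big")
    # The top bit of the stream ID field is reserved and sent as 0.
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload

def unpack_frame(data: bytes):
    """Split one frame into (type, flags, stream_id, payload)."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", data[3:9])
    return frame_type, flags, stream_id & 0x7FFFFFFF, data[9:9 + length]

frame = pack_frame(FRAME_TYPE_DATA, FLAG_END_STREAM, 1, b"hello")
print(frame.hex())
print(unpack_frame(frame))
```

The stream ID in the header is what lets the receiver attribute each frame to the right message, regardless of the order in which frames arrive.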
Multiplexing
Multiplexing is the core improvement in HTTP/2. Multiple streams operate simultaneously over a single TCP connection. Each stream is an independent request-response pair. Because messages are interleaved at the frame level, a slow stream does not block the others.
sequenceDiagram
participant C as Client
participant S as Server
Note over C,S: HTTP/2 — Multiplexing
C->>S: Stream 1: GET /style.css
C->>S: Stream 3: GET /script.js
C->>S: Stream 5: GET /image.png
S-->>C: Stream 3: script.js (partial)
S-->>C: Stream 1: style.css (complete)
S-->>C: Stream 5: image.png (partial)
S-->>C: Stream 3: script.js (complete)
S-->>C: Stream 5: image.png (complete)
No need to open multiple TCP connections as in HTTP/1.1. A single connection handles all requests. TCP handshake and TLS handshake costs occur just once.
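The reassembly step on the receiving side might look like the following sketch. It replays the interleaving from the diagram above with made-up payloads; frames are modeled as plain tuples rather than real binary frames, and `demultiplex` is a hypothetical helper name.

```python
from collections import defaultdict

# (stream_id, chunk, end_stream) in the order the frames arrive on the wire.
frames = [
    (3, b"console.", False),   # script.js (partial)
    (1, b"body{}", True),      # style.css (complete)
    (5, b"PNG-", False),       # image.png (partial)
    (3, b"log(1)", True),      # script.js (complete)
    (5, b"DATA", True),        # image.png (complete)
]

def demultiplex(frames):
    """Group interleaved frames by stream ID and return the messages
    in the order their streams finish."""
    buffers = defaultdict(bytearray)
    completed = []
    for stream_id, chunk, end_stream in frames:
        buffers[stream_id] += chunk
        if end_stream:
            completed.append((stream_id, bytes(buffers.pop(stream_id))))
    return completed

for stream_id, body in demultiplex(frames):
    print(stream_id, body)
```

Stream 1 completes first even though stream 3 started sending earlier, which is exactly the behavior HTTP/1.1 pipelining could not provide.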
HPACK Header Compression
HTTP/2 compresses headers with HPACK, a dedicated compression algorithm combining two techniques.
Static/dynamic tables: Frequently used header fields are replaced with index numbers. Because :method: GET maps to index 2 in the static table, the whole field is sent as a single byte (0x82). Headers first seen during the connection are added to the dynamic table and referenced by index in subsequent requests.
Huffman encoding: Values not in the tables are compressed using Huffman coding. High-frequency characters receive shorter bit representations, reducing overall size.
Headers that repeated at several KB per request in HTTP/1.1 can shrink to tens of bytes or fewer on later requests over the same connection.
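The table lookup can be illustrated with a deliberately simplified encoder. This is a toy under several assumptions, not a conforming HPACK implementation: it includes only four static table entries, skips Huffman coding, never evicts dynamic table entries, and only handles indexes and string lengths below 127. The class name and the header values (`demo-client/1.0`, `session=abc123`) are invented for the example.

```python
# A few entries from the HPACK static table (which has 61 entries in total).
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":scheme", "https"): 7,
}
STATIC_TABLE_SIZE = 61

class ToyHpackEncoder:
    def __init__(self):
        self.dynamic = []  # newest entry first, so it gets index 62

    def encode(self, headers):
        out = bytearray()
        for name, value in headers:
            field = (name, value)
            index = STATIC_TABLE.get(field)
            if index is None and field in self.dynamic:
                index = STATIC_TABLE_SIZE + 1 + self.dynamic.index(field)
            if index is not None:
                if index >= 127:
                    raise ValueError("multi-byte indexes are not modeled")
                out.append(0x80 | index)  # indexed field: one byte
                continue
            n, v = name.encode(), value.encode()
            if len(n) >= 127 or len(v) >= 127:
                raise ValueError("long strings are not modeled")
            # Literal field with incremental indexing, no Huffman coding.
            out += bytes([0x40, len(n)]) + n + bytes([len(v)]) + v
            self.dynamic.insert(0, field)
        return bytes(out)

request = [
    (":method", "GET"), (":scheme", "https"), (":path", "/"),
    ("user-agent", "demo-client/1.0"), ("cookie", "session=abc123"),
]
encoder = ToyHpackEncoder()
first = encoder.encode(request)
second = encoder.encode(request)
print(len(first), len(second))
```

The first request pays for the literal user-agent and cookie fields; the second request sends the same five headers as five single-byte indexes because both fields are now in the dynamic table.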
Server Push
When a client requests HTML, the server can proactively send CSS and JavaScript that the HTML needs — without waiting for additional client requests.
sequenceDiagram
participant C as Client
participant S as Server
C->>S: GET /index.html
S-->>C: PUSH_PROMISE: /style.css
S-->>C: PUSH_PROMISE: /script.js
S-->>C: /index.html response
S-->>C: /style.css response (pushed)
S-->>C: /script.js response (pushed)
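This eliminates the round trip the client would otherwise spend parsing the HTML and requesting the additional resources. In practice its usefulness is limited: the server cannot tell what the client already has cached, so it often pushes resources the client does not need.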
Stream Prioritization
Clients can assign weights and dependencies to each stream. Setting a CSS file’s priority higher than images signals the server to transmit CSS first. The priorities are hints rather than guarantees, but a server that honors them improves initial load time by sending render-critical resources first.
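As a small worked example of how weights translate into bandwidth, the sketch below splits capacity among sibling streams in proportion to their weights. It is a simplification: dependencies between streams are ignored, the file names and weight values are invented, and `bandwidth_shares` is a hypothetical helper, not part of any HTTP/2 library. Weights in the original HTTP/2 scheme range from 1 to 256.

```python
def bandwidth_shares(weights: dict) -> dict:
    """Return each stream's fraction of available bandwidth,
    proportional to its weight among sibling streams."""
    for name, weight in weights.items():
        if not 1 <= weight <= 256:
            raise ValueError(f"weight for {name} must be between 1 and 256")
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

shares = bandwidth_shares({"style.css": 256, "script.js": 128, "image.png": 16})
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Under these assumed weights the stylesheet receives most of the capacity while the image still makes slow progress instead of being starved completely.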
HTTP/2 Limitations
HTTP/2 solved HTTP-level HOL blocking but TCP-level HOL blocking remains. All streams operate over a single TCP connection, so when a TCP packet is lost, all streams wait until that packet is retransmitted.
HTTP/3 addresses this by using the UDP-based QUIC protocol instead of TCP. Each stream becomes an independent transport unit, so packet loss in one stream does not affect others.
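The difference between the two kinds of ordering can be shown with a toy model. It assumes each packet carries data for exactly one stream and asks which packets the receiver can hand to the application while it is still waiting for one lost packet to be retransmitted. The function names and the packet list are made up for this sketch.

```python
# (sequence number, stream id) in send order.
packets = [(0, 1), (1, 3), (2, 5), (3, 1), (4, 3), (5, 5)]
LOST_SEQ = 1  # the packet carrying stream 3 data is lost

def deliverable_tcp(packets, lost_seq):
    """TCP releases bytes strictly in order, so everything after
    the gap waits, whichever stream it belongs to."""
    return [(seq, stream) for seq, stream in packets if seq < lost_seq]

def deliverable_quic(packets, lost_seq):
    """QUIC orders data per stream, so only the stream that lost
    a packet has to wait for the retransmission."""
    lost_stream = next(stream for seq, stream in packets if seq == lost_seq)
    result = []
    for seq, stream in packets:
        if seq == lost_seq:
            continue
        if stream == lost_stream and seq > lost_seq:
            continue
        result.append((seq, stream))
    return result

print(deliverable_tcp(packets, LOST_SEQ))
print(deliverable_quic(packets, LOST_SEQ))
```

In the TCP case streams 1 and 5 stall even though none of their data was lost; in the QUIC case only stream 3 waits.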
gRPC
gRPC is an RPC framework developed by Google. It uses HTTP/2 as the transport protocol and Protocol Buffers as the serialization format.
HTTP/2 Utilization
gRPC leverages HTTP/2 multiplexing to process multiple RPC calls in parallel over a single connection. HPACK header compression applies as well. Extending HTTP/2’s streaming capability, gRPC supports four communication patterns.
block-beta
columns 2
block:unary["Unary"]:1
columns 3
u1["Client"]
space
u2["Server"]
u1 -- "1 req / 1 res" --> u2
end
block:server["Server Streaming"]:1
columns 3
s1["Client"]
space
s2["Server"]
s1 -- "1 req / N res" --> s2
end
block:client["Client Streaming"]:1
columns 3
c1["Client"]
space
c2["Server"]
c1 -- "N req / 1 res" --> c2
end
block:bidi["Bidirectional Streaming"]:1
columns 3
b1["Client"]
space
b2["Server"]
b1 -- "N req / N res" --> b2
end
style unary fill:#E3F2FD
style server fill:#E8F5E9
style client fill:#FFF3E0
style bidi fill:#F3E5F5
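The shape of the four patterns can be mimicked with plain Python functions and generators. This sketch does not use the gRPC library and runs in a single process; the function names and message contents are placeholders chosen only to show which side sends one message and which side sends many.

```python
from typing import Iterable, Iterator

def unary(request: str) -> str:
    """1 request, 1 response."""
    return f"echo:{request}"

def server_streaming(request: str) -> Iterator[str]:
    """1 request, N responses."""
    for i in range(3):
        yield f"{request}#{i}"

def client_streaming(requests: Iterable[str]) -> str:
    """N requests, 1 response produced after the client finishes sending."""
    return ",".join(requests)

def bidirectional(requests: Iterable[str]) -> Iterator[str]:
    """N requests, N responses; each response is produced as soon as
    its request has been read, without waiting for the rest."""
    for request in requests:
        yield request.upper()

print(unary("hi"))
print(list(server_streaming("room")))
print(client_streaming(iter(["a", "b", "c"])))
print(list(bidirectional(iter(["x", "y"]))))
```

In real gRPC each of these calls maps to one HTTP/2 stream, with the individual messages carried in DATA frames on that stream.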
Protocol Buffers
Whereas REST APIs typically exchange JSON, gRPC uses Protocol Buffers (Protobuf). Message structures and service interfaces are defined in a .proto file, and code for each target language is generated from it.
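syntax = "proto3";
service ChatService {
  // Unary RPC: one request, one response.
  rpc SendMessage (MessageRequest) returns (MessageResponse);
  // Server-streaming RPC: one request, a stream of responses.
  rpc StreamMessages (RoomRequest) returns (stream Message);
}
// MessageResponse, RoomRequest, and Message are not shown in this excerpt and must be defined alongside MessageRequest.
message MessageRequest {
  string room_id = 1;
  string user_id = 2;
  string content = 3;
}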
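Protobuf uses binary serialization. Field names are replaced by numeric tags, so messages are typically several times smaller than the equivalent JSON (figures of 3–5x are commonly cited), and serialization and deserialization are often quoted as 5–10x faster; the actual gain depends on the payload and the implementation. The schema in the .proto file enforces the interface contract between client and server at the code level.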
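To see where the size difference comes from, the sketch below hand-encodes the MessageRequest message using the Protobuf wire format for string fields (a tag byte built from the field number and wire type 2, a varint length, then the UTF-8 bytes) and compares it with compact JSON. It is written by hand for illustration rather than generated by protoc, and the field values are invented; real code would use the generated classes.

```python
import json

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_string_field(field_number: int, text: str) -> bytes:
    if not text:
        return b""  # proto3 omits fields that hold the default value
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data

def encode_message_request(room_id: str, user_id: str, content: str) -> bytes:
    return (encode_string_field(1, room_id)
            + encode_string_field(2, user_id)
            + encode_string_field(3, content))

proto_bytes = encode_message_request("r1", "u42", "hello")
json_bytes = json.dumps(
    {"room_id": "r1", "user_id": "u42", "content": "hello"},
    separators=(",", ":"),
).encode("utf-8")
print(len(proto_bytes), len(json_bytes))
```

For this tiny message most of the JSON size is field names and punctuation, which Protobuf replaces with one tag byte and one length byte per field. The ratio shrinks as the string values themselves grow, which is one reason the quoted size savings vary by payload.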
REST vs gRPC
gRPC fits when:
- Internal microservice communication: low latency, high throughput needed
- Bidirectional streaming: real-time chat, event subscriptions
- Polyglot environments: one .proto file generates client/server code for multiple languages
REST fits when:
- External APIs: browsers do not support gRPC directly — gRPC-Web or a proxy is required
- Debugging convenience: JSON is human-readable, inspectable via curl or browser dev tools
- Simple CRUD: resource-based APIs that do not need RPC-style interfaces
In practice, combining gRPC for internal service communication and REST for external APIs is a common pattern.
HTTP/1.1 processes requests and responses sequentially. HTTP/2 removed this constraint with multiplexing. gRPC adds binary serialization and streaming on top of HTTP/2’s advantages. Each protocol addresses different problems, and the workload determines the choice.