Ad System Outage Retrospective — A Shared Dependency and a Single Point of Failure

An external event pushed ad traffic far above normal levels and triggered a cascading failure. The root cause was that the filtering component was a single point of failure, and the fallback depended on it as well, so a single collapse took both down at once. The fix followed three paths: removing the fallback's dependency on the component (independence), adding rate limiting to the component itself (protection), and reconsidering the runtime (throughput).

February 13, 2025 · 7 min read
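The independence and protection fixes summarized above can be illustrated with a small sketch. It is not the system described in the retrospective. It assumes a token-bucket limiter in front of a stand-in filtering function and a static fallback list that never calls the filter. `TokenBucket`, `filter_ads`, `STATIC_FALLBACK`, and `serve_ads` are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so tests can control time
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def filter_ads(candidates: List[str]) -> List[str]:
    # Hypothetical stand-in for the shared filtering component.
    return [ad for ad in candidates if not ad.startswith("blocked:")]


# Hypothetical precomputed fallback; serving it never touches filter_ads.
STATIC_FALLBACK = ["house-ad-1", "house-ad-2"]


def serve_ads(candidates: List[str], limiter: TokenBucket) -> List[str]:
    # Protection: shed excess load before it reaches the filtering component.
    if not limiter.allow():
        return list(STATIC_FALLBACK)
    try:
        return filter_ads(candidates)
    except Exception:
        # Independence: the fallback path has no dependency on filter_ads.
        return list(STATIC_FALLBACK)
```

The key design point in this sketch is that both the rate-limited path and the error path return the same static list. If the filtering component collapses, the fallback remains available instead of failing along with it.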

Kubernetes Fundamentals

Container orchestration basics and what backend developers need to know: core objects, networking, scaling with the Horizontal Pod Autoscaler (HPA), and operational essentials.

April 10, 2024 · 7 min read
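The HPA scaling mentioned in the Kubernetes summary above rests on a documented core rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio falls within a tolerance (0.1 by default). The sketch below covers only that rule plus clamping to hypothetical `min_replicas`/`max_replicas` bounds. It deliberately ignores stabilization windows, pod readiness, and missing metrics, all of which the real controller handles.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale replicas by the ratio of observed to target metric."""
    ratio = current_metric / target_metric
    # Skip scaling when the observed value is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas averaging 90% CPU against a 50% target gives ceil(3 × 1.8) = 6 replicas.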