Ad System Outage Retrospective — A Shared Dependency and a Single Point of Failure
An external event drove ad traffic far above normal and triggered a cascading failure. The root cause was that the filtering component was a single point of failure: the fallback path also depended on it, so when the component collapsed, the primary path and the fallback went down together. The fix took three paths: removing the fallback’s dependency on the component (independence), adding rate limiting to the component itself (protection), and reconsidering the runtime (throughput).
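
The first two fixes can be illustrated with a minimal sketch. It is not the production code: the names `TokenBucket`, `filter_ads`, `serve_ads`, and `PREFILTERED_FALLBACK_ADS` are hypothetical, and a token bucket is assumed here as one common way to rate-limit a component. The point it shows is that the fallback is served from data prepared ahead of time and never calls the filtering component, so an overloaded filter cannot take the fallback down with it.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TokenBucket:
    """Admit roughly `rate` calls per second, with bursts up to `capacity`."""
    rate: float
    capacity: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing

    def __post_init__(self) -> None:
        self._tokens = self.capacity
        self._last = self.clock()

    def try_acquire(self) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        now = self.clock()
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def filter_ads(candidates: List[str]) -> List[str]:
    """Hypothetical stand-in for the filtering component."""
    return [ad for ad in candidates if not ad.startswith("blocked:")]


# Assumption: a small list vetted offline, so serving it needs no filter call.
PREFILTERED_FALLBACK_ADS = ["house-ad-1", "house-ad-2"]


def serve_ads(candidates: List[str], bucket: TokenBucket) -> List[str]:
    # Protection: shed load before it reaches the filtering component.
    if not bucket.try_acquire():
        return list(PREFILTERED_FALLBACK_ADS)
    try:
        return filter_ads(candidates)
    except Exception:
        # Independence: the fallback path does not touch filter_ads.
        return list(PREFILTERED_FALLBACK_ADS)
```

With a bucket of capacity 2, the first two requests in a burst go through the filter and the third is answered from the fallback list until tokens refill. The design choice worth noting is that load shedding and the fallback share nothing with the protected component except the request itself.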