Late last year, a conversation started about improving the ad Fallback’s performance by introducing a CTR prediction model.

Fallback kicks in when the primary ad system decides there’s no ad to serve. Its purpose is to raise fill rate — the ratio of impressions to ad slots.
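
As a concrete reading of that definition, fill rate can be computed as below. This is only a minimal sketch: the function name and the sample counts are invented for illustration and do not come from the real system.

```python
def fill_rate(impressions: int, ad_slots: int) -> float:
    """Return the share of ad slots that ended up showing an ad."""
    if ad_slots <= 0:
        raise ValueError("ad_slots must be positive")
    if impressions < 0:
        raise ValueError("impressions must not be negative")
    return impressions / ad_slots


# Hypothetical numbers: 1,000 slots, 820 filled by the primary ad system,
# and Fallback fills 130 of the remaining 180.
primary_only = fill_rate(820, 1000)
with_fallback = fill_rate(820 + 130, 1000)
print(f"{primary_only:.2f} -> {with_fallback:.2f}")
```

Every slot Fallback fills raises the numerator while the number of slots stays the same.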

I was a backend engineer. I had no background in AI.

The expectation was that wiring the surrounding systems would take more work than the model itself, so the project landed on my plate.

Model Choice: Logistic Regression

The model was Logistic Regression.

Since the goal was improving ad CTR, we just needed to learn whether a given impression would be clicked — a binary classification problem.

LR and LightGBM are commonly used in ad platforms. But this was an initial version, and I didn’t want to take on complex tuning and operational burden from day one.

So I picked the simpler option: LR.
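
To make "simpler" concrete: at inference time, a trained LR model is one weight per feature plus an intercept, and a prediction is the sigmoid of the weighted sum. The sketch below is plain Python rather than sklearn, and the feature names, weights, and bias are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_ctr(features: dict[str, float], weights: dict[str, float], bias: float) -> float:
    """Logistic regression inference: sigmoid of the weighted feature sum."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights; a real model would learn these from click logs.
WEIGHTS = {"is_mobile": 0.4, "slot_above_fold": 0.9, "hour_is_evening": 0.2}
BIAS = -3.0

impression = {"is_mobile": 1.0, "slot_above_fold": 1.0, "hour_is_evening": 0.0}
p_click = predict_ctr(impression, WEIGHTS, BIAS)
print(f"predicted click probability: {p_click:.3f}")
```

The output is a probability between 0 and 1, which fits the "will this impression be clicked" framing.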

Language and Framework Choice

I went with Python and sklearn for both the training batch and the inference server.

I initially considered ONNX + Go. A new project felt like a place where I could start with Go. For inference, pushing the model through ONNX would give me framework independence and better performance.

But the internal ML operating environment was Python-centric. The reference examples, the shareable code, and the deployment patterns were all in Python. Since I would need advice and reviews, working in the same language felt like the right call. I set aside the performance angle and chose continuity of operations.

The framework choice followed similar logic. I knew that serving a model through ONNX generally gives better inference performance than sklearn, but for a lightweight model like LR, that gain wouldn’t move the needle. sklearn felt like enough for training and saving, and forcing a heavy pipeline onto a light model seemed like overengineering to me.

ML Lifecycle Architecture

I divided the ML Lifecycle into three components.

  • Training batch: Periodically trains the LR model and pushes the trained model to the model store.
  • Model store: Built on MLflow. Keeps versioned copies of models written by the training batch.
  • Inference server: Loads the latest model from the store and serves real-time predictions.

flowchart LR
    A["Training batch"] -->|"① push model<br/>② move champion alias"| B["Model store<br/>(MLflow)"]
    A -->|"③ trigger deployment"| C["Inference server"]
    B -.->|"④ load champion on pod startup"| C

The flow is simple: training batch → model store → inference server. The three components share data only through model files, so the training schedule runs independently from inference.
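
The hand-off between the three components can be sketched with an in-memory stand-in for the model store. The class below only mimics the shape of a registry with versions and aliases. It is not MLflow's API, and the class, method, and model names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryModelStore:
    """Stand-in for a model registry: versioned models plus named aliases."""
    versions: dict[int, Any] = field(default_factory=dict)
    aliases: dict[str, int] = field(default_factory=dict)

    def push(self, model: Any) -> int:
        """Store a model and return its new version number."""
        version = len(self.versions) + 1
        self.versions[version] = model
        return version

    def set_alias(self, alias: str, version: int) -> None:
        """Point an alias at an existing version."""
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        self.aliases[alias] = version

    def load(self, alias: str) -> tuple[int, Any]:
        """Resolve an alias to (version, model)."""
        version = self.aliases[alias]
        return version, self.versions[version]


store = InMemoryModelStore()

# Training batch: push a model, then move the alias to it.
v1 = store.push({"weights": [0.1, 0.2]})
store.set_alias("champion", v1)

# A later training run pushes a second version without moving the alias.
v2 = store.push({"weights": [0.3, 0.1]})

# Inference server: resolves the alias, never a hard-coded version.
serving_version, serving_model = store.load("champion")
print(f"serving version {serving_version}")
```

Because the inference side only asks for the alias, the training batch can push as many versions as it likes without affecting what is being served.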

Inside the Training Batch: The Promotion Gate

The training batch wasn’t just “train → save.” Once training finished, the model had to pass through a Promotion Gate — a quality check — before the champion alias would move.

flowchart LR
    A["Data loading"] --> B["Preprocessing"] --> C["Training"] --> D["Evaluation"] --> E{"Promotion Gate"}
    E -->|"PASS"| F["Update champion alias<br/>+ trigger rollout"]
    E -->|"FAIL"| G["Keep current champion"]

The criteria were simple. If the trained model’s evaluation metrics crossed the predefined thresholds, it passed; otherwise, it failed. On pass, the champion alias moved to the new version and a rollout was triggered. On failure, the new model was only logged to the registry while the current champion kept serving traffic.

This meant a degraded model couldn’t accidentally reach production, and holding it back required no code changes.
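
A threshold-based gate of this kind can be sketched in a few lines. The metric names (AUC, log loss) and the threshold values are assumptions chosen for illustration, since the actual criteria are not listed here.

```python
def promotion_gate(
    metrics: dict[str, float],
    thresholds: dict[str, tuple[str, float]],
) -> tuple[bool, list[str]]:
    """Check evaluation metrics against thresholds.

    Each threshold is ("min", limit) or ("max", limit).
    Returns (passed, reasons_for_failure).
    """
    failures = []
    for name, (kind, limit) in thresholds.items():
        if kind not in ("min", "max"):
            raise ValueError(f"unknown threshold kind: {kind}")
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")  # a missing metric never passes
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} > {limit}")
    return (not failures, failures)


# Hypothetical thresholds and evaluation results.
THRESHOLDS = {"auc": ("min", 0.70), "log_loss": ("max", 0.45)}

passed, reasons = promotion_gate({"auc": 0.74, "log_loss": 0.41}, THRESHOLDS)
print("PASS" if passed else f"FAIL: {reasons}")
```

On PASS the caller would move the champion alias and trigger the rollout. On FAIL it would leave the alias alone, so the current champion keeps serving.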

Deployment

For getting new models into the inference server, I used a k8s-based rolling deployment.

MLflow’s alias feature lets you tag a model version with a name like “champion” to point at the current production model. When a newly trained model passes the Promotion Gate, the training batch moves the champion alias to the new version and triggers a deployment. Inference server pods are replaced one at a time, and each new pod loads whichever model version carries the champion alias on startup, before it starts serving traffic.
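
The effect of replacing pods one at a time can be illustrated with a small simulation. This is a toy model of the behavior described above, not the Kubernetes API, and the `Pod` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    model_version: int
    ready: bool = False


def start_pod(name: str, champion_version: int) -> Pod:
    """A new pod loads the champion first and only then reports ready."""
    pod = Pod(name=name, model_version=champion_version)
    pod.ready = True  # in a real server this would follow a successful model load
    return pod


def rolling_update(pods: list[Pod], champion_version: int) -> list[list[int]]:
    """Replace pods one at a time; record the versions serving after each step."""
    history = []
    for i, old in enumerate(pods):
        pods[i] = start_pod(f"{old.name}-new", champion_version)
        history.append([p.model_version for p in pods if p.ready])
    return history


# Three pods serving version 1, then the champion alias moves to version 2.
pods = [start_pod(f"pod-{i}", champion_version=1) for i in range(3)]
steps = rolling_update(pods, champion_version=2)
for step in steps:
    print(step)
```

During the rollout, the old and new model versions serve traffic side by side until the last pod has been replaced.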

Looking Back

The LR + sklearn + MLflow combination was simple, and it ran light and fast.

What I regret most is choosing Python + sklearn. As the number of features grew, inference cost climbed and so did the resources required. If we had gone with ONNX + Go and used multiple cores inside a single process, the same load could probably have been handled with fewer resources. At the time, I judged that continuity of operations was the right call, but the cost of that decision showed up in the operational phase.

Starting out, my biggest worry was “can I do this without an AI background?” By the end, I found that what I needed was a bit different. It wasn’t ML algorithms or infrastructure expertise — what mattered more was how precisely I understood the domain, and being able to judge which features to combine and how. Reading data and spotting patterns — that analysis skill — turned out to be just as important.
