A security compliance task was due.

Certain columns in a running service had to be encrypted. They came in two shapes: columns whose entire value was sensitive, and JSON columns where only specific fields needed encryption. This was no greenfield system; traffic was already flowing.

The work split into two halves. One was building the encryption module. The other was applying that module to a running service. The second turned out to be larger.

Encryption Strategy

I picked AES-256-GCM, a symmetric authenticated cipher. Key management followed an envelope encryption structure: a master key (CMK) encrypts a data encryption key (DEK), and the DEK encrypts the data. The two tiers limit the blast radius of a key leak and simplify key rotation, since rotating the CMK means re-encrypting only the DEKs rather than the data itself. The mechanics live in a separate tech post.

For key storage, a managed secret store won out over the alternatives, a managed KMS and a system configuration store. On operating cost, and for the narrow job of storing DEKs, the secret store fit best. An early design review also pointed out that the system configuration store is meant for system configuration, not key storage.

DEK Granularity — From Row to Table

The initial design used per-row DEKs. Each row received its own DEK, stored alongside the row. A key leak would stay scoped to that single row.

An early design review pushed back: the operational and maintenance complexity was climbing too high. After re-examining the trade-off, I moved to per-table DEKs.

Per-row keys grow with row count — every new row means another key issuance call and additional storage. In production, the impact runs deeper than raw cost: key API call frequency, backup/restore throughput, per-row key issuance logic during migration. The whole system gets heavier.

Per-table keys widen the blast radius from a single row to a single table, but operations simplify. Separating keys per sensitivity tier narrows the blast radius again along a different axis: by how sensitive the data is rather than by which row holds it.

The answer I first judged “safer” wavered once operations entered the picture. The weight of a decision lands only after both sides of the trade-off come into view.

Internal Module — Two Patterns

The two data shapes called for two different processing paths.

The first: replace the entire column value with a single ciphertext. Applied when the column itself is sensitive.

The second: replace only the relevant field values inside a JSON object with ciphertext. The object structure and non-sensitive fields remain plaintext.

If the two patterns lived in separate modules, callers would fork into two code paths. So one module bundles both and exposes each as a first-class API.
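
As a rough illustration of the two patterns above, the sketch below assumes a hypothetical module surface. The function names, the "enc:" prefix, and the restriction to top-level JSON fields are all assumptions made for this example, not the actual module's API. The cipher is a Base64 stand-in so the sketch runs with the standard library alone; a real module would call AES-256-GCM with the table's DEK at that point.

```python
import base64
import json
from typing import Callable, Iterable

# Hypothetical stand-in for the real cipher. Base64 is NOT encryption;
# a real module would call AES-256-GCM with the table's DEK here.
def fake_encrypt(plaintext: str) -> str:
    return "enc:" + base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

def fake_decrypt(token: str) -> str:
    if not token.startswith("enc:"):
        raise ValueError("not a ciphertext token")
    return base64.b64decode(token[4:]).decode("utf-8")

Cipher = Callable[[str], str]

def encrypt_column(value: str, encrypt: Cipher = fake_encrypt) -> str:
    """Pattern 1: the whole column value becomes a single ciphertext."""
    return encrypt(value)

def encrypt_json_fields(doc: str, fields: Iterable[str],
                        encrypt: Cipher = fake_encrypt) -> str:
    """Pattern 2: only the named top-level fields become ciphertext."""
    obj = json.loads(doc)
    for name in fields:
        if name in obj and obj[name] is not None:
            # Serialize the field first so non-string values round-trip.
            obj[name] = encrypt(json.dumps(obj[name]))
    return json.dumps(obj)

def decrypt_json_fields(doc: str, fields: Iterable[str],
                        decrypt: Cipher = fake_decrypt) -> str:
    """Inverse of encrypt_json_fields for the same field list."""
    obj = json.loads(doc)
    for name in fields:
        if name in obj and obj[name] is not None:
            obj[name] = json.loads(decrypt(obj[name]))
    return json.dumps(obj)
```

Both entry points share one cipher, so a caller only chooses between "encrypt the whole value" and "encrypt these fields". Structure and non-sensitive fields stay readable in the second case.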

Migration — Three-Stage, Zero-Downtime

A running service rules out a single-shot column swap. I split it into three stages.

  1. Prepare. Add encrypted columns via DDL. Apply dual writes in code — INSERTs and UPDATEs hit both plaintext and ciphertext columns, while SELECTs decrypt the encrypted column if present and fall back to plaintext otherwise.
  2. Migrate. Bulk-encrypt the existing plaintext rows into the new columns. Run a dry run first to confirm scope and timing, then execute with a tuned batch size.
  3. Clean up. Verify the new columns are fully populated, then drop the plaintext columns and remove the fallback branches.

Each stage gates on the previous PR being merged and deployed. Stage N+1’s code assumes stage N is already live everywhere.
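
The three stages can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the users table, the email and email_enc columns, and every function name are hypothetical, SQLite stands in for the production database, and Base64 stands in for the real cipher.

```python
import base64
import sqlite3

# Stand-in cipher so the sketch runs with the standard library only.
# Base64 is NOT encryption; the real module would be called here instead.
def encrypt(value: str) -> str:
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

def decrypt(token: str) -> str:
    return base64.b64decode(token).decode("utf-8")

PENDING = "email_enc IS NULL AND email IS NOT NULL"

def insert_user(conn: sqlite3.Connection, email: str) -> int:
    """Stage 1 dual write: fill both the plaintext and ciphertext columns."""
    cur = conn.execute(
        "INSERT INTO users (email, email_enc) VALUES (?, ?)",
        (email, encrypt(email)),
    )
    return cur.lastrowid

def read_email(conn: sqlite3.Connection, user_id: int):
    """Stage 1 read: prefer the encrypted column, fall back to plaintext."""
    row = conn.execute(
        "SELECT email, email_enc FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    email, email_enc = row
    return decrypt(email_enc) if email_enc is not None else email

def backfill(conn: sqlite3.Connection, batch_size: int = 1000,
             dry_run: bool = True) -> int:
    """Stage 2: encrypt rows that still lack ciphertext, batch by batch."""
    if dry_run:
        # Report scope only; nothing is modified.
        return conn.execute(
            f"SELECT COUNT(*) FROM users WHERE {PENDING}"
        ).fetchone()[0]
    done = 0
    while True:
        rows = conn.execute(
            f"SELECT id, email FROM users WHERE {PENDING} ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return done
        conn.executemany(
            "UPDATE users SET email_enc = ? WHERE id = ?",
            [(encrypt(email), row_id) for row_id, email in rows],
        )
        conn.commit()  # one commit per batch keeps transactions short
        done += len(rows)

def ready_for_cleanup(conn: sqlite3.Connection) -> bool:
    """Stage 3 gate: no row may still depend on the plaintext column."""
    return backfill(conn, dry_run=True) == 0
```

The dry run and the cleanup gate share the same pending-row predicate, so the number reported before the migration is the same number that must reach zero before the plaintext column is dropped.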

The WHERE Clause and HMAC

A constraint surfaced mid-migration.

Some columns were used in WHERE conditions, for lookup queries and deduplication checks. Encrypting them outright breaks those queries. AES-GCM uses a fresh nonce for each encryption, so the same plaintext yields a different ciphertext every time, and equality comparisons such as WHERE email = '...' stop matching.

Preserving searchability required a deterministic transformation. I added an HMAC column alongside the encrypted one. On write, the original value is passed through a keyed HMAC once and encrypted once, giving two stored representations. Equality lookups go through the HMAC column; full value recovery goes through the ciphertext column.
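
A minimal sketch of the HMAC side, using Python's standard hmac and hashlib modules. The function names, the email_hmac column name, the use of SHA-256, and the trim-and-lowercase normalization are assumptions for illustration. The sketch also assumes a dedicated HMAC key kept separate from the DEK.

```python
import hashlib
import hmac

def normalize(value: str) -> str:
    # Assumption: lookups should ignore case and surrounding whitespace.
    # Whether that holds depends on the column; adjust or drop as needed.
    return value.strip().lower()

def blind_index(value: str, key: bytes) -> str:
    """Deterministic keyed digest: same key and value give the same output."""
    digest = hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()

# Hypothetical lookup: the query compares digests and never sees plaintext.
LOOKUP_SQL = "SELECT id, email_enc FROM users WHERE email_hmac = ?"

def lookup_params(email: str, key: bytes) -> tuple:
    """Parameters to bind to LOOKUP_SQL for an equality search."""
    return (blind_index(email, key),)
```

Because the digest is deterministic, it supports equality checks and deduplication but not range or prefix queries. Anyone without the HMAC key cannot recompute digests from guessed values.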

This constraint never showed up in the column survey. Column names and types say nothing about how a column actually gets used in queries. The codebase had to be read directly to surface it.

Spreading Across the Org — A Migration Automation Skill

A working module is not the end. The target columns spread across many services, and someone had to write the three-stage migration for each one.

When people repeat the same procedure by hand, mistakes creep in. Issue tracker tickets came in inconsistent shapes, which made column-info parsing fragile, and the way dry runs were performed varied from person to person.

I built an automation Skill so any engineer could run the same procedure end to end. It parses column information from standardized metadata on the issue tracker ticket, generates a migration script matched to the module’s API, and walks through dry-run inspection before live execution.

The issue tracker ticket format was standardized at the same time. Each ticket carries a table with a defined layout: server, database, table, column name, type, and sensitive fields. When the description falls short, the Skill prompts for the missing fields.
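
The parsing step might look roughly like the sketch below. The pipe-delimited "field | value" row layout, the field keys, and the function names are assumptions made for this example, as is treating every field as required. The actual ticket format and Skill logic are not shown in this post.

```python
# Hypothetical field keys, derived from the field list in the ticket layout.
REQUIRED_FIELDS = (
    "server", "database", "table", "column_name", "type", "sensitive_fields",
)

def parse_ticket(description: str) -> dict:
    """Collect 'field | value' rows from a ticket description."""
    found = {}
    for line in description.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-cell row; ignore free text and blank lines
        key = cells[0].lower().replace(" ", "_")
        if key in REQUIRED_FIELDS and cells[1]:
            found[key] = cells[1]
    return found

def missing_fields(found: dict) -> list:
    """Fields to ask the engineer for before generating a script."""
    return [name for name in REQUIRED_FIELDS if name not in found]
```

Returning the missing fields as a list, rather than raising on the first gap, lets the prompt ask for everything in one round.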

At an earlier hackathon I had reached for AI tools just to ship fast. Here the same habit shifted toward shaping a standard organizational procedure.

Takeaways

Applying the module weighed more than building it. Envelope encryption, the two patterns, and the three-stage migration all started as standard patterns, but got reshaped in production. I moved DEK granularity from row to table for operational cost. I added an HMAC companion column to keep both AES-GCM confidentiality and equality lookups. I built a Skill to standardize the same procedure across multiple services. That process was where the actual work lived.

A design does not settle in one pass. Row-level moved to table-level, both patterns turned out to be required, and HMAC entered mid-flight. The answers I first judged correct kept getting reshaped as operations pressed back.

In the end, making the module usable mattered as much as building it. The Skill filled that gap. Security compliance was the immediate goal, but what stayed behind was an organizational standard for handling sensitive data.

References