Microservices Architecture

When a monolith grows, microservices come to mind. Builds slow down, a change in one part blocks deployment of another, and a traffic spike in one area degrades the whole system. The conventional prescription is “make it smaller.” The harder problem is deciding where to cut.

Adopting microservices is, at its core, a decision about which criterion to decompose the system by. Domain boundary, data ownership, scale pattern, failure isolation: whichever you anchor to creates the service boundaries, and those boundaries in turn decide communication, data consistency, and operational cost. Pick the wrong criterion and every downstream decision inherits the error, and a boundary, once drawn, is hard to undo.

flowchart LR
  A["Decomposition Criteria<br/>Domain · Data · Scale · Failure"] --> B[Service Boundary]
  B --> C["Communication<br/>Sync / Async"]
  B --> D[Data Consistency]
  C --> E[Operational Cost]
  D --> E

Decomposition Criteria

Splitting a service is a question of which difference becomes the boundary. The same code shaped by domain looks one way, by data ownership another, by scale pattern yet another. There is no single right criterion. The question is which criterion is dominant in this system.

Domain Boundary

DDD’s Bounded Context maps directly onto this criterion. Units of business meaning such as “Order,” “Payment,” and “Recommendation” each become a service. Change cohesion is good: when order logic changes, only the order service changes.

The weakness shows when the domain is unclear. Draw boundaries before the model has settled and the wrong model becomes the service boundary. One domain ends up split across two services, or two domains fuse into one. From then on, most changes touch both services.

This is the vertical-split decision inside a single codebase, extended to the service boundary.

Data Ownership

The question here is which service holds write authority over which tables. The moment you allow a shared DB, you lose the core benefit of MSA: independent deployment and independent schema evolution. A schema change in one service can break another service’s code.

This criterion often coincides with the domain boundary. A domain owns its data as a matter of course. But within the same domain, when write patterns differ, data ownership becomes a separate criterion. Within an order domain, transactional order writes and analytics aggregates carry different load and consistency demands.
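One way to make write ownership checkable is to declare which tables each service writes and flag any table with more than one writer. The following is a minimal sketch under that assumption; the service and table names are hypothetical and not taken from any real system.

```python
from collections import defaultdict

# Hypothetical declaration: which tables each service writes to.
WRITES = {
    "order-service": {"orders", "order_items"},
    "analytics-service": {"order_aggregates"},
    "payment-service": {"payments", "orders"},  # also writes "orders"
}


def shared_write_tables(writes):
    """Return {table: sorted writers} for tables written by more than one service."""
    writers = defaultdict(set)
    for service, tables in writes.items():
        for table in tables:
            writers[table].add(service)
    return {t: sorted(s) for t, s in writers.items() if len(s) > 1}


violations = shared_write_tables(WRITES)
print(violations)
```

Any table that shows up in `violations` is a place where two services would have to coordinate schema changes, which is exactly the coupling the data ownership criterion is meant to remove.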

Scale Pattern

Split by workload characteristics. Within the same domain, a split makes sense when read-vs-write, CPU-vs-IO, or bursty-vs-steady patterns diverge.

Take a chat workload: message publishing is write-heavy and bursty, while message search is read-heavy and IO-bound. Bundle them into one service and tuning for either pattern leaves the other inefficient. Split them and each scales the way that suits it. Publishing is absorbed by a queue, and search is served by index plus cache.
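To see what "absorbed by a queue" means in numbers, here is a small deterministic simulation. It is a sketch with made-up figures (a burst of 50 messages and a consumer that drains 10 per tick), not a measurement of any real system.

```python
def simulate_queue(arrivals, drain_per_tick):
    """Return the queue depth after each tick, given per-tick arrivals and a fixed drain rate."""
    depth = 0
    depths = []
    for arrived in arrivals:
        depth += arrived
        depth -= min(depth, drain_per_tick)  # the consumer never drains more than is queued
        depths.append(depth)
    return depths


# Hypothetical burst: 50 messages arrive in one tick, then nothing.
arrivals = [0, 50, 0, 0, 0, 0]
depths = simulate_queue(arrivals, drain_per_tick=10)
print(depths)
```

The downstream consumer only ever sees 10 messages per tick; the burst turns into temporary queue depth instead of downstream load.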

Failure Boundary

Split so a failure in one service can’t reach another. Separate the critical path from the non-critical path.

Take an ad-serving workload: when the main recommender fails, fallback content has to keep flowing or revenue takes a hit. Put both in one service and a failure in the main recommender halts the fallback as well. Split them and the fallback survives on its own path. This is the decomposition criterion from the stability perspective.
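The caller's side of that split can be sketched as follows. The two functions are hypothetical stand-ins for calls to two separately deployed services; in a real system they would be network calls with timeouts, and the point of the split is that the fallback shares no process or dependency with the main path.

```python
def main_recommender(user_id):
    """Hypothetical main path; raises to simulate an outage."""
    raise TimeoutError("main recommender unavailable")


def fallback_content(user_id):
    """Hypothetical fallback path with no dependency on the main recommender."""
    return ["popular-1", "popular-2"]


def serve(user_id, primary=main_recommender, fallback=fallback_content):
    """Try the primary path; on any failure, serve from the independent fallback."""
    try:
        return primary(user_id), "primary"
    except Exception:
        return fallback(user_id), "fallback"


items, source = serve("user-42")
print(source, items)
```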

When Criteria Conflict

The domain boundary often wants one split while the scale pattern wants another. When a single domain holds both bursty and steady workloads, domain unity and scale separation collide.

Pick the dominant criterion for this system, split along it, and handle the remaining criteria through internal modules or queues within a service. Pulling every criterion up to the service boundary explodes service count and operational cost beyond control.

Service Communication

Once service boundaries are set, the next decision is how they talk. Synchronous or asynchronous.

Synchronous — gRPC

When the call is imperative and needs an immediate response. Payment requests, auth checks. The caller waits for the result and learns of failure right away.

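gRPC defines the service contract in ProtoBuf, an IDL, and runs over HTTP/2. Client and server stubs are generated from the same definition. It supports four modes: unary, server streaming, client streaming, and bidirectional streaming. ProtoBuf’s binary serialization is typically more compact than JSON. HTTP/2’s multiplexing removes the request-level head-of-line blocking that limited HTTP/1.1, although head-of-line blocking at the TCP layer remains.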

The cost of sync is cumulative latency along long call chains, and the fact that one failure propagates through the chain. Sync communication is best kept to short chains and confined to the critical path.
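The arithmetic behind both costs is simple enough to write down. This is a sketch with invented numbers, and it assumes the hops are called sequentially and fail independently, which real systems only approximate.

```python
import math


def chain_latency_ms(latencies_ms):
    """Sequential sync calls: the caller waits for the sum of every hop."""
    return sum(latencies_ms)


def chain_availability(availabilities):
    """If hop failures are independent, the chain succeeds only when every hop does."""
    return math.prod(availabilities)


# Hypothetical four-hop chain.
hops_ms = [20, 35, 15, 40]
hops_avail = [0.999] * 4

total_ms = chain_latency_ms(hops_ms)
total_avail = chain_availability(hops_avail)
print(total_ms, round(total_avail, 4))
```

Four hops at 99.9% each already land at roughly 99.6% for the chain as a whole, and every additional hop multiplies in another factor below one.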

Asynchronous — Kafka

When the system needs event publication, eventual consistency, or traffic absorption. An order is created and the recommendation service consumes the event to update its model; user activity logs accumulate in a queue and an analytics service processes them at its own pace.

Kafka is a distributed log. A producer appends events to a topic, and each consumer group reads from its own offsets. Multiple consumer groups can read the same event for different purposes (fan-out). Bursts pile up in the log instead of hitting consumers directly, which flattens the load downstream.

The cost of async is consistency that is not immediate. A brief gap exists before the event arrives, and lag accumulates if a consumer halts or slows.
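The log, per-group offsets, fan-out, and lag can all be shown with an in-memory stand-in. This is not the Kafka client API; `TopicLog` and its methods are hypothetical, it models a single partition only, and it advances offsets on read, whereas real Kafka commits offsets as a separate step.

```python
class TopicLog:
    """In-memory stand-in for a single-partition topic: an append-only list."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer group -> next offset to read

    def produce(self, event):
        self.events.append(event)

    def poll(self, group, max_events=10):
        """Return the next events for a group and advance that group's offset."""
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        """Events written but not yet read by this group."""
        return len(self.events) - self.offsets.get(group, 0)


log = TopicLog()
for i in range(5):
    log.produce({"order_id": i})

recs = log.poll("recommendation", max_events=5)  # fast consumer reads all five
stats = log.poll("analytics", max_events=2)      # slower consumer reads two

print(log.lag("recommendation"), log.lag("analytics"))
```

Both groups read the same events independently, and the slower group's lag is simply the distance between the end of the log and its own offset.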

Which Path

The decomposition criterion answers it.

Domain boundary split, but two domains depend on each other’s immediate result → sync.
Data ownership split where one service’s change must update another’s cache or view → async events.
Scale pattern split with bursts to absorb → async queue.
Failure isolation separating critical and non-critical → sync on the critical path, async on the non-critical path to enable fallback.
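The four rules above can be written as a small lookup. This is only a sketch of the rules as stated; the function name, the criterion labels, and the `"undecided"` result for cases the rules do not cover are assumptions for illustration.

```python
def choose_communication(criterion, needs_immediate_result=False, on_critical_path=False):
    """Encode the four rules above; anything they do not cover returns 'undecided'."""
    if criterion == "domain" and needs_immediate_result:
        return "sync"
    if criterion == "data_ownership":
        return "async-events"
    if criterion == "scale":
        return "async-queue"
    if criterion == "failure_isolation":
        return "sync" if on_critical_path else "async"
    return "undecided"


print(choose_communication("domain", needs_immediate_result=True))
print(choose_communication("failure_isolation", on_critical_path=False))
```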

Most real systems mix both. A system that insists on a single communication style is usually reading only one decomposition criterion.

Data Consistency and Operational Tools

Loss of the Single Transaction

The single ACID transaction familiar from the monolith is gone. “Wrap payment and inventory deduction in one transaction” no longer works across service boundaries, because each service commits to its own database.

Distributed transactions take two directions. Attempt them synchronously (2PC, TCC), or accept eventual consistency and design compensating transactions (Saga), often paired with reliable event publication (Outbox). Neither restores the simplicity of the single transaction.
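The compensating-transaction idea behind Saga can be sketched in a few lines. This is a simplified in-process illustration, not a production orchestrator: `run_saga` and the payment and stock functions are hypothetical, the shared dictionary stands in for two services' separate data, and real compensations can themselves fail and need retries.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Hypothetical in-memory state standing in for two services' data.
state = {"paid": False, "stock": 1}


def charge():
    state["paid"] = True


def refund():
    state["paid"] = False


def reserve_stock():
    if state["stock"] < 2:
        raise RuntimeError("insufficient stock")
    state["stock"] -= 2


def release_stock():
    state["stock"] += 2


ok = run_saga([(charge, refund), (reserve_stock, release_stock)])
print(ok, state)
```

The payment succeeds, the stock reservation fails, and the saga refunds the payment. Between the charge and the refund the system is visibly inconsistent, which is the price of giving up the single transaction.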

When drawing service boundaries, be aware of where the single transaction breaks. Splitting the most-frequently-co-changing data across boundaries turns every write into a distributed transaction, and the cost is hard to dismiss. The decomposition criterion decides not just communication but the data-consistency model as well.

Where Operational Tools Become Necessary

As service count grows, new tools become necessary in specific places.

Service Mesh: when communication policy (retry, timeout, circuit breaker, mTLS) needs to live outside application code
API Gateway / BFF: when auth, rate limiting, and response composition belong concentrated at the external entry point
Distributed tracing: when call chains grow long enough that locating slowness in a request becomes hard
Container orchestration: when service count grows enough to require automated deployment and scaling
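To make the first item concrete, here is roughly what one of those communication policies looks like when it lives inside application code. This is a minimal circuit breaker sketch; the class, its parameters, and the injected clock are assumptions for illustration, and a service mesh implements the same idea in a sidecar or proxy instead.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; allow a trial call after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call rejected")
            # Reset window elapsed: let one trial call through (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result


fake_now = [0.0]  # controllable clock so the example does not sleep
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: fake_now[0])


def flaky():
    raise ConnectionError("downstream unavailable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        outcomes.append("failed")
    except RuntimeError:
        outcomes.append("rejected")

fake_now[0] = 11.0  # reset window has elapsed
outcomes.append(breaker.call(lambda: "ok"))
print(outcomes)
```

Every service that calls another would need a copy of this logic, kept consistent across languages and teams. That duplication is the point at which moving the policy into a mesh starts to pay off.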

These tools follow as consequences, not as prerequisites. With clear decomposition criteria and simple boundaries, tool adoption can be deferred. Adopt tools first while the criteria are unclear, and complexity accumulates without ever being made visible by the tools.

The Cost of Wrong Decomposition

A split is hard to reverse. Once code lives in a separate service, it gets its own data, its own deployment pipeline, its own monitoring, its own team dependencies. Merging it back means unwinding all of that.

Systems with the wrong decomposition criterion typically show two signals. When most changes require simultaneous deployment of multiple services, the domain boundary was drawn wrong. When most calls extend into a long sync chain, communication was decided by inertia rather than by the criterion.
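The first signal can be measured from deployment history. The sketch below assumes a hypothetical change log where each entry lists the services one change had to deploy; how to extract that from a real pipeline is left open.

```python
def multi_service_change_ratio(changes):
    """Fraction of changes that had to deploy more than one service together."""
    if not changes:
        return 0.0
    coupled = sum(1 for services in changes if len(set(services)) > 1)
    return coupled / len(changes)


# Hypothetical change log: each entry lists the services one change deployed.
changes = [
    ["order"],
    ["order", "payment"],
    ["order", "payment"],
    ["recommendation"],
]
ratio = multi_service_change_ratio(changes)
print(ratio)
```

A ratio that stays high for the same pair of services suggests the boundary between them cuts through a single domain.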

The same principle applies to split decisions inside a single codebase, but MSA leaves those decisions in a form that is hard to reverse.

So when in doubt, I argue for not cutting. Draw module boundaries inside the monolith first and let those boundaries settle, then cut. The right time is when the domain has firmed up, ownership is clear, and scale differences make operations hard. Once cutting becomes the goal, the decomposition criterion turns into post-hoc justification.

References

Horizontal vs Vertical Slicing — Horizontal/vertical splits within a single codebase. The MSA domain criterion is the service-boundary version of the same decision.
HTTP/1.1 and HTTP/2 — HTTP/2 multiplexing. The transport layer where gRPC’s synchronous communication model lives.
Kafka Fundamentals and KRaft Mode — Kafka producer/consumer mechanics and KRaft mode. The infrastructure for asynchronous communication.
Distributed Transactions — 2PC, TCC, Saga, Outbox. The patterns chosen where the single transaction breaks.
