Multiple services share a cache and periodically fetch configuration data. With full refresh, the entire dataset is transmitted every cycle regardless of whether anything changed. The less frequently the data changes, the greater the waste.
Refreshing only changed items makes network throughput proportional to the actual change rate rather than to the total data size.
Full Refresh
Full refresh is simple to implement: every cycle, fetch all data and replace local state. No change detection logic is needed.
But under this approach, network throughput is data size × consumer count × refresh frequency, whether or not the data actually changed. Throughput grows linearly as the number of consumers or the data size grows.
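A quick calculation makes the formula concrete. The workload figures below (5 MB of data, 20 consumers, a 10-second refresh interval, 50 KB changed per cycle) are hypothetical and chosen only for illustration.

```python
def full_refresh_bytes_per_sec(data_size_bytes: int, consumers: int, interval_sec: float) -> float:
    # Full refresh: every consumer pulls the whole dataset every cycle,
    # whether or not anything changed.
    return data_size_bytes * consumers / interval_sec


def incremental_refresh_bytes_per_sec(changed_bytes_per_cycle: int, consumers: int, interval_sec: float) -> float:
    # Incremental refresh: every consumer pulls only what changed during the cycle.
    return changed_bytes_per_cycle * consumers / interval_sec


# Hypothetical workload.
DATA_SIZE = 5_000_000   # 5 MB of configuration data
CHANGED = 50_000        # 50 KB changed per cycle
CONSUMERS = 20
INTERVAL = 10           # seconds between refreshes

full = full_refresh_bytes_per_sec(DATA_SIZE, CONSUMERS, INTERVAL)
incremental = incremental_refresh_bytes_per_sec(CHANGED, CONSUMERS, INTERVAL)
print(f"full: {full:,.0f} B/s, incremental: {incremental:,.0f} B/s")
```

The incremental figure ignores the small per-cycle cost of the change query itself, so treat it as a lower bound rather than an exact prediction.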
Network throughput can hit its limit while CPU and memory still have headroom. Scaling up the instance to relieve the network bottleneck then wastes the added CPU and memory capacity.
Data Separation
Before applying incremental refresh, data must be separated by update frequency.
Configuration data. Entity metadata, conditions, and rules change only when an administrator modifies them. Update frequency is low.
Real-time data. Counters and consumption metrics update with every request. They must always reflect the latest state. These are not candidates for incremental refresh.
Apply incremental refresh only to configuration data. Real-time data continues refreshing every cycle.
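A refresh cycle that applies this split might look like the following minimal sketch. The fetcher callables and the cursor are hypothetical stand-ins for reads from the shared cache. Only the routing (full reload for real-time data, incremental merge for configuration data) reflects the text.

```python
from typing import Callable, Dict, Tuple

# Hypothetical fetchers; in a real service these would read from the shared cache.
FetchAll = Callable[[], Dict[str, int]]
FetchChanges = Callable[[float], Tuple[Dict[str, dict], float]]


def refresh_cycle(
    local_config: Dict[str, dict],
    config_cursor: float,
    fetch_realtime_all: FetchAll,
    fetch_config_changes: FetchChanges,
) -> Tuple[Dict[str, dict], Dict[str, int], float]:
    """One cycle: real-time data is reloaded in full, configuration data incrementally."""
    # Real-time data (counters, consumption metrics) must always be current,
    # so it is replaced wholesale every cycle.
    realtime = fetch_realtime_all()

    # Configuration data: ask only for items changed since the cursor.
    changed, new_cursor = fetch_config_changes(config_cursor)
    local_config.update(changed)  # unchanged items are kept as-is
    return local_config, realtime, new_cursor


config = {"campaign:1": {"budget": 100}}
config, counters, cursor = refresh_cycle(
    config,
    0.0,
    fetch_realtime_all=lambda: {"campaign:1:spent": 42},
    fetch_config_changes=lambda since: ({"campaign:2": {"budget": 50}}, 1700000000.0),
)
print(sorted(config))  # ['campaign:1', 'campaign:2']
```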
Deduplication
When an entity embeds sub-entities that other entities also reference, each shared sub-entity is duplicated inside every entity that contains it. Managing sub-entities separately reduces both storage and transmission volume.
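One way to manage shared sub-entities separately is to normalize them: each entity keeps only sub-entity IDs, and every sub-entity is stored once. The `subs` and `sub_ids` field names are hypothetical, and this sketch assumes a sub-entity ID always maps to the same content.

```python
from typing import Dict, List, Tuple


def normalize(entities: List[dict]) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Split entities into an entity map (referencing sub-entities by ID)
    and a sub-entity map in which each shared sub-entity appears once."""
    entity_map: Dict[str, dict] = {}
    sub_map: Dict[str, dict] = {}
    for entity in entities:
        sub_ids = []
        for sub in entity["subs"]:
            sub_map[sub["id"]] = sub  # stored once, however many entities use it
            sub_ids.append(sub["id"])
        stripped = {k: v for k, v in entity.items() if k != "subs"}
        stripped["sub_ids"] = sub_ids
        entity_map[entity["id"]] = stripped
    return entity_map, sub_map


shared = {"id": "rule:geo-kr", "country": "KR"}
entities = [
    {"id": "e1", "name": "A", "subs": [shared]},
    {"id": "e2", "name": "B", "subs": [shared]},
]
entity_map, sub_map = normalize(entities)
print(entity_map["e2"])  # {'id': 'e2', 'name': 'B', 'sub_ids': ['rule:geo-kr']}
print(len(sub_map))      # 1
```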
Change Detection Strategies
Data Comparison
A batch job fetches data from the source (database) and directly compares it against what is stored in the cache. Only items with different content are written.
The advantage is accuracy. Source-cache mismatches are never missed. The disadvantage is the additional read cost of fetching existing cache data for comparison.
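The comparison step can be sketched as a plain dictionary diff. Here `cached` stands in for the existing cache contents; reading them is exactly the extra cost mentioned above. Reporting keys that disappeared from the source is an assumption beyond the text.

```python
from typing import Dict, List, Tuple


def diff_against_cache(
    source: Dict[str, dict], cached: Dict[str, dict]
) -> Tuple[Dict[str, dict], List[str]]:
    """Compare source data with cached data item by item.

    Returns (to_write, to_delete): items that are new or whose content differs,
    and cached keys that no longer exist in the source.
    """
    to_write = {key: value for key, value in source.items() if cached.get(key) != value}
    to_delete = [key for key in cached if key not in source]
    return to_write, to_delete


source = {"a": {"v": 1}, "b": {"v": 2}, "c": {"v": 3}}
cached = {"a": {"v": 1}, "b": {"v": 0}, "d": {"v": 4}}
to_write, to_delete = diff_against_cache(source, cached)
print(to_write)   # {'b': {'v': 2}, 'c': {'v': 3}}
print(to_delete)  # ['d']
```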
Timestamp-Based
Record the change time for each item. Storing item IDs in a Sorted Set with their change timestamps as scores enables range queries for items changed after a given point in time.
```mermaid
flowchart LR
    subgraph Write ["Write"]
        BATCH["Batch"] --> UPD["Update changed items"]
        UPD --> TS["Record timestamp<br/>in Sorted Set"]
    end
    subgraph Read ["Read"]
        SVC["Service"] --> RANGE["Range query:<br/>changes since last refresh"]
        RANGE --> CHANGED["Changed item IDs"]
        CHANGED --> GET["Fetch those items only"]
    end
```
When the reader remembers its last query time, it can fetch only items changed since then. Full scans become range queries, and throughput scales with the number of changes rather than the total data size.
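To illustrate the timestamp index, here is an in-memory stand-in for a Sorted Set whose members are item IDs and whose scores are change times. If the cache is Redis (an assumption; the text only says Sorted Set), `add` corresponds to `ZADD` and `changed_since` to `ZRANGEBYSCORE key (since +inf`.

```python
import bisect
from typing import Dict, List


class ChangeIndex:
    """In-memory stand-in for a Sorted Set: member = item ID, score = change time.

    Re-adding an existing member moves it to its new score, as ZADD does.
    """

    def __init__(self) -> None:
        self._scores: List[float] = []  # sorted ascending
        self._members: List[str] = []   # parallel to _scores
        self._score_of: Dict[str, float] = {}

    def add(self, member: str, score: float) -> None:
        old = self._score_of.get(member)
        if old is not None:
            # Locate the member's current entry among entries with the old score.
            i = bisect.bisect_left(self._scores, old)
            while self._members[i] != member:
                i += 1
            del self._scores[i]
            del self._members[i]
        i = bisect.bisect_right(self._scores, score)
        self._scores.insert(i, score)
        self._members.insert(i, member)
        self._score_of[member] = score

    def changed_since(self, since: float) -> List[str]:
        """Members with score strictly greater than `since`, oldest change first."""
        start = bisect.bisect_right(self._scores, since)
        return self._members[start:]


index = ChangeIndex()
index.add("item:1", 100.0)
index.add("item:2", 105.0)
index.add("item:1", 110.0)  # item:1 changed again; its score moves forward

print(index.changed_since(100.0))  # ['item:2', 'item:1']
print(index.changed_since(110.0))  # []
```

The lower bound here is exclusive, like the `(` prefix in Redis. If a writer can record a change with the same timestamp as the reader's cursor after the reader has queried, that change would be missed; an inclusive bound with idempotent re-application avoids this at the cost of re-fetching a few items.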
Both strategies can be combined. The write path uses data comparison to detect changes and records timestamps. The read path uses timestamp range queries to fetch changes.
Write Path / Read Path
Clear separation of write and read paths is essential in incremental refresh.
The write path is handled by a batch job. It fetches data from the source, compares against the cache, writes only changes, and records change timestamps.
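A minimal sketch of that batch job, combining data comparison with timestamp recording. Plain dicts stand in for the cache and for the Sorted Set (`change_times` maps item ID to change time), and `run_write_batch` is a hypothetical name.

```python
import time
from typing import Dict, List, Optional


def run_write_batch(
    source: Dict[str, dict],
    cache: Dict[str, dict],
    change_times: Dict[str, float],
    now: Optional[float] = None,
) -> List[str]:
    """Write path: compare source against cache, write only changed items,
    and record each change time (the score a Sorted Set would hold)."""
    now = time.time() if now is None else now
    written = []
    for key, value in source.items():
        if cache.get(key) != value:   # data comparison
            cache[key] = value        # write only what changed
            change_times[key] = now   # record the change timestamp
            written.append(key)
    return written


source = {"rule:1": {"max": 10}, "rule:2": {"max": 20}}
cache: Dict[str, dict] = {}
change_times: Dict[str, float] = {}
print(run_write_batch(source, cache, change_times, now=1000.0))  # ['rule:1', 'rule:2']
print(run_write_batch(source, cache, change_times, now=1010.0))  # []
source["rule:2"] = {"max": 25}
print(run_write_batch(source, cache, change_times, now=1020.0))  # ['rule:2']
```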
The read path is handled by services. They query only items changed since their last refresh and partially update local state. When nothing has changed, local data is retained as-is.
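On the service side, the read path can be sketched as follows. The two callables are hypothetical: `changed_ids_since` stands in for the Sorted Set range query (returning IDs with their change times), and `fetch_items` for fetching those items from the cache.

```python
from typing import Callable, Dict, List, Tuple

ChangedIdsSince = Callable[[float], List[Tuple[str, float]]]
FetchItems = Callable[[List[str]], Dict[str, dict]]


class LocalConfig:
    """Service-side copy of configuration data, refreshed incrementally."""

    def __init__(self) -> None:
        self.items: Dict[str, dict] = {}
        self.last_refresh: float = 0.0  # cursor: latest change time already applied

    def refresh(self, changed_ids_since: ChangedIdsSince, fetch_items: FetchItems) -> int:
        """Fetch only items changed since the last refresh and merge them in.

        Returns the number of items updated.
        """
        changes = changed_ids_since(self.last_refresh)
        if not changes:
            return 0  # nothing changed: local data is retained as-is
        ids = [item_id for item_id, _ in changes]
        self.items.update(fetch_items(ids))  # partial update of local state
        self.last_refresh = max(ts for _, ts in changes)
        return len(ids)


# Hypothetical in-memory data.
cache = {"rule:1": {"max": 10}, "rule:2": {"max": 25}}
change_times = {"rule:1": 1000.0, "rule:2": 1020.0}


def changed_ids_since(since: float) -> List[Tuple[str, float]]:
    # Linear scan as a stand-in for a Sorted Set range query.
    return [(key, ts) for key, ts in change_times.items() if ts > since]


def fetch_items(ids: List[str]) -> Dict[str, dict]:
    return {key: cache[key] for key in ids}


local = LocalConfig()
print(local.refresh(changed_ids_since, fetch_items))  # 2
print(local.refresh(changed_ids_since, fetch_items))  # 0
```

Advancing the cursor to the largest change time actually seen, rather than to the service's own clock, avoids depending on clock agreement between the batch job and the service.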
Different consumers may need different data scopes. Some need only configuration. Others need configuration plus content. Others need real-time data as well. Separating read interfaces per consumer lets each service fetch only what it needs.
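One way to express per-consumer read scopes is a flag set that each service passes when reading. The section names and the `Scope` flags are hypothetical; separate read functions per consumer would work just as well.

```python
from enum import Flag, auto
from typing import Dict


class Scope(Flag):
    CONFIG = auto()
    CONTENT = auto()
    REALTIME = auto()


# Hypothetical mapping from scope flag to cache section name.
SECTIONS = {
    Scope.CONFIG: "config",
    Scope.CONTENT: "content",
    Scope.REALTIME: "realtime",
}


def read(cache: Dict[str, Dict[str, dict]], scope: Scope) -> Dict[str, Dict[str, dict]]:
    """Return only the data sections included in the consumer's scope."""
    return {name: cache[name] for flag, name in SECTIONS.items() if flag in scope}


cache = {"config": {"c1": {}}, "content": {"k1": {}}, "realtime": {"n1": {}}}
print(sorted(read(cache, Scope.CONFIG)))                  # ['config']
print(sorted(read(cache, Scope.CONFIG | Scope.CONTENT)))  # ['config', 'content']
```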
Use Cases
This pattern applies to many architectures in which services periodically fetch configuration data from a shared cache.
Ad campaign configuration. Campaign metadata and targeting conditions change infrequently but are queried simultaneously by multiple servers. Switching from full to incremental refresh significantly reduces network throughput.
Product catalogs. Product information changes only when a product is created or modified. Refreshing only the changed products is more efficient than transmitting thousands of them every cycle.
User permissions/settings. Permission changes are infrequent but referenced by many services. Incremental refresh fits this structure well.
Trade-offs
Incremental refresh adds complexity compared to full refresh.
Change detection logic, timestamp management, and partial updates of local state are all required. A full-synchronization mechanism to recover from inconsistencies between the cache and the source is also worth considering.
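A full-synchronization safety net could be as simple as replacing local state outright every N cycles. The sketch below assumes a fixed cycle count (`full_sync_every`), which is a hypothetical policy; triggering a full sync when a mismatch is detected is another option.

```python
from typing import Callable, Dict


class ResyncingReader:
    """Incremental reader that periodically replaces local state in full."""

    def __init__(self, full_sync_every: int) -> None:
        self.full_sync_every = full_sync_every  # hypothetical policy: cycles between full syncs
        self.cycle = 0
        self.items: Dict[str, dict] = {}

    def tick(
        self,
        fetch_all: Callable[[], Dict[str, dict]],
        fetch_changes: Callable[[], Dict[str, dict]],
    ) -> str:
        self.cycle += 1
        if self.cycle % self.full_sync_every == 0:
            # Full sync: rebuild local state from scratch, discarding any drift.
            self.items = dict(fetch_all())
            return "full"
        # Normal cycle: merge only the changes (cursor handling omitted for brevity).
        self.items.update(fetch_changes())
        return "incremental"


source = {"a": {"v": 1}}
reader = ResyncingReader(full_sync_every=3)
reader.items = {"a": {"v": 0}}  # simulate drift: an update was missed
modes = [reader.tick(lambda: source, lambda: {}) for _ in range(3)]
print(modes)         # ['incremental', 'incremental', 'full']
print(reader.items)  # {'a': {'v': 1}}
```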
When data is small or changes frequently, full refresh is simpler and sufficient. Switching to incremental refresh is appropriate when network cost has become the actual bottleneck.