Implementing Multi-Level Caching with Redis and Local Cache
Context
The translation platform was experiencing high latency on frequently accessed endpoints -- translation memory lookups, glossary fetches, and user preference loading. Database queries were the bottleneck, with the same data being fetched repeatedly across requests. A single caching layer was insufficient due to the varied access patterns and consistency requirements across different data types.
Decision
Implement a two-level caching strategy using Caffeine as the L1 local cache and Redis as the L2 distributed cache, with a cache-aside pattern and configurable TTLs per data type.
Alternatives Considered
Redis-only caching
Pros:
- Single cache layer simplifies architecture
- Consistent view across all service instances
- Well-understood invalidation model
Cons:
- Every cache hit still requires a network round-trip to Redis (~1-2ms)
- Redis becomes a single point of failure for all cached data
- Network latency adds up for high-frequency lookups like translation memory segments
Local cache only (Caffeine/Guava)
Pros:
- Sub-microsecond access times, no network overhead
- No additional infrastructure to manage
- Simple implementation with the Spring Cache abstraction
Cons:
- Each service instance maintains its own cache, leading to inconsistency
- Cache is lost on deployment or restart
- Memory pressure on the application heap
- No sharing between service instances
Two-level cache: Caffeine L1 + Redis L2
Pros:
- Hot data served from local memory with sub-microsecond latency
- Redis provides a shared, durable cache across instances
- Reduces Redis load by 60-70% since most hits resolve at L1
- Graceful degradation -- if Redis is unavailable, L1 still serves
Cons:
- More complex invalidation logic across two layers
- Potential for short-lived inconsistency between L1 caches on different instances
- Requires careful TTL tuning per data type
Reasoning
The translation memory lookup endpoint was called approximately 200 times per translation job, and with hundreds of concurrent jobs, the total QPS on this data was enormous. Redis-only caching reduced database load but the 1-2ms network round-trip per lookup was still adding 200-400ms per job. By placing a Caffeine L1 cache in front of Redis, we eliminated the network hop for the hottest data. The short-lived inconsistency between instances was acceptable for our use case -- translation memory and glossary data changes infrequently, and a 30-second staleness window had no practical impact on translation quality.
Context and Background
The translation platform’s core workflow involves looking up previously translated segments (translation memory) to suggest matches for new content. Each document being translated generates hundreds of segment lookups against the translation memory database. At peak load with enterprise batch jobs, the PostgreSQL database was processing upwards of 15,000 translation memory queries per minute, and response times were degrading.
We had already optimized the database layer with proper indexing and query tuning, but the fundamental problem was that the same translation memory segments were being fetched repeatedly. A customer translating a 200-page technical manual would hit the same glossary terms and recurring phrases thousands of times. The data was highly cacheable — translation memories are append-mostly, and glossary entries change perhaps once a week.
Our initial approach with Redis-only caching improved things significantly, cutting database load by about 70%. But profiling revealed that the Redis network round-trips were now the bottleneck for the translation memory lookup endpoint. Each lookup took 1-2ms to Redis, and with 200+ lookups per job, this added 200-400ms of pure network overhead per translation job. We needed a cache that could serve the hottest data without any network hop.
Implementation
- Cache abstraction layer: Built a custom `MultiLevelCacheManager` implementing Spring's `CacheManager` interface. This allowed existing `@Cacheable` annotations to work transparently with the two-level cache. The manager delegates to a `MultiLevelCache` that checks Caffeine first, then Redis, then the database.
- L1 configuration (Caffeine): Configured Caffeine with per-cache-name settings. Translation memory cache: max 10,000 entries, 30-second TTL. Glossary cache: max 5,000 entries, 5-minute TTL. User preferences: max 2,000 entries, 60-second TTL. Used `recordStats()` for hit rate monitoring via Micrometer.
- L2 configuration (Redis): Used Spring Data Redis with the Lettuce client in cluster mode. Translation memory entries cached with a 10-minute TTL. Glossary entries with a 30-minute TTL. Serialization via Kryo for a compact binary representation, reducing Redis memory usage by roughly 40% compared to JSON serialization.
- Write-through with L1 invalidation: On cache writes, data flows to Redis first (the source of truth for the cache layer), then populates the local Caffeine cache on the writing instance. Other instances pick up changes on their next L1 miss, which falls through to Redis. For critical invalidations (glossary updates), we publish a Redis Pub/Sub message that triggers L1 eviction across all instances.
- Cache warming on startup: Implemented a `CacheWarmer` component that pre-loads the top 1,000 most-accessed translation memory segments and active glossary entries into both L1 and L2 during service startup. This eliminated the cold-start penalty after deployments.
- Monitoring dashboard: Exposed cache metrics via Micrometer to CloudWatch: L1 hit rate, L2 hit rate, overall hit rate, eviction counts, and cache size. Set up alerts for the L1 hit rate dropping below 60% or the overall hit rate below 85%.
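The read path of the `MultiLevelCache` described above can be sketched in plain Java. In-memory maps stand in for Caffeine (L1) and Redis (L2), and a loader function stands in for the database query; the class below is an illustrative simplification under those assumptions, not the production implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the cache-aside read path: L1 (local) -> L2 (shared) -> loader (DB).
// Maps stand in for Caffeine and Redis; TTLs and serialization are omitted.
public class MultiLevelCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // per-instance, e.g. Caffeine
    private final Map<K, V> l2;                             // shared, e.g. Redis
    private final Function<K, V> loader;                    // database fallback

    public MultiLevelCache(Map<K, V> l2, Function<K, V> loader) {
        this.l2 = l2;
        this.loader = loader;
    }

    public V get(K key) {
        V value = l1.get(key);
        if (value != null) return value;          // L1 hit: no network hop
        value = l2.get(key);
        if (value != null) {
            l1.put(key, value);                   // promote to L1 for subsequent reads
            return value;
        }
        value = loader.apply(key);                // miss on both levels: load from DB
        l2.put(key, value);                       // populate L2 first (shared source of truth)
        l1.put(key, value);                       // then the local L1
        return value;
    }

    public void evictLocal(K key) { l1.remove(key); } // hook used by Pub/Sub invalidation
}
```

A second `get` for the same key resolves at L1 without touching L2 or the loader, which is the behavior that removed the Redis round-trip for hot translation memory segments.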
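The per-cache-name policy in the L1 bullet maps onto Caffeine's `maximumSize`, `expireAfterWrite`, and `recordStats` builder settings. The dependency-free sketch below reproduces just the policy (bounded size with LRU-style eviction, write-time expiry) so the behavior is concrete; `BoundedTtlCache` is a hypothetical stand-in, not Caffeine itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for Caffeine's maximumSize + expireAfterWrite, illustrating the
// per-cache-name settings (e.g. translation memory: 10,000 entries / 30s TTL).
public class BoundedTtlCache<K, V> {
    private record CacheEntry<V>(V value, long expiresAtMillis) {}

    private final int maxEntries;
    private final long ttlMillis;
    // access-order LinkedHashMap gives simple LRU-style eviction at capacity
    private final LinkedHashMap<K, CacheEntry<V>> store;

    public BoundedTtlCache(int maxEntries, long ttlMillis) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
        this.store = new LinkedHashMap<>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, CacheEntry<V>> eldest) {
                return size() > BoundedTtlCache.this.maxEntries;
            }
        };
    }

    public synchronized void put(K key, V value) {
        store.put(key, new CacheEntry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public synchronized V get(K key) {
        CacheEntry<V> e = store.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expiresAtMillis()) { // expired: treat as miss
            store.remove(key);
            return null;
        }
        return e.value();
    }
}
```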
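The Pub/Sub invalidation flow for glossary updates can be illustrated with an in-process subscriber list standing in for the Redis channel; in production each instance would subscribe via Spring Data Redis and evict the named key from its own Caffeine cache. `InvalidationBus` is a hypothetical name for this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of the Pub/Sub invalidation step. The subscriber list stands in for the
// Redis channel; each registered map stands in for one instance's L1 cache.
public class InvalidationBus {
    private final List<Map<String, ?>> l1Caches = new CopyOnWriteArrayList<>();

    // Equivalent of an instance subscribing to the invalidation channel.
    public void register(Map<String, ?> l1) { l1Caches.add(l1); }

    // Equivalent of publishing an "evict" message on the channel.
    public void publishEvict(String key) {
        for (Map<String, ?> l1 : l1Caches) {
            l1.remove(key); // each instance's subscriber evicts from its own L1
        }
    }
}
```

After `publishEvict`, the next read on any instance misses L1 and falls through to Redis, which is how all instances converge on the updated glossary entry.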
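The cache-warming step amounts to iterating the hottest keys and populating both levels, preferring L2 when another instance has already warmed it. A minimal sketch, with maps standing in for the two levels and a loader for the database; the real `CacheWarmer` would pull its key list from access statistics rather than take it as a parameter.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of startup warming: pre-load the hottest keys into both cache levels so
// the first requests after a deployment hit warm caches instead of the database.
public class CacheWarmer {
    public static <K, V> int warm(List<K> topKeys, Function<K, V> loader,
                                  Map<K, V> l1, Map<K, V> l2) {
        int warmed = 0;
        for (K key : topKeys) {
            V value = l2.get(key);            // prefer L2 if another instance warmed it
            if (value == null) {
                value = loader.apply(key);    // otherwise load once from the database
                l2.put(key, value);
            }
            l1.put(key, value);               // always populate the local L1
            warmed++;
        }
        return warmed;
    }
}
```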
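The alert thresholds from the monitoring bullet reduce to simple hit-rate arithmetic over three counters. The sketch below shows that computation, with plain fields standing in for the Micrometer meters; `CacheStats` is an illustrative name.

```java
// Sketch of the hit-rate arithmetic behind the dashboard alerts. L1 and L2 hits
// are counted separately; the thresholds (L1 < 60%, overall < 85%) match the
// alerting rules described above. Plain fields stand in for Micrometer meters.
public class CacheStats {
    long l1Hits, l2Hits, misses;

    long total() { return l1Hits + l2Hits + misses; }

    double l1HitRate() { return total() == 0 ? 0 : (double) l1Hits / total(); }

    double overallHitRate() { return total() == 0 ? 0 : (double) (l1Hits + l2Hits) / total(); }

    boolean shouldAlert() {
        return l1HitRate() < 0.60 || overallHitRate() < 0.85;
    }
}
```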
Results
- Overall cache hit rate stabilized at approximately 92%, with L1 (Caffeine) resolving about 75% of all lookups and L2 (Redis) handling another 17%
- Translation memory lookup endpoint p95 latency dropped from ~45ms to ~3ms for cached segments
- PostgreSQL query volume for translation memory decreased by roughly 90%, freeing database capacity for write-heavy operations
- Redis connection count dropped by approximately 65% since most reads are now served from local cache
- Translation job end-to-end processing time improved by about 15% on average, with larger improvements on jobs with repetitive content
- Cache warming eliminated the post-deployment latency spike that previously lasted 2-3 minutes while caches repopulated organically
- Memory overhead of the L1 cache was modest at roughly 150MB per instance, well within the headroom of our ECS task definitions