Spring Boot Microservices Patterns I Use in Production

Introduction

After building and maintaining microservices architectures across several companies, I have developed strong opinions about which patterns actually deliver value in production versus which ones add complexity without meaningful benefit. This article covers the patterns I consistently reach for when building Spring Boot microservices, along with the implementation details that documentation often glosses over.

These patterns are not theoretical. They come from running services that process millions of requests daily, experiencing cascading failures at 2 AM, and gradually learning what keeps systems stable under real-world conditions.

Circuit Breakers and Resilience with Resilience4j

The circuit breaker pattern is non-negotiable for any service that calls another service over the network. I have seen entire platforms go down because a single downstream dependency became slow and exhausted all thread pools upstream. Resilience4j is my preferred library because it composes well with Spring Boot and does not require a separate infrastructure component.

Here is how I configure circuit breakers in production. The key insight is that you need different configurations for different downstream services based on their reliability characteristics and your tolerance for degraded behavior:

@Configuration
public class ResilienceConfig {

    @Bean
    public CircuitBreakerConfig translationEngineCircuitBreakerConfig() {
        return CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .slowCallRateThreshold(80)
            .slowCallDurationThreshold(Duration.ofSeconds(3))
            .waitDurationInOpenState(Duration.ofSeconds(30))
            .permittedNumberOfCallsInHalfOpenState(5)
            .slidingWindowType(SlidingWindowType.COUNT_BASED)
            .slidingWindowSize(20)
            .minimumNumberOfCalls(10)
            .recordExceptions(IOException.class, TimeoutException.class, ServiceUnavailableException.class)
            .ignoreExceptions(BadRequestException.class)
            .build();
    }
}
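If you prefer configuration over code, the Resilience4j Spring Boot starter lets you express the same settings declaratively. The instance name below is illustrative; it just needs to match the name you use when obtaining the breaker:

```yaml
# application.yml — same settings via the resilience4j-spring-boot starter
resilience4j:
  circuitbreaker:
    instances:
      translationEngine:
        failureRateThreshold: 50
        slowCallRateThreshold: 80
        slowCallDurationThreshold: 3s
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 5
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 20
        minimumNumberOfCalls: 10
```

The code-based config wins when you need exception predicates like recordExceptions; the YAML form is easier to vary per environment.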

The important decisions here are what counts as a failure, where the slow-call thresholds sit, and how the half-open state behaves. We explicitly ignore 400-level client errors because those are caller mistakes, not downstream failures. Slow-call thresholds matter because a service responding in five seconds may be technically “up” while still degrading your system. Half-open behavior controls how many probe requests are sent before deciding the circuit can close again.

I always pair circuit breakers with fallback strategies. The fallback depends on the business context:

@Service
@Slf4j
@RequiredArgsConstructor
public class TranslationService {

    private final CircuitBreaker circuitBreaker;
    private final TranslationEngineClient engineClient;
    private final TranslationCacheService cacheService;

    public TranslationResult translate(TranslationRequest request) {
        Supplier<TranslationResult> decorated = CircuitBreaker.decorateSupplier(
            circuitBreaker, () -> engineClient.translate(request));

        // Vavr's Try supplies the typed recover(...) combinators; a plain
        // decorated Supplier has no such methods.
        return Try.ofSupplier(decorated)
            .recover(CallNotPermittedException.class, ex -> {
                log.warn("Circuit open for translation engine, falling back to cache");
                return cacheService.getCachedTranslation(request)
                    .orElseThrow(() -> new ServiceDegradedException(
                        "Translation engine unavailable and no cached result found"));
            })
            .recover(TimeoutException.class, ex -> {
                log.warn("Translation engine timeout, attempting cache fallback");
                return cacheService.getCachedTranslation(request)
                    .orElseThrow(() -> new ServiceDegradedException(
                        "Translation engine timed out and no cached result found"));
            })
            .get();
    }
}

Bulkheads are the underappreciated companion to circuit breakers. They cap concurrent calls to a specific downstream service so that one slow dependency cannot consume all of your threads:

@Bean
public ThreadPoolBulkhead translationEngineBulkhead() {
    return ThreadPoolBulkhead.of("translationEngine",
        ThreadPoolBulkheadConfig.custom()
            .maxThreadPoolSize(25)
            .coreThreadPoolSize(10)
            .queueCapacity(50)
            .keepAliveDuration(Duration.ofSeconds(20))
            .build());
}
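If the mechanics feel opaque: a ThreadPoolBulkhead is conceptually a bounded thread pool plus a bounded queue that rejects new work once both are full, instead of letting callers pile up. Here is a plain-JDK sketch of that idea (not Resilience4j's actual implementation):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Plain-JDK sketch of the bulkhead idea: a bounded pool plus a bounded queue,
// with saturation surfaced to the caller immediately.
public class BulkheadSketch {

    public static ThreadPoolExecutor newBulkhead(int core, int max, int queueCapacity) {
        return new ThreadPoolExecutor(
            core, max,
            20, TimeUnit.SECONDS,                      // keep-alive for surplus threads
            new ArrayBlockingQueue<>(queueCapacity),   // bounded queue caps waiting work
            new ThreadPoolExecutor.AbortPolicy());     // reject rather than block the caller
    }

    public static void main(String[] args) {
        ThreadPoolExecutor bulkhead = newBulkhead(1, 1, 1);
        CountDownLatch release = new CountDownLatch(1);

        bulkhead.submit(() -> {                        // occupies the only thread
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        bulkhead.submit(() -> { });                    // occupies the only queue slot

        try {
            bulkhead.submit(() -> { });                // pool and queue full: rejected
        } catch (RejectedExecutionException e) {
            System.out.println("rejected");            // the slow dependency is contained
        } finally {
            release.countDown();
            bulkhead.shutdown();
        }
    }
}
```

The essential property is the AbortPolicy: when the downstream is saturated, callers get an immediate failure they can route to a fallback, rather than an ever-growing backlog.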

API Gateway Pattern with Spring Cloud Gateway

For API gateways, I use Spring Cloud Gateway. It integrates naturally with the Spring ecosystem, supports reactive routing, and handles cross-cutting concerns like authentication, rate limiting, and request transformation in a centralized location.

Here is a production gateway configuration that demonstrates the patterns I find most useful:

@Configuration
public class GatewayRoutesConfig {

    @Bean
    public RouteLocator customRouteLocator(RouteLocatorBuilder builder) {
        return builder.routes()
            .route("translation-service", r -> r
                .path("/api/v1/translations/**")
                .filters(f -> f
                    .stripPrefix(0)
                    .addRequestHeader("X-Gateway-Timestamp",
                        String.valueOf(Instant.now().toEpochMilli()))
                    .retry(retryConfig -> retryConfig
                        .setRetries(2)
                        .setStatuses(HttpStatus.SERVICE_UNAVAILABLE,
                                     HttpStatus.GATEWAY_TIMEOUT)
                        .setBackoff(Duration.ofMillis(100),
                                    Duration.ofSeconds(1), 2, true))
                    .circuitBreaker(cb -> cb
                        .setName("translationServiceCB")
                        .setFallbackUri("forward:/fallback/translation"))
                    .requestRateLimiter(rl -> rl
                        .setRateLimiter(redisRateLimiter())
                        .setKeyResolver(apiKeyResolver())))
                .uri("lb://translation-service"))
            .route("user-service", r -> r
                .path("/api/v1/users/**")
                .filters(f -> f
                    .stripPrefix(0)
                    .circuitBreaker(cb -> cb
                        .setName("userServiceCB")
                        .setFallbackUri("forward:/fallback/user")))
                .uri("lb://user-service"))
            .build();
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        return new RedisRateLimiter(100, 200, 1);
    }

    @Bean
    public KeyResolver apiKeyResolver() {
        return exchange -> {
            String apiKey = exchange.getRequest().getHeaders().getFirst("X-API-Key");
            if (apiKey != null) {
                return Mono.just(apiKey);
            }
            // Fall back to the client IP; getRemoteAddress() can be null, so guard it
            InetSocketAddress remote = exchange.getRequest().getRemoteAddress();
            return Mono.just(remote != null
                ? remote.getAddress().getHostAddress()
                : "anonymous");
        };
    }
}

The rate limiter is backed by Redis, so limits are enforced consistently across multiple gateway instances; the configuration above allows a steady 100 requests per second with bursts of up to 200. Limits are applied per API key when one is present, falling back to the client IP address for unauthenticated requests.

For the fallback controller, I keep responses simple and informative:

@RestController
@RequestMapping("/fallback")
public class FallbackController {

    @GetMapping("/translation")
    public ResponseEntity<ErrorResponse> translationFallback() {
        return ResponseEntity
            .status(HttpStatus.SERVICE_UNAVAILABLE)
            .body(new ErrorResponse(
                "TRANSLATION_SERVICE_UNAVAILABLE",
                "The translation service is temporarily unavailable. Please retry in a few moments.",
                Instant.now()));
    }
}
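The ErrorResponse payload used above can be as simple as a record. The field names here are inferred from the constructor call, so treat this as a sketch:

```java
import java.time.Instant;

// Minimal error payload for gateway fallback responses. Jackson serializes
// records out of the box on Spring Boot 3 (Jackson 2.12+).
public record ErrorResponse(String code, String message, Instant timestamp) { }
```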

Service Discovery and Configuration

For service discovery, I have largely moved away from Eureka toward Kubernetes-native service discovery. If you are already running on Kubernetes, adding another discovery layer is unnecessary complexity. Spring Cloud Kubernetes integrates cleanly:

# application.yml
spring:
  cloud:
    kubernetes:
      discovery:
        enabled: true
        all-namespaces: false
      loadbalancer:
        mode: service
      config:
        enabled: true
        sources:
          - name: translation-service-config
            namespace: translation-platform

For distributed configuration, I combine Kubernetes ConfigMaps with Spring Cloud Config for secrets and environment-specific overrides. The critical pattern is separating configuration that changes per environment (database URLs, API keys) from configuration that is intrinsic to the service (thread pool sizes, cache TTLs):

@Configuration
@ConfigurationProperties(prefix = "translation")
@Validated
@Data
public class TranslationProperties {

    @NotBlank
    private String engineBaseUrl;

    @Min(1) @Max(300)
    private int engineTimeoutSeconds = 30;

    @Min(1) @Max(100)
    private int maxConcurrentTranslations = 20;

    @NotNull
    private CacheProperties cache = new CacheProperties();

    @Data
    public static class CacheProperties {
        private Duration translationMemoryTtl = Duration.ofHours(24);
        private Duration glossaryTtl = Duration.ofHours(12);
        private int maxLocalCacheSize = 10_000;
    }
}

Using @Validated with JSR-303 annotations on configuration properties catches misconfigurations at startup rather than at runtime. This is one of those practices that seems minor but prevents a class of production incidents.
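For reference, the properties class above binds to a fragment like this via Spring's relaxed binding (values illustrative):

```yaml
# application.yml
translation:
  engine-base-url: http://translation-engine.translation-platform.svc.cluster.local
  engine-timeout-seconds: 30
  max-concurrent-translations: 20
  cache:
    translation-memory-ttl: 24h
    glossary-ttl: 12h
    max-local-cache-size: 10000
```

Spring Boot converts duration strings like 24h into java.time.Duration automatically, which keeps the YAML readable.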

Distributed Tracing and Observability

Observability is not optional for microservices. You need distributed tracing, structured logging, and metrics from day one, not after the first production incident. I use Micrometer with OpenTelemetry for traces and Prometheus for metrics.

The key principle is: correlate everything with a trace ID. Every log line, every metric, every error report should be traceable back to the originating request:

@Component
@Slf4j
@RequiredArgsConstructor
public class TranslationRequestHandler {

    private final MeterRegistry meterRegistry;
    private final TranslationService translationService;

    public TranslationResult handle(TranslationRequest request) {
        Timer.Sample sample = Timer.start(meterRegistry);

        try {
            TranslationResult result = translationService.translate(request);

            sample.stop(Timer.builder("translation.request.duration")
                .tag("source_lang", request.sourceLanguage())
                .tag("target_lang", request.targetLanguage())
                .tag("outcome", "success")
                .register(meterRegistry));

            meterRegistry.counter("translation.segments.processed",
                "source_lang", request.sourceLanguage(),
                "target_lang", request.targetLanguage())
                .increment(result.segmentCount());

            return result;

        } catch (Exception e) {
            sample.stop(Timer.builder("translation.request.duration")
                .tag("source_lang", request.sourceLanguage())
                .tag("target_lang", request.targetLanguage())
                .tag("outcome", "error")
                .register(meterRegistry));

            log.error("Translation failed for job={} source={} target={}",
                request.jobId(), request.sourceLanguage(), request.targetLanguage(), e);
            throw e;
        }
    }
}

For structured logging, I configure Logback to output JSON with trace context. Micrometer Tracing populates the MDC with traceId and spanId automatically, so the encoder only needs to include them:

<encoder class="net.logstash.logback.encoder.LogstashEncoder">
    <includeMdcKeyName>traceId</includeMdcKeyName>
    <includeMdcKeyName>spanId</includeMdcKeyName>
    <customFields>{"service":"translation-service","environment":"${ENVIRONMENT}"}</customFields>
</encoder>
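Beyond traceId and spanId, business identifiers are worth putting into the MDC too, so they appear on every log line of a unit of work. A small helper sketch using the standard SLF4J MDC API (the jobId key is my assumption; it needs a matching includeMdcKeyName entry in the encoder):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public final class LogContext {

    private static final Logger log = LoggerFactory.getLogger(LogContext.class);

    // Run a unit of work with a business identifier in the MDC so every
    // JSON log line it emits carries the jobId next to traceId/spanId.
    public static void withJobId(String jobId, Runnable work) {
        MDC.put("jobId", jobId);
        try {
            work.run();
        } finally {
            MDC.remove("jobId");  // threads are pooled, so always clean up
        }
    }

    public static void main(String[] args) {
        withJobId("job-1234", () -> log.info("Starting translation"));
    }
}
```

The try/finally cleanup matters: servlet and reactor threads are reused, and a leaked MDC entry will silently attach the wrong jobId to unrelated requests.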

Key Takeaways

The patterns that matter most in production microservices are resilience patterns (circuit breakers, bulkheads, retries with backoff), observability (tracing, structured logging, metrics), and sensible service communication (gateway routing, service discovery, distributed configuration).

What matters less than people think: having a perfectly decomposed service boundary from day one, using every pattern from the microservices playbook, or choosing the “right” service mesh. Start with resilience and observability. Get those right, and everything else becomes easier to evolve over time.

The common thread across all these patterns is that they prepare your system for partial failure. In a distributed system, something is always failing. The question is whether your users notice.