
API Rate Limiting: Complete Guide with Spring Boot Implementation

By Pratik Bhuite | 29 min read

Hub: Spring Boot / Core Guides

Series: Spring Boot Mastery Series

Last verified: Feb 8, 2026

Part 4 of 4 in the Spring Boot Mastery Series



API rate limiting is a critical technique for protecting your services from overload, preventing abuse, and ensuring fair resource distribution among users. This comprehensive guide explores rate limiting strategies, algorithms, and practical implementation using Spring Boot.


What is API Rate Limiting?

API rate limiting is a mechanism that controls the number of requests a client can make to an API within a specified time window. It acts as a gatekeeper, enforcing quotas to prevent system overload and ensure equitable access to resources.

Core Concepts

Request Quota: The maximum number of requests allowed within a time window (e.g., 100 requests per minute).

Time Window: The duration during which the quota applies (fixed or sliding window).

Client Identification: Method to identify unique clients (API keys, IP addresses, user IDs, JWT tokens).

Throttling Response: The action taken when limits are exceeded (HTTP 429 status code, queuing, or rejection).
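Client identification and the time window are usually combined into a single counter key, which is what a limiter increments per request. A minimal sketch with hypothetical names (`RateLimitKey` is illustrative, not from any library):

```java
import java.time.Duration;

// Hypothetical helper: derives the counter key a limiter might use,
// combining a client identifier with the fixed window the request falls into.
public class RateLimitKey {

    // Start of the fixed window (epoch millis) containing the timestamp
    static long windowStart(long epochMillis, Duration window) {
        long size = window.toMillis();
        return (epochMillis / size) * size;
    }

    // e.g. "ratelimit:api-key-123:1676890800000"
    static String forClient(String clientId, long epochMillis, Duration window) {
        return "ratelimit:" + clientId + ":" + windowStart(epochMillis, window);
    }
}
```

Requests from the same client within the same window map to the same key, so a simple counter per key enforces the quota.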

Real-World Example: E-commerce Flash Sale

Consider an e-commerce platform running a flash sale. Without rate limiting, a single user could write a script to repeatedly check inventory and place orders hundreds of times per second, potentially:

  • Crashing the server with excessive load
  • Preventing legitimate customers from accessing the site
  • Exploiting the system to purchase all limited inventory

Rate limiting ensures each customer gets fair access, typically allowing 10-20 requests per second per user during peak events.

Why Rate Limiting Matters

1. Protection Against DDoS Attacks

Rate limiting provides the first line of defense against Distributed Denial of Service (DDoS) attacks. By limiting requests per IP address or client, malicious actors cannot easily overwhelm your infrastructure.

Example: Cloudflare’s Defense

Cloudflare handles over 46 million HTTP requests per second globally. Their rate limiting blocks an average of 115 billion threats per day, preventing attacks like:

  • Layer 7 DDoS attacks sending thousands of legitimate-looking HTTP requests
  • Credential stuffing attempts testing stolen username/password combinations
  • Scraping bots harvesting product catalogs or pricing data

2. Cost Management

Cloud services charge based on usage. Uncontrolled API consumption can lead to astronomical bills.

Example: AWS Lambda Cost Protection

A startup deployed a public API on AWS Lambda without rate limiting. A misconfigured mobile app entered an infinite retry loop, generating 50 million requests in 6 hours. The bill: $72,000. Rate limiting at 1000 requests/minute per user would have capped the damage at $50.

3. Fair Resource Distribution

Rate limiting ensures all users get equitable access to shared resources.

Example: Twitter API Tiers

Twitter implements tiered rate limiting:

  • Free tier: 1,500 tweets per month, 500,000 tweet reads
  • Basic tier: 3,000 tweets, 10 million reads
  • Enterprise: Custom limits

This prevents a single analytics company from monopolizing Twitter’s API capacity, ensuring indie developers and researchers can still build applications.

4. System Stability and Performance

Uncontrolled request rates can degrade performance for all users through resource exhaustion.

Example: Database Connection Pool Saturation

An API with 100 database connections receives a sudden spike of 10,000 concurrent requests. Without rate limiting:

  • All 100 connections are consumed within seconds
  • Requests queue up, causing timeouts
  • Application threads are blocked waiting for connections
  • Memory usage spikes from queued requests
  • Entire service becomes unresponsive

Rate limiting at 500 requests/second spreads the load, keeping the system responsive.
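The saturation scenario above can also be guarded with a concurrency cap in front of the pool. The sketch below (the class name `ConnectionGuard` is illustrative) uses `java.util.concurrent.Semaphore` so excess requests fail fast, to be rejected with 429, instead of queueing until the pool is exhausted; strictly it limits concurrency rather than request rate, but it addresses the same failure mode:

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch: cap in-flight work so a traffic burst cannot
// exhaust a fixed pool of database connections.
public class ConnectionGuard {
    private final Semaphore permits;

    public ConnectionGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Returns false immediately when saturated, rather than blocking,
    // so the caller can respond with 429 instead of piling up threads
    public boolean tryAcquire() {
        return permits.tryAcquire();
    }

    public void release() {
        permits.release();
    }
}
```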

Rate Limiting Algorithms

1. Token Bucket Algorithm

The token bucket algorithm maintains a bucket that holds tokens. Each request consumes one token. Tokens are replenished at a fixed rate.

How it works:

  1. Bucket starts with maximum capacity of tokens (e.g., 100 tokens)
  2. Tokens are added at a fixed rate (e.g., 10 tokens/second)
  3. Each request removes one token
  4. If bucket is empty, requests are rejected
  5. Bucket never exceeds maximum capacity

Advantages:

  • Allows traffic bursts up to bucket capacity
  • Smooth rate over time
  • Memory efficient

Example: Amazon S3 Request Rate

Amazon S3 uses the token bucket algorithm for request throttling:

  • PUT/POST/DELETE: 3,500 requests/second per prefix
  • GET/HEAD: 5,500 requests/second per prefix

This allows bursts (e.g., batch uploads of 1000 files instantly) while maintaining average rate limits.

// Token Bucket Implementation
public class TokenBucket {
    private final long capacity;
    private final double refillRate; // tokens per second
    private double tokens;
    private long lastRefillTimestamp;
    
    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
        this.lastRefillTimestamp = System.nanoTime();
    }
    
    public synchronized boolean tryConsume(int tokensToConsume) {
        refill();
        
        if (tokens >= tokensToConsume) {
            tokens -= tokensToConsume;
            return true;
        }
        return false;
    }
    
    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillTimestamp) / 1_000_000_000.0;
        double tokensToAdd = elapsedSeconds * refillRate;
        
        tokens = Math.min(capacity, tokens + tokensToAdd);
        lastRefillTimestamp = now;
    }
    
    public synchronized double getAvailableTokens() {
        refill();
        return tokens;
    }
}

2. Leaky Bucket Algorithm

The leaky bucket algorithm processes requests at a constant rate, regardless of incoming traffic. Requests are queued and leak out at a fixed rate.

How it works:

  1. Incoming requests are added to a queue (bucket)
  2. Requests leak out at a constant rate
  3. If queue is full, new requests are rejected
  4. Provides smooth, constant output rate

Advantages:

  • Ensures consistent processing rate
  • Prevents sudden traffic spikes from overwhelming downstream services
  • Good for protecting databases or external APIs

Example: Shopify’s Background Job Processing

Shopify uses a leaky bucket for processing webhook deliveries:

  • Incoming webhooks are queued
  • System processes 100 webhooks/second regardless of queue size
  • Prevents overwhelming merchant servers with delivery attempts
  • Queue capacity: 10,000 webhooks per shop

// Leaky Bucket Implementation
public class LeakyBucket {
    private final Queue<Long> queue;
    private final int capacity;
    private final double leakRate; // requests per second
    private long lastLeakTimestamp;
    
    public LeakyBucket(int capacity, double leakRate) {
        this.queue = new LinkedList<>();
        this.capacity = capacity;
        this.leakRate = leakRate;
        this.lastLeakTimestamp = System.nanoTime();
    }
    
    public synchronized boolean tryAdd() {
        leak();
        
        if (queue.size() < capacity) {
            queue.offer(System.nanoTime());
            return true;
        }
        return false;
    }
    
    private void leak() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastLeakTimestamp) / 1_000_000_000.0;
        int requestsToLeak = (int) (elapsedSeconds * leakRate);
        
        for (int i = 0; i < requestsToLeak && !queue.isEmpty(); i++) {
            queue.poll();
        }
        
        if (requestsToLeak > 0) {
            // Advance by the exact time the leaked requests represent so the
            // fractional remainder still counts toward the next leak
            lastLeakTimestamp += (long) (requestsToLeak / leakRate * 1_000_000_000L);
        }
    }
}

3. Fixed Window Counter

Divides time into fixed windows and counts requests per window.

How it works:

  1. Time is divided into fixed intervals (e.g., 1-minute windows)
  2. Counter tracks requests in current window
  3. Counter resets at window boundary
  4. Reject requests exceeding limit

Limitation: Allows up to twice the limit at window boundaries.

Example Boundary Issue:

  • Limit: 100 requests/minute
  • User sends 100 requests at 10:00:59
  • Window resets at 10:01:00
  • User sends 100 more requests at 10:01:01
  • Result: 200 requests in 2 seconds!

// Fixed Window Counter
public class FixedWindowCounter {
    private final int maxRequests;
    private final long windowSizeMillis;
    private int counter;
    private long windowStart;
    
    public FixedWindowCounter(int maxRequests, long windowSizeMillis) {
        this.maxRequests = maxRequests;
        this.windowSizeMillis = windowSizeMillis;
        this.counter = 0;
        this.windowStart = System.currentTimeMillis();
    }
    
    public synchronized boolean tryRequest() {
        long now = System.currentTimeMillis();
        
        // Check if we're in a new window
        if (now - windowStart >= windowSizeMillis) {
            counter = 0;
            windowStart = now;
        }
        
        if (counter < maxRequests) {
            counter++;
            return true;
        }
        return false;
    }
}

4. Sliding Window Log

Maintains a log of request timestamps and counts requests within a sliding time window.

How it works:

  1. Store timestamp of each request
  2. Remove timestamps older than window size
  3. Count remaining timestamps
  4. Allow request if count < limit

Advantages:

  • Precise rate limiting
  • No boundary issues
  • Accurate across all time periods

Disadvantages:

  • High memory usage (stores all timestamps)
  • Slower performance for high traffic

// Sliding Window Log
public class SlidingWindowLog {
    private final int maxRequests;
    private final long windowSizeMillis;
    private final Queue<Long> requestLog;
    
    public SlidingWindowLog(int maxRequests, long windowSizeMillis) {
        this.maxRequests = maxRequests;
        this.windowSizeMillis = windowSizeMillis;
        this.requestLog = new LinkedList<>();
    }
    
    public synchronized boolean tryRequest() {
        long now = System.currentTimeMillis();
        long windowStart = now - windowSizeMillis;
        
        // Remove old requests outside the window
        while (!requestLog.isEmpty() && requestLog.peek() <= windowStart) {
            requestLog.poll();
        }
        
        if (requestLog.size() < maxRequests) {
            requestLog.offer(now);
            return true;
        }
        return false;
    }
}

5. Sliding Window Counter

Combines fixed window efficiency with sliding window accuracy using weighted counts.

How it works:

  1. Maintains counters for current and previous window
  2. Calculates weighted sum based on overlap
  3. Formula: previousCount * overlapPercentage + currentCount

Example:

  • Limit: 100 requests/minute
  • Previous window (10:00-10:01): 80 requests
  • Current window (10:01-10:02): 30 requests
  • Check at 10:01:30 (50% into current window):
  • Estimated count = 80 * 0.5 + 30 = 70 requests
  • Allow request ✓

// Sliding Window Counter
public class SlidingWindowCounter {
    private final int maxRequests;
    private final long windowSizeMillis;
    private int previousWindowCount;
    private int currentWindowCount;
    private long currentWindowStart;
    
    public SlidingWindowCounter(int maxRequests, long windowSizeMillis) {
        this.maxRequests = maxRequests;
        this.windowSizeMillis = windowSizeMillis;
        this.previousWindowCount = 0;
        this.currentWindowCount = 0;
        this.currentWindowStart = System.currentTimeMillis();
    }
    
    public synchronized boolean tryRequest() {
        long now = System.currentTimeMillis();
        long elapsed = now - currentWindowStart;
        
        // Move to new window if needed
        if (elapsed >= windowSizeMillis) {
            previousWindowCount = currentWindowCount;
            currentWindowCount = 0;
            currentWindowStart = now;
            elapsed = 0;
        }
        
        // Calculate weighted count
        double previousWeight = 1.0 - ((double) elapsed / windowSizeMillis);
        double estimatedCount = previousWindowCount * previousWeight + currentWindowCount;
        
        if (estimatedCount < maxRequests) {
            currentWindowCount++;
            return true;
        }
        return false;
    }
}

Real-World Rate Limiting Examples

GitHub API

GitHub implements sophisticated rate limiting across different authentication methods:

Unauthenticated Requests:

  • 60 requests per hour per IP address
  • Headers returned: X-RateLimit-Limit: 60, X-RateLimit-Remaining: 57, X-RateLimit-Reset: 1676890800

OAuth Token:

  • 5,000 requests per hour
  • Higher limits for GitHub Apps

GraphQL API:

  • Point-based system (not simple request count)
  • Each query costs points based on complexity
  • 5,000 points per hour

Example Response Headers:

HTTP/1.1 200 OK
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1676890800
X-RateLimit-Used: 1
X-RateLimit-Resource: core
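A client can use these headers to pace itself. Below is a minimal client-side sketch (the helper is illustrative, not part of GitHub's API) that computes how long to wait before retrying, treating `X-RateLimit-Reset` as epoch seconds:

```java
// Illustrative helper: decides how long to pause before the next call
// based on GitHub-style rate limit headers.
public class RateLimitHeaders {

    // Returns 0 when requests remain; otherwise the millis until the
    // window resets (X-RateLimit-Reset is epoch seconds)
    static long millisUntilRetry(long remaining, long resetEpochSeconds, long nowMillis) {
        if (remaining > 0) {
            return 0;
        }
        return Math.max(0, resetEpochSeconds * 1000 - nowMillis);
    }
}
```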

Stripe Payment API

Stripe uses adaptive rate limiting to prevent abuse while accommodating legitimate traffic spikes:

Default Limits:

  • 100 read requests per second
  • 100 write requests per second
  • Per account limits (not per API key)

Burst Handling:

  • Allows temporary bursts above limit
  • Uses token bucket algorithm
  • Automatically throttles sustained high traffic

Error Response:

{
  "error": {
    "type": "rate_limit_error",
    "message": "Too many requests",
    "retry_after": 30
  }
}
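A well-behaved client honors `retry_after` when the server supplies it and otherwise backs off exponentially. A hedged sketch (names and caps are illustrative, not Stripe's SDK):

```java
// Sketch of a retry policy for 429 responses: use the server's
// retry_after hint when present, else capped exponential backoff.
public class RetryPolicy {

    // retryAfterSeconds: value from the error body, or -1 when absent
    static long backoffMillis(int attempt, long retryAfterSeconds) {
        if (retryAfterSeconds >= 0) {
            return retryAfterSeconds * 1000;
        }
        long base = 1000L << Math.min(attempt, 6); // 1s, 2s, 4s, ... capped
        return Math.min(base, 60_000);
    }
}
```

In practice the delay is usually jittered (a small random offset) so many throttled clients do not all retry at the same instant.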

Twitter API v2

Twitter’s tiered approach based on access level:

Free Tier (per month):

  • 1,500 tweet creates
  • 500,000 tweet reads
  • 10,000 Direct Messages

Basic Tier ($100/month):

  • 3,000 tweet creates
  • 10,000,000 tweet reads
  • 15,000 Direct Messages

Rate Limit Headers:

x-rate-limit-limit: 180
x-rate-limit-remaining: 179
x-rate-limit-reset: 1676890920

Netflix API (Internal)

Netflix handles 2+ billion API calls per day across microservices:

Adaptive Concurrency Limits:

  • Dynamically adjusts based on system health
  • Uses latency-based feedback
  • Reduces limits when latency increases
  • Implemented via Netflix Concurrency Limits library

Circuit Breaker Integration:

  • Rate limiting triggers circuit breakers
  • Prevents cascading failures across 700+ microservices
  • Historically used Hystrix for fault tolerance (now in maintenance mode, superseded by adaptive concurrency limits)

Implementing Rate Limiting in Spring Boot

Method 1: Using Bucket4j Library

Bucket4j is a Java implementation of the token bucket algorithm that supports various backends.

Step 1: Add Dependencies

<!-- pom.xml -->
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    
    <dependency>
        <groupId>com.bucket4j</groupId>
        <artifactId>bucket4j-core</artifactId>
        <version>8.7.0</version>
    </dependency>
    
    <!-- For distributed cache (Redis) -->
    <dependency>
        <groupId>com.bucket4j</groupId>
        <artifactId>bucket4j-redis</artifactId>
        <version>8.7.0</version>
    </dependency>
    
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
</dependencies>

Step 2: Configuration Class

// RateLimitConfig.java
package com.adevguide.ratelimit.config;

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericToStringSerializer;

import java.time.Duration;

@Configuration
public class RateLimitConfig {
    
    @Bean
    public RedisTemplate<String, Long> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Long> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);
        template.setDefaultSerializer(new GenericToStringSerializer<>(Long.class));
        return template;
    }
    
    /**
     * Creates a bucket with:
     * - Capacity: 100 tokens
     * - Refill: 100 tokens per minute (greedy refill)
     */
    @Bean
    public Bucket createDefaultBucket() {
        Bandwidth limit = Bandwidth.classic(
            100, // capacity
            Refill.greedy(100, Duration.ofMinutes(1))
        );
        return Bucket.builder()
            .addLimit(limit)
            .build();
    }
}

Step 3: Rate Limiting Service

// RateLimitService.java
package com.adevguide.ratelimit.service;

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class RateLimitService {
    
    // Local, per-instance buckets; multi-instance deployments need shared state (see Method 3)
    private final Map<String, Bucket> cache = new ConcurrentHashMap<>();
    private final RedisTemplate<String, Long> redisTemplate;
    
    public RateLimitService(RedisTemplate<String, Long> redisTemplate) {
        this.redisTemplate = redisTemplate;
    }
    
    /**
     * Resolve bucket for API key with specific limits
     */
    public Bucket resolveBucket(String apiKey, RateLimitPlan plan) {
        return cache.computeIfAbsent(apiKey, k -> createBucket(plan));
    }
    
    private Bucket createBucket(RateLimitPlan plan) {
        Bandwidth limit = Bandwidth.classic(
            plan.getCapacity(),
            Refill.greedy(plan.getTokens(), plan.getDuration())
        );
        return Bucket.builder()
            .addLimit(limit)
            .build();
    }
    
    /**
     * Check if request is allowed
     */
    public boolean tryConsume(String apiKey, RateLimitPlan plan, int tokens) {
        Bucket bucket = resolveBucket(apiKey, plan);
        return bucket.tryConsume(tokens);
    }
    
    /**
     * Get available tokens for client
     */
    public long getAvailableTokens(String apiKey, RateLimitPlan plan) {
        Bucket bucket = resolveBucket(apiKey, plan);
        return bucket.getAvailableTokens();
    }
}

Step 4: Rate Limit Plans

// RateLimitPlan.java
package com.adevguide.ratelimit.service;

import java.time.Duration;

public enum RateLimitPlan {
    FREE(20, 20, Duration.ofMinutes(1)),
    BASIC(100, 100, Duration.ofMinutes(1)),
    PREMIUM(500, 500, Duration.ofMinutes(1)),
    ENTERPRISE(5000, 5000, Duration.ofMinutes(1));
    
    private final long capacity;
    private final long tokens;
    private final Duration duration;
    
    RateLimitPlan(long capacity, long tokens, Duration duration) {
        this.capacity = capacity;
        this.tokens = tokens;
        this.duration = duration;
    }
    
    public long getCapacity() {
        return capacity;
    }
    
    public long getTokens() {
        return tokens;
    }
    
    public Duration getDuration() {
        return duration;
    }
    
    public static RateLimitPlan getPlanForApiKey(String apiKey) {
        // In real implementation, fetch from database
        if (apiKey.startsWith("ent_")) return ENTERPRISE;
        if (apiKey.startsWith("pre_")) return PREMIUM;
        if (apiKey.startsWith("bas_")) return BASIC;
        return FREE;
    }
}

Step 5: Rate Limit Interceptor

// RateLimitInterceptor.java
package com.adevguide.ratelimit.interceptor;

import com.adevguide.ratelimit.service.RateLimitPlan;
import com.adevguide.ratelimit.service.RateLimitService;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

@Component
public class RateLimitInterceptor implements HandlerInterceptor {
    
    private final RateLimitService rateLimitService;
    
    public RateLimitInterceptor(RateLimitService rateLimitService) {
        this.rateLimitService = rateLimitService;
    }
    
    @Override
    public boolean preHandle(HttpServletRequest request, 
                           HttpServletResponse response, 
                           Object handler) throws Exception {
        
        String apiKey = request.getHeader("X-API-Key");
        
        if (apiKey == null || apiKey.isEmpty()) {
            response.sendError(HttpStatus.UNAUTHORIZED.value(), "Missing API Key");
            return false;
        }
        
        RateLimitPlan plan = RateLimitPlan.getPlanForApiKey(apiKey);
        
        if (rateLimitService.tryConsume(apiKey, plan, 1)) {
            long availableTokens = rateLimitService.getAvailableTokens(apiKey, plan);
            
            // Add rate limit headers
            response.setHeader("X-RateLimit-Limit", String.valueOf(plan.getCapacity()));
            response.setHeader("X-RateLimit-Remaining", String.valueOf(availableTokens));
            response.setHeader("X-RateLimit-Reset", 
                String.valueOf(System.currentTimeMillis() + plan.getDuration().toMillis()));
            
            return true;
        } else {
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            response.setContentType("application/json");
            response.setHeader("Retry-After", "60"); // standard header name, value in seconds
            response.getWriter().write("{\"error\": \"Rate limit exceeded. Try again later.\"}");
            return false;
        }
    }
}

Step 6: Register Interceptor

// WebConfig.java
package com.adevguide.ratelimit.config;

import com.adevguide.ratelimit.interceptor.RateLimitInterceptor;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class WebConfig implements WebMvcConfigurer {
    
    private final RateLimitInterceptor rateLimitInterceptor;
    
    public WebConfig(RateLimitInterceptor rateLimitInterceptor) {
        this.rateLimitInterceptor = rateLimitInterceptor;
    }
    
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(rateLimitInterceptor)
                .addPathPatterns("/api/**") // Apply to all API endpoints
                .excludePathPatterns("/api/public/**"); // Exclude public endpoints
    }
}

Step 7: Controller Example

// ProductController.java
package com.adevguide.ratelimit.controller;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/products")
public class ProductController {
    
    @GetMapping
    public ResponseEntity<List<Map<String, Object>>> getProducts() {
        // Simulated product list
        List<Map<String, Object>> products = List.of(
            Map.of("id", 1, "name", "Laptop", "price", 999.99),
            Map.of("id", 2, "name", "Mouse", "price", 29.99),
            Map.of("id", 3, "name", "Keyboard", "price", 79.99)
        );
        
        return ResponseEntity.ok(products);
    }
    
    @GetMapping("/{id}")
    public ResponseEntity<Map<String, Object>> getProduct(@PathVariable Long id) {
        Map<String, Object> product = Map.of(
            "id", id,
            "name", "Product " + id,
            "price", 99.99
        );
        
        return ResponseEntity.ok(product);
    }
    
    @PostMapping
    public ResponseEntity<Map<String, Object>> createProduct(
            @RequestBody Map<String, Object> product) {
        // In real app, save to database
        return ResponseEntity.ok(Map.of(
            "message", "Product created",
            "product", product
        ));
    }
}

Method 2: Using Spring AOP and Custom Annotation

Create a custom annotation for flexible rate limiting.

Step 1: Rate Limit Annotation

// RateLimit.java
package com.adevguide.ratelimit.annotation;

import java.lang.annotation.*;
import java.time.temporal.ChronoUnit;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface RateLimit {
    /**
     * Maximum number of requests allowed
     */
    int limit() default 100;
    
    /**
     * Time window duration
     */
    int duration() default 1;
    
    /**
     * Time unit for duration
     */
    ChronoUnit unit() default ChronoUnit.MINUTES;
    
    /**
     * Key type for rate limiting
     */
    KeyType keyType() default KeyType.API_KEY;
    
    enum KeyType {
        API_KEY,
        IP_ADDRESS,
        USER_ID,
        CUSTOM
    }
    
    /**
     * Custom key expression (SpEL)
     */
    String customKey() default "";
}

Step 2: Rate Limit Aspect

// RateLimitAspect.java
package com.adevguide.ratelimit.aspect;

import com.adevguide.ratelimit.annotation.RateLimit;
import com.adevguide.ratelimit.exception.RateLimitExceededException;
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;
import jakarta.servlet.http.HttpServletRequest;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Aspect
@Component
public class RateLimitAspect {
    
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    
    @Around("@annotation(rateLimit)")
    public Object rateLimit(ProceedingJoinPoint joinPoint, RateLimit rateLimit) 
            throws Throwable {
        
        String key = resolveKey(rateLimit);
        Bucket bucket = getBucket(key, rateLimit);
        
        if (bucket.tryConsume(1)) {
            return joinPoint.proceed();
        } else {
            throw new RateLimitExceededException(
                "Rate limit exceeded. Maximum " + rateLimit.limit() + 
                " requests per " + rateLimit.duration() + " " + rateLimit.unit()
            );
        }
    }
    
    private String resolveKey(RateLimit rateLimit) {
        HttpServletRequest request = getCurrentRequest();
        if (request == null) {
            throw new IllegalStateException("No HTTP request bound to the current thread");
        }
        
        String key = switch (rateLimit.keyType()) {
            case API_KEY -> request.getHeader("X-API-Key");
            case IP_ADDRESS -> request.getRemoteAddr();
            case USER_ID -> request.getHeader("X-User-ID");
            case CUSTOM -> rateLimit.customKey(); // Would use SpEL in production
        };
        
        // Fall back to the client IP so a missing header cannot yield a null key
        return key != null ? key : request.getRemoteAddr();
    }
    
    private Bucket getBucket(String key, RateLimit rateLimit) {
        return buckets.computeIfAbsent(key, k -> createBucket(rateLimit));
    }
    
    private Bucket createBucket(RateLimit rateLimit) {
        Duration duration = Duration.of(rateLimit.duration(), rateLimit.unit());
        Bandwidth bandwidth = Bandwidth.classic(
            rateLimit.limit(),
            Refill.greedy(rateLimit.limit(), duration)
        );
        return Bucket.builder().addLimit(bandwidth).build();
    }
    
    private HttpServletRequest getCurrentRequest() {
        ServletRequestAttributes attributes = 
            (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
        return attributes != null ? attributes.getRequest() : null;
    }
}

Step 3: Custom Exception

// RateLimitExceededException.java
package com.adevguide.ratelimit.exception;

public class RateLimitExceededException extends RuntimeException {
    public RateLimitExceededException(String message) {
        super(message);
    }
}

Step 4: Global Exception Handler

// GlobalExceptionHandler.java
package com.adevguide.ratelimit.exception;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.time.LocalDateTime;
import java.util.Map;

@RestControllerAdvice
public class GlobalExceptionHandler {
    
    @ExceptionHandler(RateLimitExceededException.class)
    public ResponseEntity<Map<String, Object>> handleRateLimitExceeded(
            RateLimitExceededException ex) {
        
        Map<String, Object> errorResponse = Map.of(
            "timestamp", LocalDateTime.now(),
            "status", HttpStatus.TOO_MANY_REQUESTS.value(),
            "error", "Too Many Requests",
            "message", ex.getMessage(),
            "retryAfter", 60 // seconds
        );
        
        return ResponseEntity
            .status(HttpStatus.TOO_MANY_REQUESTS)
            .header("Retry-After", "60")
            .body(errorResponse);
    }
}

Step 5: Using the Annotation

// AnalyticsController.java
package com.adevguide.ratelimit.controller;

import com.adevguide.ratelimit.annotation.RateLimit;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.time.temporal.ChronoUnit;
import java.util.Map;

@RestController
@RequestMapping("/api/analytics")
public class AnalyticsController {
    
    @GetMapping("/reports")
    @RateLimit(limit = 10, duration = 1, unit = ChronoUnit.MINUTES)
    public ResponseEntity<Map<String, Object>> generateReport() {
        // Expensive operation
        Map<String, Object> report = Map.of(
            "totalSales", 125000,
            "orders", 450,
            "averageOrderValue", 277.78
        );
        
        return ResponseEntity.ok(report);
    }
    
    @PostMapping("/events")
    @RateLimit(limit = 1000, duration = 1, unit = ChronoUnit.HOURS, keyType = RateLimit.KeyType.IP_ADDRESS)
    public ResponseEntity<String> trackEvent(@RequestBody Map<String, Object> event) {
        // Track user event
        return ResponseEntity.ok("Event tracked");
    }
    
    @GetMapping("/dashboard")
    @RateLimit(limit = 50, duration = 5, unit = ChronoUnit.MINUTES, keyType = RateLimit.KeyType.USER_ID)
    public ResponseEntity<Map<String, Object>> getDashboard() {
        return ResponseEntity.ok(Map.of("data", "Dashboard data"));
    }
}

Method 3: Distributed Rate Limiting with Redis

For microservices running multiple instances, use Redis for shared state.

Step 1: Redis Rate Limiter Service

// RedisRateLimiterService.java
package com.adevguide.ratelimit.service;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.stereotype.Service;

import java.time.Duration;
import java.util.Collections;
import java.util.List;

@Service
public class RedisRateLimiterService {
    
    private final RedisTemplate<String, String> redisTemplate;
    
    // Lua script for atomic fixed-window rate limiting (counter + TTL)
    private static final String RATE_LIMIT_SCRIPT = """
        local key = KEYS[1]
        local limit = tonumber(ARGV[1])
        local window = tonumber(ARGV[2])
        local current = tonumber(redis.call('GET', key) or "0")
        
        if current < limit then
            redis.call('INCR', key)
            if current == 0 then
                redis.call('EXPIRE', key, window)
            end
            return 1
        else
            return 0
        end
        """;
    
    public RedisRateLimiterService(RedisTemplate<String, String> redisTemplate) {
        this.redisTemplate = redisTemplate;
    }
    
    /**
     * Try to consume a token using Redis
     * @param key Rate limit key (e.g., "ratelimit:user:123")
     * @param limit Maximum requests allowed
     * @param windowSeconds Time window in seconds
     * @return true if allowed, false if rate limit exceeded
     */
    public boolean tryConsume(String key, int limit, int windowSeconds) {
        DefaultRedisScript<Long> script = new DefaultRedisScript<>();
        script.setScriptText(RATE_LIMIT_SCRIPT);
        script.setResultType(Long.class);
        
        Long result = redisTemplate.execute(
            script,
            Collections.singletonList(key),
            String.valueOf(limit),
            String.valueOf(windowSeconds)
        );
        
        return result != null && result == 1L;
    }
    
    /**
     * Get current request count
     */
    public long getCurrentCount(String key) {
        String count = redisTemplate.opsForValue().get(key);
        return count != null ? Long.parseLong(count) : 0;
    }
    
    /**
     * Get remaining requests
     */
    public long getRemaining(String key, int limit) {
        long current = getCurrentCount(key);
        return Math.max(0, limit - current);
    }
    
    /**
     * Get TTL (time to reset) in seconds
     */
    public long getTimeToReset(String key) {
        // getExpire returns -1 (no TTL set) or -2 (missing key) as sentinels
        Long ttl = redisTemplate.getExpire(key);
        return ttl != null && ttl > 0 ? ttl : 0;
    }
}
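
The Lua script above implements a fixed-window counter: INCR counts requests, and EXPIRE starts the window on the first hit. Its allow/deny behavior can be sketched in plain Java without Redis (the class name and the explicit time parameter are illustrative conveniences, not part of the service above):

```java
// Plain-Java model of the fixed-window semantics of the Lua script.
// Time is passed in explicitly so the behavior is deterministic and testable;
// the Redis version gets the same effect from INCR plus a TTL.
public class FixedWindowCounter {
    private final int limit;
    private final long windowSeconds;
    private long windowStart = Long.MIN_VALUE; // sentinel: no window yet
    private int count = 0;

    public FixedWindowCounter(int limit, long windowSeconds) {
        this.limit = limit;
        this.windowSeconds = windowSeconds;
    }

    public synchronized boolean tryConsume(long nowSeconds) {
        // Start a fresh window on the first request, or once the TTL elapses
        if (windowStart == Long.MIN_VALUE || nowSeconds - windowStart >= windowSeconds) {
            windowStart = nowSeconds;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

Note the standard fixed-window caveat: a client can burst up to twice the limit across a window boundary, which is why sliding-window variants exist.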

Step 2: Distributed Rate Limit Filter

// DistributedRateLimitFilter.java
package com.adevguide.ratelimit.filter;

import com.adevguide.ratelimit.service.RedisRateLimiterService;
import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;

import java.io.IOException;

@Component
public class DistributedRateLimitFilter implements Filter {
    
    private final RedisRateLimiterService rateLimiterService;
    private static final int LIMIT = 100;
    private static final int WINDOW_SECONDS = 60;
    
    public DistributedRateLimitFilter(RedisRateLimiterService rateLimiterService) {
        this.rateLimiterService = rateLimiterService;
    }
    
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, 
                        FilterChain chain) throws IOException, ServletException {
        
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        HttpServletResponse httpResponse = (HttpServletResponse) response;
        
        String apiKey = httpRequest.getHeader("X-API-Key");
        if (apiKey == null) {
            httpResponse.sendError(HttpStatus.UNAUTHORIZED.value(), "Missing API Key");
            return;
        }
        
        String rateLimitKey = "ratelimit:api:" + apiKey;
        
        if (rateLimiterService.tryConsume(rateLimitKey, LIMIT, WINDOW_SECONDS)) {
            long remaining = rateLimiterService.getRemaining(rateLimitKey, LIMIT);
            long resetTime = System.currentTimeMillis() / 1000 + 
                           rateLimiterService.getTimeToReset(rateLimitKey);
            
            httpResponse.setHeader("X-RateLimit-Limit", String.valueOf(LIMIT));
            httpResponse.setHeader("X-RateLimit-Remaining", String.valueOf(remaining));
            httpResponse.setHeader("X-RateLimit-Reset", String.valueOf(resetTime));
            
            chain.doFilter(request, response);
        } else {
            long retryAfter = rateLimiterService.getTimeToReset(rateLimitKey);
            httpResponse.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            httpResponse.setContentType("application/json");
            httpResponse.setHeader("Retry-After", String.valueOf(retryAfter));
            httpResponse.setHeader("X-RateLimit-Limit", String.valueOf(LIMIT));
            httpResponse.setHeader("X-RateLimit-Remaining", "0");
            httpResponse.getWriter().write(
                "{\"error\": \"Rate limit exceeded\", \"retryAfter\": " + retryAfter + "}"
            );
        }
    }
}

Step 3: Application Properties

# application.yml
spring:
  data:
    redis:
      host: localhost
      port: 6379
      password: ${REDIS_PASSWORD:}
      timeout: 2000ms
      lettuce:
        pool:
          max-active: 8
          max-idle: 8
          min-idle: 0
          max-wait: -1ms

# Rate Limit Configuration
rate-limit:
  plans:
    free:
      limit: 20
      window-seconds: 60
    basic:
      limit: 100
      window-seconds: 60
    premium:
      limit: 500
      window-seconds: 60
    enterprise:
      limit: 5000
      window-seconds: 60

Advanced Rate Limiting Strategies

1. Cost-Based Rate Limiting

Different operations consume different amounts of quota based on computational cost.

Example: Database Query Complexity

// CostBasedRateLimiter.java
public class CostBasedRateLimiter {
    
    public enum OperationCost {
        SIMPLE_READ(1),      // SELECT with index
        COMPLEX_READ(5),     // JOIN queries
        WRITE(3),            // INSERT/UPDATE
        BATCH_WRITE(10),     // Bulk operations
        ANALYTICS(20);       // Heavy aggregations
        
        private final int cost;
        
        OperationCost(int cost) {
            this.cost = cost;
        }
        
        public int getCost() {
            return cost;
        }
    }
    
    private final Bucket bucket;
    
    public CostBasedRateLimiter(Bucket bucket) {
        this.bucket = bucket;
    }
    
    public boolean tryConsumeOperation(OperationCost operation) {
        // Heavier operations drain more tokens from the shared bucket
        return bucket.tryConsume(operation.getCost());
    }
}
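
As a quick sanity check on the weights above, dividing a per-window token budget by each operation's cost shows how many calls of that kind fit. The helper class below is hypothetical, just restating that arithmetic:

```java
// Hypothetical helper: how many calls of one kind fit in a token budget,
// given the operation costs above (SIMPLE_READ=1, ANALYTICS=20, ...).
public class CostBudget {
    public static int callsAllowed(int budgetTokens, int costPerCall) {
        return budgetTokens / costPerCall;
    }
}
```

So a 100-token window admits 100 indexed reads but only 5 heavy analytics queries.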

GraphQL Example (like GitHub’s points system):

@Service
public class GraphQLCostCalculator {
    
    /**
     * Calculate query cost based on:
     * - Number of fields requested
     * - Depth of nested queries
     * - Connection/pagination sizes
     */
    public int calculateQueryCost(String query) {
        int baseCost = 1;
        int fieldCount = countFields(query);
        int depth = calculateDepth(query);
        int paginationMultiplier = extractPaginationSize(query);
        
        return baseCost + (fieldCount * depth * paginationMultiplier / 100);
    }
    
    private int countFields(String query) {
        // Parse and count requested fields
        return 10; // Simplified
    }
    
    private int calculateDepth(String query) {
        // Calculate nesting depth
        return 3; // Simplified
    }
    
    private int extractPaginationSize(String query) {
        // Extract "first: N" or "last: N" parameters
        return 100; // Default
    }
}
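
Plugging the stub values above (10 fields, depth 3, page size 100) into the simplified formula gives a cost of 31. The standalone helper below just restates that arithmetic (the class name is illustrative):

```java
// Restates the cost formula from calculateQueryCost() as a pure function,
// using Java integer arithmetic (left-to-right, truncating division).
public class QueryCostExample {
    public static int cost(int base, int fields, int depth, int pageSize) {
        return base + (fields * depth * pageSize / 100);
    }
}
```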

2. Adaptive Rate Limiting

Dynamically adjust limits based on system health and load.

// AdaptiveRateLimiter.java
@Service
public class AdaptiveRateLimiter {
    
    private final MetricsService metricsService;
    private volatile int currentLimit;
    
    public AdaptiveRateLimiter(MetricsService metricsService) {
        this.metricsService = metricsService;
        this.currentLimit = 1000; // Default
        startAdaptiveAdjustment();
    }
    
    private void startAdaptiveAdjustment() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::adjustLimit, 0, 10, TimeUnit.SECONDS);
    }
    
    private void adjustLimit() {
        double cpuUsage = metricsService.getCpuUsage();
        double memoryUsage = metricsService.getMemoryUsage();
        double avgLatency = metricsService.getAverageLatency();
        
        // Reduce limit if system under stress
        if (cpuUsage > 80 || memoryUsage > 85 || avgLatency > 1000) {
            currentLimit = (int) (currentLimit * 0.8); // Reduce by 20%
        } 
        // Gradually increase if healthy
        else if (cpuUsage < 50 && memoryUsage < 60 && avgLatency < 200) {
            currentLimit = (int) (currentLimit * 1.1); // Increase by 10%
        }
        
        // Keep within bounds
        currentLimit = Math.max(100, Math.min(5000, currentLimit));
    }
    
    public int getCurrentLimit() {
        return currentLimit;
    }
}
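
The adjustment rule is easiest to reason about as a pure function of the metrics. This sketch mirrors the thresholds, factors, and clamping in adjustLimit() above, with the class name purely illustrative:

```java
// Same thresholds, factors, and bounds as adjustLimit() above, with the
// metrics passed in so the arithmetic can be checked in isolation.
public class AdaptiveLimitMath {
    public static int adjust(int current, double cpuPct, double memPct, double latencyMs) {
        int next = current;
        if (cpuPct > 80 || memPct > 85 || latencyMs > 1000) {
            next = (int) (next * 0.8);   // back off 20% under stress
        } else if (cpuPct < 50 && memPct < 60 && latencyMs < 200) {
            next = (int) (next * 1.1);   // recover 10% when healthy
        }
        return Math.max(100, Math.min(5000, next)); // keep within bounds
    }
}
```

A stressed system at limit 1000 drops to 800; a healthy one climbs to 1100; the clamp stops runaway growth at 5000 and never starves clients below 100.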

3. Tiered Rate Limiting by User Role

Different limits for different user tiers.

// TieredRateLimitService.java
@Service
public class TieredRateLimitService {
    
    private final Map<String, Bucket> userBuckets = new ConcurrentHashMap<>();
    private final UserService userService;
    
    public TieredRateLimitService(UserService userService) {
        this.userService = userService;
    }
    
    public boolean allowRequest(String userId) {
        User user = userService.findById(userId);
        Bucket bucket = userBuckets.computeIfAbsent(
            userId, 
            id -> createBucketForUserTier(user.getTier())
        );
        
        return bucket.tryConsume(1);
    }
    
    private Bucket createBucketForUserTier(UserTier tier) {
        return switch (tier) {
            case FREE -> createBucket(20, Duration.ofMinutes(1));
            case BASIC -> createBucket(100, Duration.ofMinutes(1));
            case PREMIUM -> createBucket(500, Duration.ofMinutes(1));
            case ENTERPRISE -> createBucket(5000, Duration.ofMinutes(1));
        };
    }
    
    private Bucket createBucket(long capacity, Duration refillDuration) {
        Bandwidth bandwidth = Bandwidth.classic(
            capacity,
            Refill.greedy(capacity, refillDuration)
        );
        return Bucket.builder().addLimit(bandwidth).build();
    }
}

4. Rate Limiting with Quota Carryover

Allow unused quota to roll over (up to a limit).

// QuotaCarryoverBucket.java
public class QuotaCarryoverBucket {
    
    private final long baseCapacity;
    private final long maxCapacity; // 2x base for carryover
    private final double refillRate;
    private double tokens;
    private long lastRefillTime;
    
    public QuotaCarryoverBucket(long baseCapacity, double refillRate) {
        this.baseCapacity = baseCapacity;
        this.maxCapacity = baseCapacity * 2; // Allow 100% carryover
        this.refillRate = refillRate;
        this.tokens = baseCapacity;
        this.lastRefillTime = System.nanoTime();
    }
    
    public synchronized boolean tryConsume(int tokensToConsume) {
        refill();
        
        if (tokens >= tokensToConsume) {
            tokens -= tokensToConsume;
            return true;
        }
        return false;
    }
    
    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillTime) / 1_000_000_000.0;
        double tokensToAdd = elapsedSeconds * refillRate;
        
        // Allow carryover up to maxCapacity
        tokens = Math.min(maxCapacity, tokens + tokensToAdd);
        lastRefillTime = now;
    }
}
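
To see the carryover in action deterministically, here is a variant of the bucket above with time injected instead of read from System.nanoTime(). This is a testing convenience, not part of the original class:

```java
// Deterministic variant of QuotaCarryoverBucket: the caller supplies the
// current time in seconds, which makes the refill/carryover math visible.
public class CarryoverDemo {
    private final long maxCapacity;          // 2x base, as in the class above
    private final double refillRatePerSec;
    private double tokens;
    private long lastSec;

    public CarryoverDemo(long baseCapacity, double refillRatePerSec) {
        this.maxCapacity = baseCapacity * 2;
        this.refillRatePerSec = refillRatePerSec;
        this.tokens = baseCapacity;
        this.lastSec = 0;
    }

    public synchronized boolean tryConsume(int n, long nowSec) {
        // Refill for elapsed time, but never beyond the carryover cap
        tokens = Math.min(maxCapacity, tokens + (nowSec - lastSec) * refillRatePerSec);
        lastSec = nowSec;
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;
    }
}
```

With a base of 10 tokens refilling at 1/s, sitting idle for 20 seconds banks up to the 20-token cap, so a 20-token burst succeeds where a plain bucket would reject it.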

Best Practices

1. Return Proper Headers

Always include rate limit information in response headers:

// Standard rate limit headers
X-RateLimit-Limit: 100          // Maximum requests allowed
X-RateLimit-Remaining: 73       // Requests remaining in window
X-RateLimit-Reset: 1676890800   // Unix timestamp when limit resets
Retry-After: 45                 // Seconds until retry (on 429 response)
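
On the client side, these headers drive backoff. The sketch below prefers Retry-After and falls back to X-RateLimit-Reset; it handles only the delta-seconds form of Retry-After (the header may also carry an HTTP-date), and the class name is illustrative:

```java
// Client-side backoff derived from the rate limit headers above. Only the
// delta-seconds form of Retry-After is handled in this sketch.
public class BackoffCalculator {
    public static long waitSeconds(String retryAfter, String rateLimitReset, long nowEpochSec) {
        if (retryAfter != null) {
            return Long.parseLong(retryAfter.trim());   // seconds to wait
        }
        if (rateLimitReset != null) {
            // X-RateLimit-Reset is a Unix timestamp; clamp past times to zero
            return Math.max(0, Long.parseLong(rateLimitReset.trim()) - nowEpochSec);
        }
        return 1; // conservative default when neither header is present
    }
}
```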

2. Use Appropriate HTTP Status Codes

// 429 Too Many Requests
@ExceptionHandler(RateLimitExceededException.class)
public ResponseEntity<ErrorResponse> handleRateLimit(RateLimitExceededException ex) {
    ErrorResponse error = new ErrorResponse(
        "Rate limit exceeded",
        "You have exceeded your request quota"
    );
    
    return ResponseEntity
        .status(HttpStatus.TOO_MANY_REQUESTS) // 429
        .header("Retry-After", "60")
        .body(error);
}

3. Implement Graceful Degradation

@Service
public class OrderService {
    
    private final RateLimiter rateLimiter;
    private final CacheService cacheService;
    
    public List<Order> getOrders(String userId) {
        if (!rateLimiter.allowRequest(userId)) {
            // Serve cached data instead of rejecting
            return cacheService.getCachedOrders(userId);
        }
        
        List<Order> orders = fetchOrdersFromDatabase(userId);
        cacheService.cacheOrders(userId, orders);
        return orders;
    }
}

4. Monitor and Alert

@Component
public class RateLimitMetrics {
    
    private final MeterRegistry meterRegistry;
    
    public void recordRateLimitExceeded(String endpoint, String userId) {
        Counter.builder("rate_limit.exceeded")
            .tag("endpoint", endpoint)
            .tag("user", userId)
            .register(meterRegistry)
            .increment();
    }
    
    public void recordAllowedRequest(String endpoint) {
        Counter.builder("rate_limit.allowed")
            .tag("endpoint", endpoint)
            .register(meterRegistry)
            .increment();
    }
}

5. Distributed Systems Considerations

For microservices architectures, use centralized storage (Redis, Hazelcast) to share rate-limit state across instances:

// Redis-backed distributed rate limiter
@Service
public class DistributedRateLimiter {
    
    private final RedisTemplate<String, String> redisTemplate;
    
    public boolean allowRequest(String key, int limit, Duration window) {
        String redisKey = "ratelimit:" + key;
        
        // INCR is atomic, but the INCR + EXPIRE pair is not: if the process
        // dies between the two calls, the key never expires. Use the Lua
        // script approach shown earlier when that guarantee matters.
        Long count = redisTemplate.opsForValue().increment(redisKey);
        
        if (count != null && count == 1) {
            redisTemplate.expire(redisKey, window);
        }
        
        return count != null && count <= limit;
    }
}

6. Document Rate Limits

Provide clear documentation in API responses:

{
  "rateLimits": {
    "endpoint": "/api/products",
    "limit": 100,
    "window": "1 minute",
    "current": 73,
    "resetAt": "2026-02-08T10:05:00Z"
  },
  "data": [...]
}

7. Test Rate Limiting

@SpringBootTest
@AutoConfigureMockMvc
public class RateLimitTest {
    
    @Autowired
    private MockMvc mockMvc;
    
    @Test
    public void testRateLimitEnforcement() throws Exception {
        // Make requests up to limit
        for (int i = 0; i < 100; i++) {
            mockMvc.perform(get("/api/products")
                    .header("X-API-Key", "test-key"))
                .andExpect(status().isOk());
        }
        
        // 101st request should be rate limited
        mockMvc.perform(get("/api/products")
                .header("X-API-Key", "test-key"))
            .andExpect(status().isTooManyRequests())
            .andExpect(header().exists("Retry-After"));
    }
}

8. Handle Clock Skew in Distributed Systems

// Use monotonic time sources
public class ClockSkewSafeRateLimiter {
    
    // Use System.nanoTime() instead of System.currentTimeMillis()
    // nanoTime() is monotonic and won't jump backward
    private long lastCheckNanos = System.nanoTime();
    
    public boolean isAllowed() {
        long now = System.nanoTime();
        long elapsedNanos = now - lastCheckNanos;
        
        // Convert to seconds
        double elapsedSeconds = elapsedNanos / 1_000_000_000.0;
        
        // ... rate limiting logic
        return true;
    }
}
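
The elapsed-time conversion from the snippet above, isolated as a pure function (illustrative class name):

```java
// Convert a monotonic nanosecond delta to seconds, as in isAllowed() above.
// Differences of System.nanoTime() values are safe even if the wall clock
// is adjusted, because nanoTime() never jumps backward.
public class MonotonicElapsed {
    public static double elapsedSeconds(long startNanos, long nowNanos) {
        return (nowNanos - startNanos) / 1_000_000_000.0;
    }
}
```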

Conclusion

API rate limiting is essential for building robust, scalable, and secure web services. By implementing proper rate limiting strategies, you protect your infrastructure from overload, prevent abuse, ensure fair access, and control operational costs.

Key takeaways:

  1. Choose the right algorithm: Token bucket for burst handling, leaky bucket for smooth output, sliding window for accuracy
  2. Use proper HTTP semantics: Return 429 status codes with Retry-After headers
  3. Distribute state wisely: Use Redis or distributed caches for microservices
  4. Monitor and adapt: Track rate limit hits and adjust limits based on system health
  5. Implement gracefully: Serve cached data or degraded responses instead of hard failures
  6. Document clearly: Make rate limits transparent to API consumers

Spring Boot provides excellent tools for implementing rate limiting, from simple in-memory solutions to sophisticated distributed systems using Redis. Start with basic fixed-window counters for prototypes, then graduate to token bucket or sliding window algorithms for production systems.

Real-world companies like GitHub, Stripe, and Twitter demonstrate that thoughtful rate limiting design directly impacts user experience, system reliability, and business sustainability. Invest time in designing rate limits that balance protection with usability.

References

  1. Bucket4j Documentation - https://bucket4j.com/
  2. Spring Boot Official Documentation - https://spring.io/projects/spring-boot
  3. Redis Rate Limiting Patterns - https://redis.io/docs/latest/commands/incr/
  4. GitHub API Rate Limiting - https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
  5. Stripe API Rate Limits - https://stripe.com/docs/rate-limits
  6. Twitter API v2 Rate Limits - https://developer.twitter.com/en/docs/twitter-api/rate-limits
  7. Netflix Concurrency Limits - https://github.com/Netflix/concurrency-limits
  8. AWS API Gateway Throttling - https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html
  9. RFC 6585 - Additional HTTP Status Codes - https://tools.ietf.org/html/rfc6585
  10. Google Cloud Quotas and Limits - https://cloud.google.com/apis/design/design_patterns#rate_limiting

YouTube Videos

  1. “API Rate Limiting Algorithms Explained” - Hussein Nasser [https://www.youtube.com/watch?v=FU4WlwfS3G0]

  2. “Rate Limiting in Microservices” - Tech Primers [https://www.youtube.com/watch?v=NtMvNh0WFVM]

  3. “Building Rate Limiters at Scale” - InfoQ [https://www.youtube.com/watch?v=CRGPbCbRTHA]

  4. “Spring Boot Rate Limiting Tutorial” - Amigoscode [https://www.youtube.com/watch?v=xDuwrtwYHu8]

  5. “System Design: Rate Limiter” - Gaurav Sen [https://www.youtube.com/watch?v=mhUQe4BKZXs]

