4. Retry with backoff and jitter

Networks stutter. APIs time out. Services go down for thirty seconds and come back. The categorizer from the last page already knows which failures are transient; the piece missing is the logic that actually retries them. Done well, the user never notices the first attempt failed. Done poorly, your client hammers a recovering service back into the ground. This page is the well-done version: exponential backoff, jitter, a cap on attempts, and honoring Retry-After when the server tells you exactly how long to wait.

The core principle

Only retry failures that are likely temporary. Never retry user-input errors (permanent problems) or not-found errors (the resource doesn't exist). The categorization system from the last page already encodes this logic: only the transient category triggers retry. Selective retry prevents wasted attempts on failures that won't resolve, while giving temporary issues the time they need.

Exponential backoff: give services breathing room

When a service is struggling, hitting it again immediately makes the problem worse. Exponential backoff increases wait times between retries, giving the service time to recover:

Figure: retry timeline. Three failed attempts in red, each followed by an amber wait pill (1s, 2s, 4s); the fourth attempt succeeds in green after a total of 7 seconds of waiting. Caption: Exponential backoff in action. Each failure doubles the wait. Seven seconds of total wait is the cost of riding out a service blip without the user seeing anything.

The wait time doubles after each failure: 1s, 2s, 4s, 8s. This exponential growth gives struggling services progressively more time to recover while preventing your application from hammering a failing service.

backoff.py

import time

def calculate_backoff_delay(attempt, base_delay=1.0):
    """Calculate exponential backoff delay for retry attempt."""
    return base_delay * (2 ** attempt)

# Usage
for attempt in range(4):
    delay = calculate_backoff_delay(attempt)
    print(f"Attempt {attempt + 1}: wait {delay}s before retry")

# Output:
# Attempt 1: wait 1.0s before retry
# Attempt 2: wait 2.0s before retry
# Attempt 3: wait 4.0s before retry
# Attempt 4: wait 8.0s before retry

This pattern is widely used in production systems because it balances quick recovery (early retries happen fast) with service protection (later retries give more recovery time).
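To see what a given retry budget costs in waiting time, it can help to sum the schedule. The following is a minimal sketch that reuses the formula above; total_backoff_wait is a hypothetical helper name introduced for illustration, not part of the project code.

```python
def calculate_backoff_delay(attempt, base_delay=1.0):
    """Exponential backoff delay for a zero-based retry attempt."""
    return base_delay * (2 ** attempt)

def total_backoff_wait(retries, base_delay=1.0):
    """Sum of the waits before `retries` retries (hypothetical helper)."""
    return sum(calculate_backoff_delay(a, base_delay) for a in range(retries))

# Three waits (1s + 2s + 4s) precede the fourth attempt.
print(total_backoff_wait(3))  # 7.0

# The sum is a geometric series: base_delay * (2 ** retries - 1).
print(total_backoff_wait(3) == 1.0 * (2 ** 3 - 1))  # True
```

Because the total roughly doubles with each extra retry, a small max_attempts keeps the worst-case wait predictable.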

Jitter: prevent thundering herds

Exponential backoff alone has a critical flaw: if many users hit an error simultaneously, they'll all retry simultaneously. This creates synchronized retry waves (thundering herds) that can extend outages or even cause cascading failures.

Terminal

12:00:00 - API goes down, 1000 users all fail simultaneously
12:00:01 - ALL 1000 users retry at exactly the same time (thundering herd!)
12:00:03 - ALL 1000 users retry again at exactly the same time
12:00:07 - ALL 1000 users retry again at exactly the same time

Every retry hits the service with the full load of all failed users at once. This synchronized load can overwhelm the service even after it's recovered, extending the outage.

Terminal

12:00:00 - API goes down, 1000 users all fail simultaneously
12:00:01 - Users retry spread across 1.0-1.5 seconds (distributed load)
12:00:03 - Users retry spread across 2.0-2.5 seconds (distributed load)
12:00:07 - Users retry spread across 4.0-4.5 seconds (distributed load)

Jitter adds random variation to wait times, spreading retries over time. Instead of 1000 simultaneous requests, you get a distributed load the service can handle.

Why the big clouds all do this.

Every major cloud SDK (AWS, Google Cloud, Azure) ships retry logic with both exponential backoff and jitter, because their networks have felt what synchronized retries do to a recovering service: the client herd hits the service the instant it comes back up, pushes it back down, and the outage extends. The fix is a single line: wait = base_delay * (2 ** attempt) + random.uniform(0, base_delay * 0.5). That's all jitter is. One random component, added to the exponential wait, and the thundering herd becomes a gentle stream.

backoff_jitter.py

import time
import random

def calculate_backoff_with_jitter(attempt, base_delay=1.0):
    """Calculate exponential backoff with jitter."""
    exponential_delay = base_delay * (2 ** attempt)
    jitter = random.uniform(0, base_delay * 0.5)
    return exponential_delay + jitter

# Usage - notice the random variation
for attempt in range(4):
    delay = calculate_backoff_with_jitter(attempt)
    print(f"Attempt {attempt + 1}: wait {delay:.2f}s before retry")

# Output (random variation each time):
# Attempt 1: wait 1.23s before retry
# Attempt 2: wait 2.41s before retry
# Attempt 3: wait 4.17s before retry
# Attempt 4: wait 8.38s before retry

The jitter (a random extra wait of up to half of base_delay, added on top of the exponential delay) is just enough to desynchronize retries without significantly affecting recovery time. This simple addition prevents thundering herds.
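The desynchronizing effect can be checked with a small simulation. The sketch below assumes 1000 clients that all fail at the same instant and compute their first retry delay with the formula above; without jitter every one of them would retry at exactly 1.0s. The rng parameter, first_retry_times, and the seed are additions for this illustration only.

```python
import random

def calculate_backoff_with_jitter(attempt, base_delay=1.0, rng=random):
    """Exponential backoff plus up to base_delay * 0.5 of random jitter."""
    exponential_delay = base_delay * (2 ** attempt)
    jitter = rng.uniform(0, base_delay * 0.5)
    return exponential_delay + jitter

def first_retry_times(clients, rng):
    """First-retry delay for each of `clients` simulated clients."""
    return [calculate_backoff_with_jitter(0, rng=rng) for _ in range(clients)]

rng = random.Random(42)  # seeded only so the sketch is repeatable
delays = first_retry_times(1000, rng)

# Count how many clients land in each 100 ms slice of the 1.0-1.5 s window.
buckets = [0] * 5
for d in delays:
    buckets[min(int((d - 1.0) * 10), 4)] += 1

print(f"earliest {min(delays):.2f}s, latest {max(delays):.2f}s")
print("clients per 100 ms slice:", buckets)
```

Each slice should hold roughly a fifth of the clients, so the service sees several smaller bursts over half a second instead of one spike.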

Complete retry implementation

Here's a complete retry function that combines categorization, exponential backoff, jitter, and max attempts:

retry_handler.py

import time
import random
import requests
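import logging

# categorize_error() and compose_error_message() are assumed to be in scope
# from the earlier pages of this series.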

def retry_with_backoff(func, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """
    Retry function with exponential backoff and jitter.
    Only retries transient failures. Honors Retry-After on 429 responses.
    """
    attempt = 0

    while attempt < max_attempts:
        try:
            result = func(*args, **kwargs)
            return result, None

        except Exception as e:
            # Categorize the error
            category, error_type, context = categorize_error(e)

            # Only retry transient errors
            if category != "transient":
                # Not retryable - return error immediately
                message = compose_error_message(category, error_type, **context)
                return None, message

            attempt += 1

            # Reached max attempts?
            if attempt >= max_attempts:
                message = compose_error_message(category, error_type,
                                              attempts=attempt, **context)
                return None, message

            # Honor Retry-After when the server provides one (429 only).
            # Otherwise, use exponential backoff with jitter.
            if error_type == "rate_limit" and context.get("retry_seconds"):
                delay = float(context["retry_seconds"])
            else:
                exponential_delay = base_delay * (2 ** (attempt - 1))
                jitter = random.uniform(0, base_delay * 0.5)
                delay = exponential_delay + jitter

            # Log for developers
            logging.warning(
                f"Attempt {attempt} failed: {error_type}. "
                f"Retrying in {delay:.1f}s..."
            )

            # Wait before retry
            time.sleep(delay)

    # Only reached if max_attempts is zero or negative
    return None, "Maximum retry attempts exceeded"

This function encapsulates the complete retry logic: it categorizes errors, only retries transient failures, uses exponential backoff with jitter, respects max attempts, and logs retry attempts for debugging.
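One way to check this behavior without a real network is to run a trimmed copy of the loop against a deliberately flaky function. The sketch below rests on several assumptions: categorize_error and compose_error_message are simplified stubs rather than the versions from the earlier pages, retry_sketch and make_flaky are hypothetical names, and the injectable sleep parameter exists only so the example does not actually wait.

```python
import random

def categorize_error(exc):
    """Stub: treat ConnectionError as transient, everything else as permanent."""
    if isinstance(exc, ConnectionError):
        return "transient", "connection", {}
    return "permanent", "invalid_input", {}

def compose_error_message(category, error_type, **context):
    """Stub: a real version would produce user-facing text."""
    return f"{category}: {error_type}"

def retry_sketch(func, max_attempts=3, base_delay=1.0, sleep=lambda s: None):
    """Trimmed retry loop; `sleep` is injectable so tests can skip the wait."""
    waits = []
    for attempt in range(1, max_attempts + 1):
        try:
            return func(), None, waits
        except Exception as e:
            category, error_type, context = categorize_error(e)
            # Stop on permanent errors or when the attempt budget is spent.
            if category != "transient" or attempt == max_attempts:
                return None, compose_error_message(category, error_type, **context), waits
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay * 0.5)
            waits.append(delay)
            sleep(delay)
    return None, "Maximum retry attempts exceeded", waits

def make_flaky(failures):
    """Return a function that raises ConnectionError `failures` times, then succeeds."""
    state = {"left": failures}
    def flaky():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("simulated network blip")
        return "ok"
    return flaky

result, error, waits = retry_sketch(make_flaky(2))
print(result, error, len(waits))   # ok None 2

result, error, waits = retry_sketch(make_flaky(5))
print(result, len(waits))          # None 2
```

Two blips are absorbed within the three-attempt budget; five are not, and the caller gets an error message after two waits instead of an endless loop.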

Retry in action: what the user sees

From the user's perspective, retry logic makes temporary failures nearly invisible:

Terminal

Enter city name: Tokyo

Looking up coordinates for 'Tokyo'...
Connection issue. Retrying in 1.3 seconds...
Found: Tokyo, Tokyo, Japan

Fetching weather data...
Temperature: 18°C
Conditions: Partly cloudy

The network stuttered, the application waited 1.3 seconds (1s base + random jitter), retried automatically, and succeeded. The user barely noticed - just a brief pause and a helpful status message. This is production-grade error handling: failures happen, but users can still complete their tasks.

Terminal

Enter city name: Lndon

Looking up coordinates for 'Lndon'...

We couldn't find weather data for "Lndon".
Please check the spelling or try a nearby city.
Examples: London, Dublin, Manchester

No retry happened - the error was categorized as "not_found", which isn't transient. The user received immediate, actionable feedback without waiting through pointless retry attempts.

Special case: respecting rate limits

When APIs return 429 (Too Many Requests) with a Retry-After header, honor it instead of guessing. This is different from standard exponential backoff: the API is telling you when to retry. The value is usually a number of seconds, though HTTP also allows it to be a date:

rate_limit_handler.py

import logging
import time

def handle_rate_limit(response):
    """Handle 429 rate limit with Retry-After header."""
    if response.status_code == 429:
        retry_after = response.headers.get('Retry-After')

        if retry_after:
            try:
                # Retry-After is usually a number of seconds
                wait_seconds = int(retry_after)
            except ValueError:
                # It can also be an HTTP-date - fall back to standard backoff
                return True
            logging.info(f"Rate limited. Waiting {wait_seconds}s as instructed.")
            time.sleep(wait_seconds)
            return True  # Should retry
        else:
            # No Retry-After header - fall back to exponential backoff
            return True  # Should retry with standard backoff

    return False  # Not a rate limit
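HTTP allows Retry-After to carry either a delay in seconds or an HTTP-date, so a parser that accepts both can be useful. The following is a sketch under that assumption; parse_retry_after is a hypothetical helper, not part of the handler above, and it returns None when the value cannot be read so the caller can fall back to exponential backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, now=None):
    """Return the wait in seconds for a Retry-After value, or None if unparseable."""
    if value is None:
        return None
    # Form 1: a delay in whole seconds, e.g. "120".
    try:
        return max(0.0, float(int(value)))
    except ValueError:
        pass
    # Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no wait is needed.
    return max(0.0, (when - now).total_seconds())

now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(parse_retry_after("120"))                                     # 120.0
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT", now=now))  # 60.0
print(parse_retry_after("soon"))                                    # None
```

Passing now explicitly keeps the date branch testable; in real use it defaults to the current UTC time.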

Being a good API citizen

Honoring Retry-After headers isn't just polite; it prevents your application from being banned or throttled more aggressively. APIs rate-limit to protect their infrastructure, and respecting those limits is what tells the service you're a well-behaved client. Ignoring them risks getting your API key revoked or your IP blocked entirely, at which point the cost of the next fix is far higher than a 60-second wait.

Blocking vs non-blocking retry

time.sleep() blocks the calling thread

The retry logic shown here uses time.sleep(), which suspends the calling thread for the whole wait. In a single-threaded program, that pauses the entire process. That's fine for CLI tools, scripts, and low-traffic applications, where the only thing waiting is the one request you're retrying. In a web application handling multiple users at once, it's a problem: a synchronous worker can't serve other requests while it's sleeping, so one slow retry adds latency for every request queued behind it.

For high-concurrency applications, use a non-blocking approach instead:

Async/await with asyncio. The pattern for Python web frameworks like FastAPI or aiohttp; await asyncio.sleep() yields control back to the event loop during the wait.
Task queues (Celery, RQ). Move the retry work onto a background worker, so the request handler returns immediately and the retries happen out of band.
Message queues with dead-letter queues. The durable version of the above for multi-service architectures, where a failed message is routed to a separate queue for inspection or delayed retry.

Blocking retry is the right shape for CLI tools and low-traffic applications, which is what the Weather Dashboard is. When you eventually ship a web service with concurrent users, make the retry function async and swap time.sleep() for await asyncio.sleep() (or delegate retries to a queue); the categorization, backoff, and jitter logic stays the same.
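As a rough illustration of the asyncio variant, the sketch below keeps the same backoff and jitter arithmetic but waits with asyncio.sleep(). It makes simplifying assumptions: retry_async is a hypothetical name, ConnectionError and TimeoutError stand in for the transient category instead of the categorizer from the earlier pages, and the final failure is re-raised rather than turned into a message.

```python
import asyncio
import random

async def retry_async(func, max_attempts=3, base_delay=1.0):
    """Await `func()` with exponential backoff and jitter between attempts.

    Assumption for this sketch: ConnectionError and TimeoutError stand in
    for the "transient" category; everything else propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await func()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay * 0.5)
            # Yields to the event loop, so other tasks keep running during the wait.
            await asyncio.sleep(delay)

async def main():
    calls = {"count": 0}

    async def flaky():
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("simulated network blip")
        return "ok"

    # A tiny base_delay keeps the demo fast.
    result = await retry_async(flaky, base_delay=0.01)
    return result, calls["count"]

print(asyncio.run(main()))  # ('ok', 3)
```

The only structural change from the blocking version is that the function and its wait are awaited; the delay calculation is untouched.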