2. Performance profiling before containerisation

Before you reach for Redis or scale horizontally, profile. Knowing which endpoints are slow tells you where caching pays off and where it doesn't.

Before wrapping the API in containers, we need to know where the time goes. The next sections add a Redis cache; without a baseline, we won't be able to tell whether the cache helped, hurt, or merely moved the needle by a few percent. Containers don't fix slow code. They package it as-is, then run it on infrastructure billed by the hour.

This page adds two things: timing middleware that records how long each request spent inside the app, and a small load test that hits the API hard enough to expose what one user can't. The output of both is a single number per endpoint that we'll compare against after caching is in place.

Understanding the API's performance baseline

The first step is measuring how long requests take. Your News API fetches articles from NewsAPI and Guardian, saves them to PostgreSQL, and returns results. How long does this take? Without measurement, you're guessing. Add timing middleware that logs request duration for every endpoint.

Make: Add timing middleware to your News API. Create a new file called middleware/timing.py in your project:

middleware/timing.py

import time
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware


class TimingMiddleware(BaseHTTPMiddleware):
    """Middleware that logs request processing time."""
    
    async def dispatch(self, request: Request, call_next):
        # Record start time
        start_time = time.time()
        
        # Process the request
        response = await call_next(request)
        
        # Calculate duration
        duration = time.time() - start_time
        duration_ms = duration * 1000  # Convert to milliseconds
        
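        # Log the timing, including the query string so filtered requests can be told apart
        target = request.url.path
        if request.url.query:
            target = f"{target}?{request.url.query}"
        print(f"{request.method} {target} - {duration_ms:.2f}ms")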
        
        # Add timing header to response
        response.headers["X-Process-Time"] = f"{duration_ms:.2f}ms"
        
        return response

Now register this middleware in your main.py. Add these lines after creating your FastAPI app:

main.py (register timing middleware)

from fastapi import FastAPI

from middleware.timing import TimingMiddleware

app = FastAPI(title="News Aggregator API")

# Add timing middleware
app.add_middleware(TimingMiddleware)

Check: Start your News API and make requests to the /articles endpoint. Watch your terminal output. Your timings will vary with your network, laptop, and API responses, but the shape should look like this:

Terminal (example output)

$ uvicorn main:app --reload
INFO:     Started server process
INFO:     Waiting for application startup.
INFO:     Application startup complete.

GET /articles?category=technology - 687.23ms
GET /articles?category=business - 742.15ms
GET /articles - 821.47ms
GET /articles?source=newsapi - 694.82ms

In this worked example, every request falls in the 650-850ms range. That's slow: a common target for a responsive API is under about 200ms. Let's understand why this path takes so long.

The middleware wraps every request with timing logic. start_time = time.time() captures when the request begins processing. The application handles the request (fetching from external APIs, querying the database, building the response). duration = time.time() - start_time calculates how long everything took.

Multiplying by 1000 converts seconds to milliseconds because response times are conventionally reported in milliseconds. The timing also appears in the response headers as X-Process-Time, allowing clients to monitor API performance programmatically.
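As a small illustration of reading that header programmatically, the sketch below turns a header value back into a number. The helper name parse_process_time is hypothetical, and it assumes the format the middleware writes (a decimal number followed by ms).

```python
def parse_process_time(header_value: str) -> float:
    """Convert an X-Process-Time value such as '687.23ms' to milliseconds."""
    value = header_value.strip()
    if not value.endswith("ms"):
        raise ValueError(f"unexpected X-Process-Time format: {header_value!r}")
    return float(value[:-2])


# Example header values, matching the timings in the worked example
timings = [parse_process_time(v) for v in ("687.23ms", "742.15ms")]
print(timings)
```

A client or monitoring script could apply a helper like this to each response it receives and keep the numbers alongside its own wall-clock timings.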

Your numbers are the baseline we'll compare against after Redis is in place. Write them down. The order-of-magnitude gap between "fetched from NewsAPI" and "returned from cache" is the whole reason this chapter exists, and without the before-number the after-number means nothing.
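If you would rather not copy numbers by hand, a few lines of Python can reduce the log to one figure per endpoint. This is a sketch under assumptions: the summarise helper is hypothetical, it expects lines in the format the timing middleware prints, and it groups requests by path while ignoring query strings.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines such as "GET /articles?category=technology - 687.23ms"
LINE_RE = re.compile(r"^(?P<method>[A-Z]+) (?P<target>\S+) - (?P<ms>\d+(?:\.\d+)?)ms$")


def summarise(lines):
    """Return {(method, path): mean duration in ms}, ignoring query strings."""
    samples = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a timing line, e.g. uvicorn's own logs
        path = match["target"].split("?", 1)[0]
        samples[(match["method"], path)].append(float(match["ms"]))
    return {key: round(mean(values), 2) for key, values in samples.items()}


log = [
    "INFO:     Application startup complete.",
    "GET /articles?category=technology - 687.23ms",
    "GET /articles?category=business - 742.15ms",
    "GET /articles - 821.47ms",
    "GET /articles?source=newsapi - 694.82ms",
]
baseline = summarise(log)
print(baseline)
```

Save the resulting dictionary somewhere you can find it again; it is the "before" figure for the comparison later in the chapter.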

Identifying the bottleneck

Your /articles endpoint performs multiple operations. Each operation takes time. To optimise effectively, break down where the request spends its time.

External API calls. Your endpoint fetches from NewsAPI and Guardian. A typical call may spend a few hundred milliseconds on network round-trip, API processing, and response parsing. Two sources means most of the request can disappear into upstream waiting time.
Database operations. After fetching articles, your code saves them to PostgreSQL (checking for duplicates, inserting new records). Then it queries the database to return results. Database operations add their own cost, especially as the number of articles grows.
Application processing. Normalising different API response formats, building your standardised response, and serialising to JSON adds the final slice.

In the worked example, that roughly means: external APIs dominate, database work comes next, and local processing is the smallest slice. The important conclusion is not the exact arithmetic; it is that repeated upstream calls dominate request time.
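One way to get that breakdown for your own endpoint is to time each stage separately. The sketch below is an assumption rather than the chapter's code: the stage context manager and the stage names are hypothetical, and the time.sleep calls stand in for the real upstream, database, and processing work.

```python
import time
from contextlib import contextmanager

# Accumulated stage durations in milliseconds (hypothetical helper, not part of the News API)
stage_timings = {}


@contextmanager
def stage(name):
    """Record how long the wrapped block took under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        stage_timings[name] = stage_timings.get(name, 0.0) + elapsed_ms


# Stand-ins for the real work; replace the sleeps with your own calls
with stage("external_apis"):
    time.sleep(0.05)
with stage("database"):
    time.sleep(0.01)
with stage("processing"):
    time.sleep(0.002)

slowest = max(stage_timings, key=stage_timings.get)
for name, ms in stage_timings.items():
    print(f"{name}: {ms:.1f}ms")
print(f"slowest stage: {slowest}")
```

Wrapping the three parts of your handler this way shows which slice dominates on your machine, which is the only breakdown that matters for your baseline.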

The problem multiplies with concurrent users. If 100 users request articles simultaneously, your API makes 200 external API calls (100 users × 2 sources). External APIs rate-limit this kind of repeated traffic, and free-tier quotas disappear quickly when many users trigger identical upstream requests.

The solution is caching. The first request still pays the external-API and database cost. Subsequent requests for the same category within 5 minutes return cached data from memory. Instead of 200 external API calls for 100 concurrent users, you make 2 calls (one per source) for the first request and serve the other 99 requests from cache.

This pattern is universal in API development: expensive operations (external API calls, complex database queries, heavy computations) are cached so repeated requests serve from memory instead of repeating the expensive work. You'll implement this caching strategy in section 6 using Redis.

Load-testing locally

Measuring one request at a time shows how long a single request takes when nothing else is competing for the server. Load testing simulates multiple concurrent users, revealing how your API behaves under realistic traffic. You need a load testing tool that sends many requests quickly.

Make: Install hey, a simple load testing tool. On macOS, use Homebrew:

Terminal

brew install hey

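On Linux or Windows, download a prebuilt binary using the links in the hey GitHub repository's README. Verify the installation by asking hey for its usage text:

Terminal

$ hey -h

If hey is installed, this prints its usage text, including the -n and -c options used below.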

Check: Run a load test against your unoptimised API. Make sure your API is running (uvicorn main:app), swap YOUR_KEY for an API key you created in Chapter 26, then execute:

Terminal

hey -n 100 -c 10 -H "Authorization: Bearer YOUR_KEY" \
  "http://localhost:8000/articles?category=technology"

This command sends 100 total requests (-n 100) with 10 concurrent requests at a time (-c 10). The output gives you detailed performance statistics; here is the worked-example baseline we'll compare against later:
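To make the two flags concrete, here is a rough Python analogue: a pool of 10 workers draining 100 jobs. It is a sketch, not how hey is implemented; fake_request is a hypothetical stand-in that sleeps instead of calling the API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TOTAL_REQUESTS = 100  # like -n 100
CONCURRENCY = 10      # like -c 10

_lock = threading.Lock()
in_flight = 0
peak_in_flight = 0


def fake_request(_):
    """Stand-in for one HTTP request; tracks how many run at the same time."""
    global in_flight, peak_in_flight
    with _lock:
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
    start = time.perf_counter()
    time.sleep(0.01)  # pretend the server took about 10ms
    elapsed = time.perf_counter() - start
    with _lock:
        in_flight -= 1
    return elapsed


# The pool never runs more than CONCURRENCY requests at once
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(fake_request, range(TOTAL_REQUESTS)))

print(f"requests sent: {len(latencies)}, peak in flight: {peak_in_flight}")
```

The total is fixed at 100 requests, and at any moment no more than 10 of them are waiting on the server, which is the kind of pressure a single user clicking through the API never applies.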

Terminal (example output)

Summary:
  Total:        8.2341 secs
  Slowest:      0.9523 secs
  Fastest:      0.6432 secs
  Average:      0.7234 secs
  Requests/sec: 12.15

Response time histogram:
  0.643 [1]     |
  0.674 [8]     |■■■■
  0.705 [32]    |■■■■■■■■■■■■■■■■
  0.736 [28]    |■■■■■■■■■■■■■■
  0.767 [15]    |■■■■■■■
  0.798 [9]     |■■■■
  0.829 [4]     |■■
  0.860 [2]     |■
  0.891 [0]     |
  0.922 [0]     |
  0.952 [1]     |

Latency distribution:
  10% in 0.6789 secs
  25% in 0.6921 secs
  50% in 0.7123 secs
  75% in 0.7456 secs
  90% in 0.7892 secs
  95% in 0.8234 secs
  99% in 0.9523 secs

Extract: Document your own baseline metrics. In this worked example, the uncached API handles approximately 12 requests per second with an average response time of 723ms. The 95th percentile is 823ms: 95% of requests finished within that time, and only the slowest 5% took longer. Your numbers establish your optimisation target.
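If you want to compute figures like these from raw latencies yourself, a nearest-rank percentile is enough for a baseline. The sketch assumes a list of latencies in seconds; the sample values and totals are made up for illustration, and hey's own percentile calculation may differ in detail.

```python
import math


def percentile(latencies, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not latencies:
        raise ValueError("no samples")
    ordered = sorted(latencies)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def requests_per_second(total_requests, total_seconds):
    """Throughput in the sense a load-test summary reports it."""
    return total_requests / total_seconds


# Made-up sample: 20 latencies in seconds, with one slow outlier
sample = [0.70] * 10 + [0.72] * 5 + [0.75] * 4 + [0.95]

p50 = percentile(sample, 50)
p95 = percentile(sample, 95)
rps = requests_per_second(100, 8.0)  # made-up totals: 100 requests in 8 seconds
print(p50, p95, rps)
```

Note how the single outlier barely moves the median but would dominate the maximum; that is why the percentiles are worth recording alongside the average.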

After Redis lands in section 6, we'll run this exact command again. The cached path doesn't touch NewsAPI or the Guardian and doesn't go near the article-normalisation code; it's a single Redis GET that returns the serialised payload from memory. Roughly two orders of magnitude faster than the baseline above, but the point is the measurement, not the multiplier.

Checkpoint quiz

Use this quiz to check your understanding. Try to answer each question out loud or in a notebook before expanding the explanation. If you get stuck, that's a signal to revisit the relevant section.


Why profile performance before containerising?

Answer: Containers package the code as-is. Containerise a slow endpoint and you've shipped a slow endpoint that's harder to fix: every iteration now means rebuild, push, redeploy rather than a five-line edit and a server restart. Profile first, fix what's worth fixing, then containerise the version you'd actually deploy.

Profiling also reveals whether slowness comes from your code (fix the code) or from infrastructure limits (scale horizontally). Adding servers without profiling can turn a small code fix into an avoidable infrastructure bill.

What's the main bottleneck in the unoptimised News API?

Answer: Repeated external API calls and database queries. Every request to /articles fetches from NewsAPI and Guardian, then queries the database, even when 100 users request identical data within seconds. This redundancy causes:

400ms+ waiting on external APIs per request
Rate limit exhaustion (external APIs block you)
Database load spikes with concurrent users

Caching solves this by storing the result once and serving it from memory (Redis) for subsequent requests.

Next, in section 3, we step back from the API and look at what a container actually is, how it differs from a VM, and what an image is versus a running container, so the rest of the chapter has a working mental model before we touch a Dockerfile.