10. Hosted deployment and review

Time to host the tutorial service. Uvicorn runs your FastAPI app, Railway provides the platform and database, and this page separates a successful deployment from the additional work and licences a public data product requires.

Preparing the API for a hosted tutorial deploy

The local app is feature-complete. Three small pieces turn it into something Railway can build, run, and reach: a requirements.txt pinning dependencies, a Procfile telling the platform how to start the server, and a startup hook that creates the tables on first boot against the managed database.

Create requirements.txt. Railway needs to know which packages to install:

requirements.txt

fastapi==0.115.6
uvicorn[standard]==0.32.1
sqlalchemy==2.0.36
psycopg2-binary==2.9.10
python-dotenv==1.0.1
requests==2.32.3

Create Procfile. Railway uses Procfile to know how to start your application:

Procfile

web: uvicorn main:app --host 0.0.0.0 --port $PORT

--host 0.0.0.0 binds to all network interfaces (required for Railway). $PORT expands to the port Railway assigns to the service at runtime.

The lifespan handler in main.py already calls init_db() on startup, so tables are created the first time Railway boots the app after its database connection variable is configured.

Deploying to Railway

Railway provides hosted deploys with PostgreSQL, automatic HTTPS, and GitHub integration for continuous deployment. Its free allowance and plan names change over time, so check the pricing page before assuming a project can run indefinitely for free.

Step 1: Create Railway Account and Project.

Visit railway.app and sign up with GitHub
Click "New Project" -> "Deploy from GitHub repo"
Authorize Railway to access your repositories
Select your News Aggregator API repository
Railway detects Python automatically and starts deployment

Step 2: Add PostgreSQL Database.

In your Railway project, click "New" -> "Database" -> "Add PostgreSQL"
Railway creates a PostgreSQL service that exposes a DATABASE_URL variable
Your API service must reference that variable in the next step

Step 3: Configure Environment Variables.

Open the Variables tab for your deployed API service, not only the PostgreSQL service
Add DATABASE_URL=${{Postgres.DATABASE_URL}} so the API can connect to Railway's PostgreSQL service. If you named that service differently, use its service name in the reference.
Add ADMIN_API_KEY with a long random string used only for key-management endpoints
Leave NEWSAPI_KEY and GUARDIAN_KEY unset for the hosted smoke test unless your provider plan explicitly permits this deployment

Do not deploy free development credentials by assumption. At the time of writing, NewsAPI's free Developer plan is for development and testing only, not staging or production deployment. The Guardian's developer access also carries usage terms you must check against your application. The Railway exercise below proves that the service starts, creates data, authenticates callers, applies rate-limit headers, and serves documentation; it does not grant permission to publish upstream news data.
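One way to make "leave the keys unset" safe is for the app to enable an upstream source only when its key is present. The sketch below assumes a hypothetical helper, enabled_sources(), which is not part of the chapter's code; only the variable names NEWSAPI_KEY and GUARDIAN_KEY come from the step above.

```python
import os
from typing import Mapping


def enabled_sources(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the upstream sources whose API key is configured.

    Hypothetical helper for illustration; the chapter's own code may
    make this decision differently.
    """
    key_for_source = {"newsapi": "NEWSAPI_KEY", "guardian": "GUARDIAN_KEY"}
    return [
        source
        for source, var in key_for_source.items()
        # Treat empty or whitespace-only values as unset.
        if env.get(var, "").strip()
    ]


print(enabled_sources({}))                       # no keys: nothing to fetch
print(enabled_sources({"GUARDIAN_KEY": "abc"}))  # only the Guardian is enabled
```

With no sources enabled, the fan-out has nothing to call, which is consistent with the empty article list the hosted smoke test expects.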

Step 4: Verify Deployment. Railway provides a deployment URL like https://news-aggregator-demo.up.railway.app. Test it:

Terminal

# Test health check
curl https://your-app.up.railway.app/health

# Generate API key
curl -X POST https://your-app.up.railway.app/admin/api-keys \
  -H "X-Admin-Key: YOUR_ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Hosted Smoke Test", "tier": "basic"}'

# Test authenticated endpoint. With no licensed upstream keys configured,
# an empty article list is expected; auth and rate-limit headers still run.
curl -H "Authorization: Bearer YOUR_KEY" \
  https://your-app.up.railway.app/articles

# View interactive docs
# Visit: https://your-app.up.railway.app/docs

Your tutorial API surface is now hosted with HTTPS, a PostgreSQL database, and interactive documentation. Every push to your GitHub main branch triggers automatic redeployment; returning live upstream articles requires provider credentials whose terms permit that hosted use.

Sanity-check the hosted deploy by hitting each surface once: /health for the health check, /admin/api-keys with X-Admin-Key to prove database-backed key storage, /articles with an Authorization header to prove protected routing and rate-limit headers, and /docs in a browser for the auto-generated Swagger page. Without upstream keys, /articles should return an empty article list rather than real news; that is the expected licence-safe smoke test.
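If you would rather script the four checks than retype the curl commands, something like the following could work. It is a sketch using only the standard library; build_checks() and run_checks() are hypothetical names, and api_key is assumed to be a key you already generated, so the POST here creates one more key named "Hosted Smoke Test".

```python
import json
import urllib.request


def build_checks(base_url: str, admin_key: str, api_key: str) -> list[urllib.request.Request]:
    """Build one request per surface; nothing is sent until run_checks()."""
    base = base_url.rstrip("/")
    return [
        urllib.request.Request(f"{base}/health"),
        urllib.request.Request(
            f"{base}/admin/api-keys",
            data=json.dumps({"name": "Hosted Smoke Test", "tier": "basic"}).encode(),
            headers={"X-Admin-Key": admin_key, "Content-Type": "application/json"},
            method="POST",
        ),
        urllib.request.Request(
            f"{base}/articles", headers={"Authorization": f"Bearer {api_key}"}
        ),
        urllib.request.Request(f"{base}/docs"),
    ]


def run_checks(checks: list[urllib.request.Request]) -> None:
    """Send each request and print its status; urlopen raises on 4xx/5xx."""
    for req in checks:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(req.get_method(), req.full_url, resp.status)


# Example (performs real network calls against your deployment):
# run_checks(build_checks("https://your-app.up.railway.app", "YOUR_ADMIN_API_KEY", "YOUR_KEY"))
```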

A note on the rate limiter at this deploy

The in-memory limiter from section 7 lives in one process's RAM. Two consequences are worth knowing before you point real traffic at this URL. First, a Railway restart resets every quota; a caller who hit their limit can come straight back. Second, if you scale to more than one worker or replica, each process keeps its own counter, so a caller's effective limit becomes limit × worker_count rather than the documented number. For a personal demo or a tutorial deploy this is fine; for an API serving real traffic, swap REQUEST_LOGS for Redis (or any shared store) before you rely on the limits.
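The limit × worker_count effect is easy to reproduce in a few lines. This simulation uses a stand-in InMemoryLimiter class (an assumption for illustration, not the chapter's limiter) and spreads one caller's requests round-robin across workers, the way a load balancer might.

```python
from collections import Counter
from itertools import cycle

LIMIT = 100  # documented per-caller limit within one window


class InMemoryLimiter:
    """Stand-in for one worker's private in-memory counter."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts = Counter()

    def allow(self, api_key: str) -> bool:
        if self.counts[api_key] >= self.limit:
            return False
        self.counts[api_key] += 1
        return True


def accepted_requests(worker_count: int, attempts: int) -> int:
    """Send `attempts` requests round-robin across workers; count the accepted ones."""
    workers = [InMemoryLimiter(LIMIT) for _ in range(worker_count)]
    return sum(w.allow("caller-1") for w, _ in zip(cycle(workers), range(attempts)))


print(accepted_requests(worker_count=1, attempts=500))  # 100
print(accepted_requests(worker_count=3, attempts=500))  # 300
```

A shared store removes the effect because every worker increments the same counter.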

Quiz

Your API stores API keys as plain text in the database. A security audit flags this as critical risk. Why is this dangerous, and what specific attack does hashing prevent?

The danger: If your database is compromised (leaked backup, SQL injection, stolen credentials), attackers immediately have working API keys. They can impersonate legitimate users, consume quotas, exfiltrate data, and cover tracks by using real user credentials.

Attack hashing prevents: A leaked database no longer yields working credentials. Attackers see hashes like a3f5b..., not keys like xQz4K8m_NpD1..., and a hash cannot be sent in place of a key. Recovering the original would mean brute-forcing a high-entropy random string through SHA-256, which is computationally infeasible with current technology.

The pattern: treat API keys like passwords. Hash with SHA-256 (or bcrypt for slower verification, which makes brute-forcing leaked hashes slower too). Store only the hash. Show the raw key once at generation and never again. Stripe, GitHub, and AWS all work this way; if a user loses the key, they rotate to a new one rather than recovering the old one.
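A minimal sketch of that pattern using only the standard library. The function names are illustrative assumptions and may not match the chapter's own helpers.

```python
import hashlib
import hmac
import secrets


def hash_api_key(raw_key: str) -> str:
    """SHA-256 hex digest of the raw key; this is the only value stored."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Return (raw_key, key_hash). Show raw_key once; persist only key_hash."""
    raw_key = secrets.token_urlsafe(32)
    return raw_key, hash_api_key(raw_key)


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    # compare_digest runs in constant time, so the comparison does not
    # leak how many leading characters matched.
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


raw, stored = generate_api_key()
print(verify_api_key(raw, stored))          # True
print(verify_api_key("wrong-key", stored))  # False
```

A fast hash is acceptable here because the key is already 256 bits of randomness; passwords chosen by humans need a deliberately slow hash instead.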

A user hits your rate limit (100 requests/hour) at 15:43. They wait 5 minutes and try again at 15:48, still getting 429 errors. They're confused; they waited but are still rate-limited. Explain what's happening and when they can make requests again.

What's happening: Fixed-window rate limiting resets at hourly boundaries (15:00, 16:00, 17:00), not relative to first request. When they hit 100 requests at 15:43, they're rate-limited until 16:00, the next hour boundary. Waiting 5 minutes (until 15:48) doesn't help because they're still in the 15:00-16:00 window.

Why this design: Fixed windows are simple to implement and reason about. "Your quota is 100 requests per hour starting each hour at :00" is clear. Calculating resets is trivial (truncate to hour boundary, add 1 hour). No complex sliding window math.

The fix from the server side: the 429 should include both the reset boundary and the seconds-until-reset, e.g. {"detail": "Rate limit exceeded. Resets at 16:00 (in 720 seconds)", "reset_at": "2026-05-25T16:00:00+00:00"}. A confused user reads "Resets at 16:00"; a confused script parses reset_at.
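Both values in that response fall out of one truncation. A sketch, assuming hourly windows aligned to the clock hour and a hypothetical helper named window_reset():

```python
from datetime import datetime, timedelta, timezone


def window_reset(now: datetime) -> tuple[datetime, int]:
    """Return (reset_at, seconds_until_reset) for an hourly fixed window."""
    window_start = now.replace(minute=0, second=0, microsecond=0)
    reset_at = window_start + timedelta(hours=1)
    return reset_at, int((reset_at - now).total_seconds())


# The user's retry at 15:48 from the scenario above.
retry_time = datetime(2026, 5, 25, 15, 48, tzinfo=timezone.utc)
reset_at, seconds_left = window_reset(retry_time)
print(reset_at.isoformat(), seconds_left)  # 2026-05-25T16:00:00+00:00 720
```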

Your API generates API keys as sequential integers: 1, 2, 3, 4... A security audit flags this as critical. What's wrong with sequential IDs for API keys, and what makes a good API key?

The problem: Sequential IDs are predictable. If I get key "1234", I can guess "1235" and "1236" exist. Automated scanning tries all possible values. With sequential IDs, there are only N keys to try (where N is total users). An attacker can enumerate all valid keys systematically.

What makes good API keys: Cryptographically secure randomness (256 bits minimum), no patterns or predictability, URL-safe characters only. Using secrets.token_urlsafe(32) generates 43-character strings with 2^256 possible values; enumeration is computationally infeasible.

Attack scenario with sequential keys: Attacker registers, gets key "5000". Script tries 4999, 5001, 5002... Each try has high success probability. Script finds 100 valid keys in minutes. Attacker can impersonate any user.

With random keys: Each key is 43 random characters. A single guess has a 1 in 2^256 chance of matching a given key, so even at billions of guesses per second, finding one valid key would take vastly longer than the age of the universe.
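The arithmetic can be checked directly. The guess rate and the number of valid keys below are assumptions chosen to favour the attacker; no HTTP API would accept anything close to that request rate.

```python
import secrets

key = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded
print(len(key))                  # 43

KEYSPACE = 2 ** 256
GUESSES_PER_SECOND = 10 ** 12    # assumed attacker speed
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_years_to_guess(valid_keys: int) -> float:
    """Expected years of uniform random guessing before hitting any valid key."""
    expected_guesses = KEYSPACE / valid_keys
    return expected_guesses / GUESSES_PER_SECOND / SECONDS_PER_YEAR


# Even with a million valid keys in the database to collide with:
print(f"{expected_years_to_guess(1_000_000):.1e} years")
```

The result is on the order of 10^51 years, so enumeration is not a realistic attack against random keys.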

You deploy your API with PostgreSQL connection pool size=5. Under load, you get "connection pool exhausted" errors even though PostgreSQL has max_connections=100. What's the relationship between pool size and application concurrency, and how should you size your pool?

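The problem: Each request needs a database connection. With 5 connections in the pool and 10 concurrent requests, 5 requests get connections and 5 wait. If the requests holding connections are slow (1+ seconds), the queue grows faster than it drains, and any request that waits longer than the pool's timeout (pool_timeout, 30 seconds by default in SQLAlchemy) fails with a timeout error.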

Pool sizing formula (Little's Law): pool_size ≥ arrival_rate × average_request_duration. If you handle 20 requests/second and each request takes 100ms, you need ~2 concurrent connections (20 × 0.1). If requests take 1 second instead, you need ~20 (20 × 1). The pool sizes the steady-state in-flight work, not the per-second throughput.
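The two worked figures above can be reproduced with a small helper. required_connections() is a hypothetical name, and the result is a steady-state floor rather than a recommended setting, since bursts need headroom on top.

```python
import math


def required_connections(arrival_rate: float, avg_duration_s: float) -> int:
    """Little's Law: average in-flight requests = arrival rate x time in system."""
    in_flight = arrival_rate * avg_duration_s
    # Round away floating-point noise before taking the ceiling, so a
    # product like 3.0000000000000004 does not become 4.
    return math.ceil(round(in_flight, 9))


print(required_connections(20, 0.1))  # 2
print(required_connections(20, 1.0))  # 20
```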

Multi-service consideration: If you run 4 API servers, each with pool_size=10 and max_overflow=5, the worst-case footprint is 4 × (10 + 5) = 60 connections. That total must stay below the database's max_connections. Formula: servers × (pool_size + max_overflow) ≤ database_max_connections.

Hosted PostgreSQL plans: managed databases usually have lower connection limits than your local PostgreSQL install. The chapter's database.py uses pool_size=5, max_overflow=10, which bursts to 15 under load and leaves headroom for database tools and migrations on a small single-worker deploy.
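The footprint formula is simple enough to check before each scaling change. In this sketch the function names, the reserved parameter (headroom for database tools and migrations), and the limit of 20 connections are illustrative assumptions; check your plan's actual limit.

```python
def worst_case_connections(servers: int, pool_size: int, max_overflow: int) -> int:
    """Connections opened if every server's pool bursts fully at once."""
    return servers * (pool_size + max_overflow)


def fits_database(
    servers: int,
    pool_size: int,
    max_overflow: int,
    db_max_connections: int,
    reserved: int = 0,
) -> bool:
    """True if the worst case plus reserved headroom stays within the limit."""
    total = worst_case_connections(servers, pool_size, max_overflow) + reserved
    return total <= db_max_connections


print(worst_case_connections(4, 10, 5))  # 60, the multi-server example
print(worst_case_connections(1, 5, 10))  # 15, the chapter's single-worker deploy
print(fits_database(1, 5, 10, db_max_connections=20, reserved=5))  # True
print(fits_database(4, 5, 10, db_max_connections=20, reserved=5))  # False
```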

Production approach: Monitor connection usage. If the pool is frequently exhausted, either increase pool_size or optimize queries so they release connections faster. Long-running work should not hold a connection for the length of a request; move it to background processing instead.

Your API caches articles for 1 hour. A breaking news event happens, but your API serves stale cached data for 45 more minutes until cache expires. How would you implement cache invalidation for breaking news while maintaining caching benefits for regular content?

The problem: Fixed TTL caching doesn't adapt to importance. Breaking news deserves immediate updates. Regular articles can stay cached longer.

Solution 1: Category-based TTL

# Cache lifetimes in minutes; shorter for urgent categories
CACHE_DURATIONS = {
    "breaking": 5,         # 5 minutes
    "politics": 15,        # 15 minutes
    "technology": 60,      # 1 hour
    "entertainment": 120,  # 2 hours
}

def get_cache_duration(category: str) -> int:
    """Return the cache lifetime in minutes; unknown categories get 1 hour."""
    return CACHE_DURATIONS.get(category, 60)

Solution 2: Manual invalidation endpoint

Add an admin endpoint that purges the cache for specific categories or sources. When breaking news hits, an admin calls POST /admin/cache/invalidate?category=politics to force a refresh.
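Behind such an endpoint, the purge itself can be small. The sketch below is a stand-in in-memory cache, not the chapter's cache; ArticleCache, its method names, and the (category, source) key are assumptions for illustration.

```python
import time


class ArticleCache:
    """Tiny in-memory TTL cache keyed by (category, source)."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._entries = {}   # (category, source) -> (expires_at, articles)

    def set(self, category: str, source: str, articles: list, ttl_seconds: float) -> None:
        self._entries[(category, source)] = (self._clock() + ttl_seconds, articles)

    def get(self, category: str, source: str):
        """Return cached articles, or None if missing or expired."""
        entry = self._entries.get((category, source))
        if entry is None or entry[0] <= self._clock():
            return None
        return entry[1]

    def invalidate(self, category: str) -> int:
        """Drop every entry for `category`; return how many were removed."""
        doomed = [key for key in self._entries if key[0] == category]
        for key in doomed:
            del self._entries[key]
        return len(doomed)


cache = ArticleCache()
cache.set("politics", "guardian", [{"title": "Budget vote"}], ttl_seconds=900)
cache.set("technology", "guardian", [{"title": "New chip"}], ttl_seconds=3600)
print(cache.invalidate("politics"))       # 1
print(cache.get("politics", "guardian"))  # None, so the next request refetches
```

The admin route would call invalidate() with the category query parameter and return the count, so the caller can see whether anything was actually purged.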

Solution 3: Smart refresh triggers

Monitor external APIs for article count changes. If NewsAPI suddenly has 50 new articles in the "politics" category (normally 5-10), invalidate the cache and refetch immediately.

Trade-offs: More complex caching logic increases maintenance burden. Balance freshness requirements with implementation complexity. For most APIs, category-based TTL provides good balance.

Why build a unified news aggregator API instead of having clients call NewsAPI and the Guardian directly? What specific problems does aggregation solve?

Problems aggregation solves:

Response normalisation: NewsAPI uses publishedAt, Guardian uses webPublicationDate. Clients would need format-specific parsing for each source. Aggregator provides one consistent format.
Authentication complexity: Each API has different auth methods. NewsAPI uses query parameter keys, Guardian uses different parameter names. Aggregator presents one authentication scheme.
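Rate limit management: Every client that calls the providers directly spends upstream quota on its own, even for identical queries. The aggregator caches responses and serves repeats from the cache, which can cut external API calls sharply; the exact saving depends on the cache hit rate.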
Failure handling: If NewsAPI is down, clients must implement fallback logic. Aggregator handles this transparently, trying alternate sources automatically.
Cost optimization: Multiple clients calling external APIs directly multiplies API costs. One aggregator serving many clients reduces external API usage dramatically.
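The response normalisation point is the easiest to show in code. In this sketch the input field names (publishedAt, webPublicationDate, webTitle, webUrl) are the providers' own, while the output shape and the function names are assumptions rather than the chapter's actual models.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # Both providers typically send ISO 8601 with a trailing "Z".
    # datetime.fromisoformat() only accepts "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def from_newsapi(item: dict) -> dict:
    return {
        "title": item["title"],
        "url": item["url"],
        "published_at": parse_timestamp(item["publishedAt"]),
        "source": "newsapi",
    }


def from_guardian(item: dict) -> dict:
    return {
        "title": item["webTitle"],
        "url": item["webUrl"],
        "published_at": parse_timestamp(item["webPublicationDate"]),
        "source": "guardian",
    }


a = from_newsapi({"title": "A", "url": "https://example.com/a",
                  "publishedAt": "2026-05-25T15:43:00Z"})
b = from_guardian({"webTitle": "B", "webUrl": "https://example.com/b",
                   "webPublicationDate": "2026-05-25T15:43:00Z"})
print(a.keys() == b.keys())                  # True
print(a["published_at"] == b["published_at"])  # True
```

Clients then sort and filter on published_at without knowing which provider an article came from.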

Real-world example: The same pattern appears wherever one API fronts many providers. Stripe puts a single interface over many card networks and payment methods, travel sites aggregate airlines, and Google News aggregates thousands of publishers. Aggregation provides value through simplification, reliability, and cost reduction.

In your test suite, why is mocking external APIs (NewsAPI, Guardian) critical rather than just optional good practice? What specific problems occur without mocking?

Problems without mocking:

Tests become slow: Network calls take 200-500ms each. Test suite with 50 API calls takes 10-25 seconds instead of under 1 second. Slow tests don't get run frequently.
Tests become flaky: External APIs have downtime, rate limiting, and network failures. Tests fail randomly when APIs are unavailable, even though your code is correct.
Tests consume quota: Running tests 20 times per day against real APIs burns through free tier limits. You hit rate limits and can't run tests.
Can't test error handling: How do you test your "external API is down" error handling if you're calling the real API that's currently up? Mocks let you simulate failures.
Tests lack isolation: External API changes break your tests even though your code didn't change. Tests should only fail when YOUR code is broken.

With mocks: the suite runs in under a second, doesn't fail on network blips, doesn't consume external quotas, and can exercise every error path on demand. Tests only break when your code breaks.
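The "can't test error handling" point is the one mocks solve most visibly. Here is a sketch with unittest.mock, where fetch_articles() is a hypothetical fetcher that takes its HTTP function as a parameter; the chapter's own fetchers and test layout may differ.

```python
from unittest.mock import Mock


def fetch_articles(http_get, url: str) -> list:
    """Hypothetical fetcher: returns [] when the upstream call fails."""
    try:
        response = http_get(url, timeout=5)
        response.raise_for_status()
        return response.json()["articles"]
    except OSError:
        # requests' exceptions subclass OSError, as do the built-in
        # ConnectionError and TimeoutError.
        return []


def test_returns_articles_on_success():
    fake_response = Mock()
    fake_response.json.return_value = {"articles": [{"title": "Hello"}]}
    http_get = Mock(return_value=fake_response)

    assert fetch_articles(http_get, "https://example.invalid/news") == [{"title": "Hello"}]
    http_get.assert_called_once_with("https://example.invalid/news", timeout=5)


def test_returns_empty_list_when_upstream_is_down():
    # side_effect makes the mock raise, simulating an outage on demand.
    http_get = Mock(side_effect=ConnectionError("simulated outage"))

    assert fetch_articles(http_get, "https://example.invalid/news") == []
```

Neither test opens a socket, so both run in milliseconds and pass whether or not the real providers are reachable.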

Looking ahead

Three chapters ago you were calling someone else's /articles endpoint; now you've shipped one. The pieces this chapter put together (RESTful URLs, FastAPI routes with Pydantic models, SQLAlchemy sessions injected per request, hashed API keys verified by a dependency, fixed-window rate limits, multi-source fan-out with a cache in front, a pytest suite that doesn't hit the network) recombine in any HTTP API you'll build later. The framework changes; the shape doesn't.

What's still missing is everything past "it runs on a small hosted deploy". Chapter 27 packages the API in a Docker image so the runtime is the same on a laptop, in CI, and in production. Chapter 28 takes that image to AWS: ECS Fargate behind an ALB, RDS for the database, an ECR registry for the images. Chapter 29 fills in the operations layer: a GitHub Actions pipeline that tests and deploys on push, CloudWatch dashboards and alarms, auto-scaling policies driven by CPU and request rate. Chapter 30 pulls all of it back together into the capstone.

Next, in Chapter 27, we containerise this same API and rebuild the local dev story around Docker Compose so the database, cache, and app run as one composable unit.