5. Database integration

FastAPI's dependency-injection system makes database sessions clean. SQLAlchemy gives you typed models. Together they let you write CRUD endpoints that read and write to a real database without ceremony.

Why PostgreSQL over SQLite

Earlier chapters used SQLite for the Music Time Machine: one file, one process, zero ceremony. That model breaks when an HTTP API has multiple workers fielding concurrent requests against the same data. Four differences push you to PostgreSQL once you cross that line:

  • Concurrent writes. SQLite locks the entire database file during writes. Only one writer at a time. If your API has 10 concurrent requests trying to write, they queue sequentially. PostgreSQL handles thousands of concurrent writes through its sophisticated locking system. Each table row can be locked independently.
  • Connection pooling. PostgreSQL runs as a separate server process. Multiple API servers connect to one PostgreSQL instance. SQLite is a file that each process opens directly. You can't share an SQLite database between multiple API servers without complex file locking that doesn't work reliably.
  • Data integrity. PostgreSQL enforces foreign key constraints, check constraints, and complex transaction isolation levels. SQLite supports these but with limitations. For production data that multiple services depend on, PostgreSQL's robustness matters.
  • Scalability. PostgreSQL can grow to terabytes with replication, read replicas, and horizontal scaling. SQLite is designed for embedded use, one file on one machine. When your API outgrows one server, PostgreSQL scales. SQLite doesn't.

SQLite remains an excellent embedded database, and WAL mode can serve modest concurrent applications well. For this API, PostgreSQL is the practical next step because it is built for concurrent server workloads, connection pooling, managed hosting, and later scaling. The Music Time Machine benefits from SQLite's simplicity; the News Aggregator API benefits from PostgreSQL's operating model.

PostgreSQL setup (local development)

For local development, install PostgreSQL and create a database for the News Aggregator API. In the hosted tutorial deploy in section 10, you'll use Railway's managed PostgreSQL.

Terminal
# macOS (Homebrew)
brew install postgresql@15
brew services start postgresql@15

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib

# Create database and user
psql postgres -c "CREATE USER api_user WITH PASSWORD 'dev_password';"
createdb news_aggregator --owner=api_user
psql news_aggregator -c "GRANT USAGE, CREATE ON SCHEMA public TO api_user;"

For Windows: Download PostgreSQL installer from postgresql.org, run it, and use pgAdmin to create the database and user.

If you already have Docker, you can instead start a local PostgreSQL database in one command. This creates the user and database used by the .env value below:

Terminal
docker run --name news-aggregator-postgres \
  -e POSTGRES_USER=api_user \
  -e POSTGRES_PASSWORD=dev_password \
  -e POSTGRES_DB=news_aggregator \
  -p 5432:5432 \
  -d postgres:15

Create a .env file to store your database connection string:

.env
DATABASE_URL=postgresql://api_user:dev_password@localhost:5432/news_aggregator
NEWSAPI_KEY=your-newsapi-key
GUARDIAN_KEY=your-guardian-key
ADMIN_API_KEY=dev-admin-key-change-me

Important: Add .env to .gitignore. Never commit secrets to version control.

Defining models with SQLAlchemy

SQLAlchemy lets you define tables as Python classes and translates Python expressions into SQL on the way to the database. The next file is where the API's two persistent tables live: articles (the cache) and api_keys (auth). It also defines the connection pool and the get_db dependency the routes use to obtain a session per request.

Create database.py:

database.py
import os
from sqlalchemy import create_engine, Column, Integer, String, DateTime, Boolean
from sqlalchemy.orm import declarative_base, sessionmaker
from datetime import datetime, timezone
from dotenv import load_dotenv

load_dotenv()

# Database connection string from environment
DATABASE_URL = os.getenv("DATABASE_URL")
if not DATABASE_URL:
    raise RuntimeError("DATABASE_URL environment variable is required")

# Create engine with connection pooling
engine_kwargs = {"pool_pre_ping": True}
if not DATABASE_URL.startswith("sqlite"):
    engine_kwargs.update(
        pool_size=5,      # 5 persistent connections
        max_overflow=10   # Up to 10 additional connections under load
    )

engine = create_engine(DATABASE_URL, **engine_kwargs)

# Session factory for database operations
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

# Base class for all models
Base = declarative_base()


# Database Models
class Article(Base):
    """Cached news articles."""
    __tablename__ = "articles"
    
    id = Column(Integer, primary_key=True, index=True)
    title = Column(String(500), nullable=False)
    description = Column(String(2000))
    url = Column(String(500), unique=True, nullable=False)
    source = Column(String(100), nullable=False)
    category = Column(String(100), nullable=False)
    published_at = Column(DateTime, nullable=False)
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
    updated_at = Column(
        DateTime,
        default=lambda: datetime.now(timezone.utc),
        onupdate=lambda: datetime.now(timezone.utc),
    )


class APIKey(Base):
    """API keys for authentication."""
    __tablename__ = "api_keys"
    
    id = Column(Integer, primary_key=True, index=True)
    key_hash = Column(String(64), unique=True, nullable=False, index=True)
    name = Column(String(200))
    rate_limit_tier = Column(String(50), default="basic")
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
    last_used_at = Column(DateTime, nullable=True)


# Dependency injection for database sessions
def get_db():
    """Get database session for request handlers."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()


# Create all tables
def init_db():
    """Initialize database tables."""
    Base.metadata.create_all(bind=engine)

The engine configuration creates a small connection pool: five persistent connections, up to ten temporary overflow connections during spikes, and a pre-ping check before SQLAlchemy hands a connection to your code. If PostgreSQL restarted or a connection went stale, SQLAlchemy detects that before the request uses it.

The Article model stores cached news articles. The unique url prevents duplicate rows, while updated_at gives the cache layer a freshness timestamp it can query later.

The APIKey model stores hashed API keys. key_hash is indexed for fast lookups, rate_limit_tier determines the request budget, and is_active lets you revoke a key without deleting its record.

The get_db() dependency opens a session, yields it to the route, and closes it in finally. That pattern makes cleanup reliable even when a route raises an exception.

Initialize the database tables:

init_db.py
from database import init_db

if __name__ == "__main__":
    print("Creating database tables...")
    init_db()
    print("Tables created successfully!")
Terminal
python init_db.py
# Output: Creating database tables...
# Output: Tables created successfully!

Verify the tables were created:

Terminal
psql news_aggregator -c "\dt"
#           List of relations
#  Schema |    Name     | Type  |   Owner   
# --------+-------------+-------+-----------
#  public | api_keys    | table | api_user
#  public | articles    | table | api_user

Wiring the database into FastAPI endpoints

Now update main.py to use the PostgreSQL database instead of the in-memory dictionary.

main.py (updated)
from fastapi import FastAPI, HTTPException, Depends, Query
from sqlalchemy.orm import Session
from pydantic import BaseModel
from datetime import datetime

from database import get_db, Article as DBArticle

app = FastAPI(
    title="News Aggregator API",
    description="Unified interface for multiple news sources",
    version="1.0.0"
)


# Pydantic models for API
class ArticleResponse(BaseModel):
    """Schema for a single article."""
    id: int
    title: str
    description: str
    url: str
    source: str
    category: str
    published_at: datetime
    created_at: datetime

    class Config:
        from_attributes = True  # Allow conversion from SQLAlchemy models


class ArticleListResponse(BaseModel):
    """Schema for list-articles responses."""
    articles: list[ArticleResponse]
    count: int


@app.get("/articles", response_model=ArticleListResponse)
def list_articles(
    category: str | None = None,
    source: str | None = None,
    limit: int = Query(20, ge=1, le=100),
    db: Session = Depends(get_db)
):
    # Build query
    query = db.query(DBArticle)

    # Apply filters
    if category:
        query = query.filter(DBArticle.category == category)
    if source:
        query = query.filter(DBArticle.source == source)

    # Apply limit and execute
    articles = query.limit(limit).all()

    return {"articles": articles, "count": len(articles)}


@app.get("/articles/{article_id}", response_model=ArticleResponse)
def get_article(article_id: int, db: Session = Depends(get_db)):
    article = db.query(DBArticle).filter(DBArticle.id == article_id).first()

    if not article:
        raise HTTPException(status_code=404, detail="Article not found")

    return article

The main.py rewrite replaces the in-memory dictionary with the database. The GET endpoint shapes from section 4 carry over; the in-memory POST /articles handler doesn't, because the rest of the chapter is a read-only aggregator over the cached upstream sources, not a CRUD demo. Delete create_article and its ArticleCreate model from your main.py. Article as DBArticle aliases the SQLAlchemy model on import to keep it distinct from the Pydantic ArticleResponse; mixing the two names up is the first bug you hit once both files grow. ArticleResponse sets from_attributes = True so Pydantic reads fields straight off the SQLAlchemy object; the single-article endpoint takes advantage of that and simply return articles. The list endpoint wraps its results in an ArticleListResponse so the caller always gets a stable {"articles": [...], "count": N} shape that the rest of the chapter extends with more fields (rate-limit headers in section 7, a cache_status field in section 8).

The query work happens in three short steps. db: Session = Depends(get_db) at the endpoint signature pulls a pooled session for the request and returns it on the way out. db.query(DBArticle) starts a query, the conditional .filter(...) calls add WHERE clauses for whichever query parameters were provided, and .limit(limit).all() executes. The single-article endpoint uses .first() instead, which returns the row or None; that's cheaper than .all()[0] because the LIMIT lands in the SQL rather than in Python.

Test the database-backed endpoints. They work identically to the in-memory version, but data persists across server restarts.

Why connection pooling matters

Opening a fresh PostgreSQL connection per request is expensive: a TCP handshake, authentication, a backend process spinning up on the server. Under any meaningful traffic the API would spend most of its time connecting instead of querying. The pool keeps a set of open connections; each request borrows one for the duration of its work and returns it on the way out.

The settings in database.py (pool_size=5, max_overflow=10, pool_pre_ping=True) mean five connections stay open at idle, the pool can grow to fifteen under load, and SQLAlchemy verifies a connection is still alive before handing it over. A database restart should not surface as a stream of broken-connection errors.

The numbers come from arithmetic, not folklore. PostgreSQL's default max_connections is 100, and hosted plans often cap connections lower than a local install. The rule of thumb is (workers × (pool_size + max_overflow)) < database_max_connections; the defaults here keep a single-worker deploy modest and scale fine on a dedicated instance.

Next, in section 6, we add API keys: generate them with secrets.token_urlsafe, store SHA-256 hashes in api_keys, and gate every route behind a require_api_key dependency that turns a missing or wrong header into a 401.