8. PostgreSQL features SQLite can't match

Migration isn't just about matching what SQLite did. It's also about getting access to things SQLite cannot do at all. Three of those things matter most in real backends: JSONB for storing API responses, full-text search with stemming and ranking, and connection pooling so your app survives real traffic.

Migration isn't just about matching SQLite's functionality. PostgreSQL offers features that improve your application beyond what SQLite could do. This section demonstrates three powerful features you'll use in production applications.

JSONB columns for API data

API responses often contain nested JSON with varying structures. Storing raw JSON in TEXT columns works, but you can't query the data efficiently. PostgreSQL's JSONB type stores JSON with full indexing and querying capabilities.

Imagine our weather cache receives rich API responses with nested data: temperature, humidity, wind speed, pressure, and forecast arrays. Instead of creating columns for every possible field (and updating the schema when the API changes), store the full response as JSONB. The script reuses the get_connection() helper you wrote in weather_app.py in the previous section. Save the following as weather_cache.py at the project root:

weather_cache.py
import psycopg2
import psycopg2.extras
import json
from datetime import datetime, timezone

from weather_app import get_connection

def create_enhanced_schema():
    conn = get_connection()
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
                CREATE TABLE IF NOT EXISTS weather_cache (
                    id SERIAL PRIMARY KEY,
                    location VARCHAR(100) NOT NULL,
                    api_response JSONB NOT NULL,
                    timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
                )
            """)

            # Index on JSON field for fast queries
            cursor.execute("""
                CREATE INDEX IF NOT EXISTS idx_weather_temp
                ON weather_cache
                ((api_response->'main'->>'temp'))
            """)

        conn.commit()
    finally:
        conn.close()

def save_weather_response(location, api_data):
    """Store entire API response as JSONB"""
    conn = get_connection()
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
                INSERT INTO weather_cache (location, api_response)
                VALUES (%s, %s)
            """, (location, json.dumps(api_data)))
        conn.commit()
    finally:
        conn.close()

def find_windy_days(min_wind_speed=20):
    """Query nested JSON data directly in SQL"""
    conn = get_connection()
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
                SELECT
                    location,
                    api_response->'main'->>'temp' as temperature,
                    api_response->'wind'->>'speed' as wind_speed,
                    timestamp
                FROM weather_cache
                WHERE (api_response->'wind'->>'speed')::numeric > %s
                ORDER BY timestamp DESC
            """, (min_wind_speed,))
            return cursor.fetchall()
    finally:
        conn.close()

# Example usage
api_response = {
    "main": {"temp": 15.5, "humidity": 72, "pressure": 1013},
    "wind": {"speed": 25, "deg": 180},
    "weather": [{"description": "windy"}]
}

create_enhanced_schema()  # Create the weather_cache table before using it
save_weather_response('Dublin', api_response)

windy_days = find_windy_days(min_wind_speed=20)
for location, temp, wind, timestamp in windy_days:
    print(f"{location}: {temp}\u00b0C, wind {wind} km/h at {timestamp}")

JSONB advantages

Five reasons JSONB earns its place in a PostgreSQL-backed app the moment external APIs enter the picture:

  • Flexible schema. Store any JSON structure without defining columns upfront. API changes don't require schema migrations.
  • Queryable. Use -> to access nested fields directly in SQL. No need to load the entire JSON into Python first.
  • Indexable. Create indexes on specific JSON paths for fast queries.
  • Binary storage. JSONB (not JSON) stores data in binary format for faster processing than text parsing.
  • Validation. PostgreSQL ensures stored data is valid JSON.

Chapter 25 picks this back up at scale: Spotify's track objects have 20+ fields (album data, external identifiers, release dates, and more). Rather than creating columns for each field, you'll store the full track payload as JSONB and query specific fields when needed.

Full-text search

SQLite's LIKE '%keyword%' searches work for simple cases, but they're slow and primitive. PostgreSQL offers full-text search with ranking, stemming (finding "running" when searching "run"), and stopword filtering.

Imagine users want to search weather conditions: "heavy rain", "light snow", "thunderstorm". Full-text search finds relevant results even with variations in wording. Like the JSONB script, this one imports get_connection() from weather_app.py. Save the following as weather_search.py at the project root:

weather_search.py
from weather_app import get_connection

def add_fulltext_search():
    """Add tsvector column for full-text search"""
    conn = get_connection()
    try:
        with conn.cursor() as cursor:
            # Add tsvector column
            cursor.execute("""
                ALTER TABLE weather_history
                ADD COLUMN IF NOT EXISTS search_vector tsvector
            """)

            # Create trigger to auto-update search index
            cursor.execute("""
                CREATE OR REPLACE FUNCTION update_search_vector()
                RETURNS trigger AS $$
                BEGIN
                    NEW.search_vector :=
                        to_tsvector('english',
                                    COALESCE(NEW.location, '') || ' ' ||
                                    COALESCE(NEW.conditions, ''));
                    RETURN NEW;
                END
                $$ LANGUAGE plpgsql
            """)

            cursor.execute("""
                CREATE TRIGGER weather_search_update
                BEFORE INSERT OR UPDATE ON weather_history
                FOR EACH ROW EXECUTE FUNCTION update_search_vector()
            """)

            # Update existing rows
            cursor.execute("""
                UPDATE weather_history
                SET search_vector = to_tsvector('english',
                                                location || ' ' ||
                                                COALESCE(conditions, ''))
            """)

            # Create GIN index for fast search
            cursor.execute("""
                CREATE INDEX IF NOT EXISTS idx_search_vector
                ON weather_history USING GIN(search_vector)
            """)

        conn.commit()
        print("Full-text search configured")
    finally:
        conn.close()

def search_weather(query):
    """Search weather descriptions with ranking"""
    conn = get_connection()
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
                SELECT
                    location,
                    conditions,
                    timestamp,
                    ts_rank(search_vector, query) AS rank
                FROM weather_history,
                     to_tsquery('english', %s) query
                WHERE search_vector @@ query
                ORDER BY rank DESC, timestamp DESC
                LIMIT 10
            """, (query,))
            return cursor.fetchall()
    finally:
        conn.close()

# Example usage
add_fulltext_search()  # One-time setup: add the search_vector column, trigger, and index
results = search_weather('heavy & rain')  # '&' means AND
for location, conditions, timestamp, rank in results:
    print(f"{location}: {conditions} (relevance: {rank:.3f})")

Run it and you'll see something like:

Terminal
Dublin: Heavy rain with thunderstorms (relevance: 0.607)
London: Heavy rain (relevance: 0.455)
Cork: Light rain becoming heavy (relevance: 0.303)
Paris: Overcast with occasional rain (relevance: 0.152)

Full-text search features

Five reasons full-text search beats LIKE '%...%' for anything user-facing:

  • Stemming. Searching "rain" finds "raining", "rained", and "rainy".
  • Ranking. Results ordered by relevance, not just timestamp.
  • Boolean operators. Use & (AND), | (OR), ! (NOT) for complex queries.
  • Performance. GIN indexes make searches fast even with millions of rows.
  • Auto-update. Triggers keep the search index synchronised automatically.

Chapter 25 uses the same machinery on track metadata: full-text search across track names, artists, and album titles with ranking and stemming. A search for "acoustic guitar" finds tracks tagged "acoustic" or "guitar" in any field.

Connection pooling

Opening a database connection is expensive: establishing the network connection, authentication, allocating server resources. If your web application opens and closes connections for every request, you waste time and strain the database server.

Connection pooling solves this by maintaining a pool of reusable connections. When your application needs a database connection, it borrows one from the pool. When done, it returns the connection to the pool instead of closing it. This is dramatically faster for high-traffic applications. Save the following as weather_pool.py at the project root:

weather_pool.py
from psycopg2 import pool
import os

# Create connection pool once at application startup
connection_pool = None

def initialize_pool():
    """Create connection pool - call this once when app starts"""
    global connection_pool
    connection_pool = pool.ThreadedConnectionPool(
        minconn=2,          # Minimum connections to keep open
        maxconn=20,         # Maximum connections allowed
        host=os.getenv('DB_HOST', 'localhost'),
        database=os.getenv('DB_NAME', 'weather_db'),
        user=os.getenv('DB_USER'),
        password=os.getenv('DB_PASSWORD', '')
    )
    print("Connection pool initialized: 2-20 connections")

def get_connection():
    """Get connection from pool"""
    if connection_pool is None:
        raise Exception("Connection pool not initialized")
    return connection_pool.getconn()

def return_connection(conn):
    """Return connection to pool (don't close it!)"""
    if connection_pool is not None:
        connection_pool.putconn(conn)

def close_pool():
    """Close all connections - call on application shutdown"""
    global connection_pool
    if connection_pool is not None:
        connection_pool.closeall()
        connection_pool = None
        print("Connection pool closed")

# Usage in application code
def save_weather(location, temperature, conditions):
    """Use pooled connection"""
    conn = get_connection()
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
                INSERT INTO weather_history
                (location, temperature, conditions, timestamp)
                VALUES (%s, %s, %s, CURRENT_TIMESTAMP)
            """, (location, temperature, conditions))
        conn.commit()
    except Exception as e:
        conn.rollback()
        raise
    finally:
        return_connection(conn)  # Return to pool, don't close

# Initialize pool at startup
initialize_pool()

# Use in requests
save_weather('Dublin', 12.5, 'Cloudy')
save_weather('London', 15.0, 'Rainy')

# Close pool at shutdown
close_pool()

One caution: this get_connection() is a standalone illustration of the pooling pattern, not a drop-in replacement for the weather_app.get_connection() that weather_cache.py and weather_search.py import earlier in this section. Those scripts close() the connection when they finish; a pooled connection must be handed back with return_connection() instead. If you adopt pooling for real, switch every caller over to return_connection() at the same time rather than mixing the two helpers.

When to use connection pooling

Pooling is essential in some contexts and a needless complication in others. The line is roughly: pool when many requests share the database, skip it when the script just runs once.

  • Web applications. Essential for Flask, Django, and FastAPI applications where each request needs database access. Without pooling, you create and destroy connections hundreds of times per second.
  • Multi-threaded applications. When multiple threads access the database simultaneously, pooling prevents connection exhaustion.
  • Microservices. Services that handle bursts of traffic benefit from maintaining warm connections.
  • Don't use for. Simple scripts that run once and exit, CLI tools with infrequent database access, or single-threaded applications with low traffic.

Chapter 25 puts this to work on the Music Time Machine dashboard, where multiple workers handle requests simultaneously. Tuning minconn and maxconn for real traffic is part of that chapter's production deployment work.

You now have JSONB, full-text search, and connection pooling. The last piece is making schema changes safe over time: when you add a column or change a constraint six months from now, you'll want a version-controlled trail of every change. That's what Alembic gives you, and it's where Section 9 takes you next.