Performance and production habits

SQLite is fast enough for most personal projects without any tuning at all. The work starts when a database stops being a tiny convenience file and becomes something your application depends on every day: queries need to stay quick, imports need to finish in seconds instead of minutes, backups need to be trustworthy, and connection handling needs to be boringly correct. This page is about those habits.

The production habits in this page fall into four groups: keep repeated reads fast, make bulk writes efficient, protect the data on disk, and recognise when SQLite is no longer the right operational shape.

Keep reads fast with earned indexes

Indexes were introduced earlier. The principle worth restating: index columns you frequently filter on (WHERE column = ?) or sort by (ORDER BY column); leave the rest alone. Each index speeds up reads but slows down writes a little (the database has to keep the index sorted as new rows arrive) and adds storage.

For the weather cache, the index choices come from the queries the cache actually runs:

The freshness check. This wants (location, recorded_at DESC).
Daily averages. The same composite index helps SQLite find the relevant rows; it may still group those matched rows afterward because date(recorded_at) is an expression.
Other columns (conditions, humidity, wind_speed) are returned but not filtered or sorted on. No index needed.

SQL: freshness check

SELECT temperature, conditions, recorded_at
FROM weather_readings
WHERE location = ?
  AND recorded_at > datetime('now', '-1 hour')
ORDER BY recorded_at DESC
LIMIT 1;

SQL: daily averages

SELECT date(recorded_at), AVG(temperature)
FROM weather_readings
WHERE location = ?
  AND recorded_at >= date('now', '-7 days')
GROUP BY date(recorded_at);

One index, two queries served. That's the right ratio. Resist the temptation to "index just in case"; every column you index costs writes and storage forever for queries you might never run.

Check query plans before guessing

SQLite has a built-in command that tells you whether a query is using an index or doing a full table scan. Prefix any SELECT with EXPLAIN QUERY PLAN:

explain_demo.py

import sqlite3
from contextlib import closing

with closing(sqlite3.connect("weather.db")) as conn:
    cursor = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT temperature FROM weather_readings
        WHERE location = ? ORDER BY recorded_at DESC LIMIT 1
    """, ("Dublin",))
    for row in cursor:
        print(row)

Look at the detail text in the output. A line containing SEARCH weather_readings USING INDEX idx_weather_location_recorded_at means the query is using the index. If you see SCAN instead, the query is reading the whole table; either the column isn't indexed, or the index isn't usable for this query shape (composite indexes only help if the query filters on a leading prefix of the indexed columns).

Make bulk writes one transaction

Each commit() is expensive: SQLite has to flush the transaction to disk and fsync, costing milliseconds per call. For one-off inserts (the cache fetching one reading per call) that's invisible. For batch loads (importing a CSV, backfilling a year of historical data), committing once per row turns a 5-second job into a 5-minute one. Wrap the loop in one transaction so SQLite commits once at the end.

batch_insert.py

import sqlite3
from contextlib import closing

# Use a separate demo database so this benchmark does not add fake rows
# to the weather.db file used by the rest of the chapter.
DB_PATH = "weather_perf_demo.db"

# 10,000 rows to insert.
rows = [(f"city_{i}", 15.0 + (i % 20), "test") for i in range(10_000)]

with closing(sqlite3.connect(DB_PATH)) as conn:
    with conn:
        conn.execute("DROP TABLE IF EXISTS weather_readings")
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS weather_readings (
                id           INTEGER PRIMARY KEY,
                location     TEXT NOT NULL,
                temperature  REAL,
                conditions   TEXT,
                recorded_at  DATETIME DEFAULT CURRENT_TIMESTAMP
            )
            """
        )

# SLOW: one transaction per row. About a minute.
with closing(sqlite3.connect(DB_PATH)) as conn:
    for row in rows:
        conn.execute(
            "INSERT INTO weather_readings (location, temperature, conditions) VALUES (?, ?, ?)",
            row,
        )
        conn.commit()  # commits ten thousand times

# Clear the demo table before trying the fast version.
with closing(sqlite3.connect(DB_PATH)) as conn:
    with conn:
        conn.execute("DELETE FROM weather_readings")

# FAST: one transaction, one commit. Sub-second.
with closing(sqlite3.connect(DB_PATH)) as conn:
    with conn:
        conn.executemany(
            "INSERT INTO weather_readings (location, temperature, conditions) VALUES (?, ?, ?)",
            rows,
        )
        # The inner `with conn:` block commits once when it exits cleanly.

The fast version is two orders of magnitude quicker. executemany() is the canonical batch-insert tool; if you need finer control (validate each row in Python before inserting, say), wrap your loop in one transaction:

batch_insert.py (continued)

with closing(sqlite3.connect(DB_PATH)) as conn:
    with conn:
        for row in rows:
            if row[1] < -50 or row[1] > 100:
                continue  # validate per row
            conn.execute(
                "INSERT INTO weather_readings (location, temperature, conditions) VALUES (?, ?, ?)",
                row,
            )

The outer closing(...) block closes the connection. The inner with conn: block manages the transaction: one commit if the loop finishes cleanly, rollback if an exception escapes. You get the same transaction benefit as executemany(), with room for per-row Python logic.

Protect production data

Turn on foreign keys

SQLite has FOREIGN KEY constraints in the schema syntax but doesn't enforce them by default, for backwards compatibility. Turn them on at the start of every connection:

connect_with_fks.py

import sqlite3
from contextlib import closing

with closing(sqlite3.connect("weather.db")) as conn:
    conn.execute("PRAGMA foreign_keys = ON")

    # ... use the connection ...

Without this, a foreign-key clause in the schema is decorative; the database accepts inserts that point at non-existent rows. The PRAGMA only applies for the lifetime of the connection, so set it on every connect() call.

Take safe backups

SQLite databases are usually described as "just a file", but backup habits need one extra bit of care. If the database is closed, copying the database file is fine. If the application might be running, take a database-level snapshot instead. In WAL mode, recent committed data can live in a companion -wal file, so copying only weather.db can miss data.

backup_database.py

import sqlite3
from contextlib import closing

with closing(sqlite3.connect("weather.db")) as source:
    with closing(sqlite3.connect("weather-backup.db")) as backup:
        source.backup(backup)

The SQLite command-line tool has the same idea: sqlite3 weather.db ".backup weather-backup.db". Both approaches ask SQLite to produce a consistent snapshot, even if another process is using the database.

Vacuum after large deletes

Deleting rows in SQLite leaves their disk space allocated to the database file (the file doesn't shrink as data goes out). For databases that see large bulk deletes, run VACUUM periodically to compact the file:

vacuum.py

import sqlite3
from contextlib import closing

with closing(sqlite3.connect("weather.db")) as conn:
    # Delete old data first.
    with conn:
        conn.execute(
            "DELETE FROM weather_readings WHERE recorded_at < datetime('now', '-90 days')"
        )

# VACUUM cannot run inside a transaction, so use a new closed-after-use connection.
with closing(sqlite3.connect("weather.db")) as conn:
    conn.execute("VACUUM")

For most personal projects you'll never need to run VACUUM; the file just doesn't grow that much. For long-running services that periodically purge old rows, run it monthly or quarterly.

Know when SQLite is no longer enough

SQLite is the right tool for this chapter and probably the right tool for the Spotify project in Chapter 16. The signals that say it's time to move to a server database (PostgreSQL is the default choice) come down to concurrency:

Many simultaneous writers. SQLite serialises writes through a single file lock. One writer at a time is fine for a single-user app or a low-traffic background process; it's wrong for a web service handling many concurrent users posting data. PostgreSQL is built for many clients and concurrent writers, usually with a connection pool in production.
Multiple machines need access. SQLite is a file; the file lives on one disk. If your application scales to multiple servers, they all need to talk to the same database, and that means a network database protocol, not a file. PostgreSQL was built for this.
Specialised data types or extensions. Full-text search at scale, geospatial queries, JSON with rich operators, time-series partitioning, replication, and so on. SQLite has useful features in some of these areas, but PostgreSQL has the richer ecosystem and production tooling when those needs become central to the application.

What doesn't justify the switch by itself: data size. SQLite handles multi-gigabyte databases fine; the official limit is 281 TB. Performance per query is roughly comparable on the workloads where both fit. The graduation point is concurrency and infrastructure shape, not size.

The good news is that the SQL you've learned in this chapter transfers directly. PostgreSQL accepts almost identical SQL: CREATE TABLE, INSERT, SELECT, JOIN, indexes, transactions, EXPLAIN. The differences come down to data types (PostgreSQL has stricter type enforcement and richer types like TIMESTAMP WITH TIME ZONE), the connection driver (psycopg instead of sqlite3, with parameter style %s instead of ?), and the operational shape (you have a server to administer). The mental model stays the same.

A production habit checklist

Add indexes for repeated query shapes, not guesses.
Use EXPLAIN QUERY PLAN when a query feels slow.
Batch large writes inside one transaction.
Enable foreign keys on every connection that relies on them.
Back up through SQLite when the application might be running.
Move to PostgreSQL for concurrency and infrastructure shape, not just database size.

Next, the chapter review: a recap of the chapter's ideas, a short quiz, and a forward reference into Chapter 16 where the cache patterns from this chapter become the persistence layer of the Spotify Music Time Machine.