Schema design for applications

The chapter so far has shown you the SQL and the Python. This page steps back and asks the design questions behind that code: what should each column store, when should data move into another table, and how do you change the schema after real rows already exist?

Choose columns by how you will query them

SQLite will accept almost any value in almost any column. That flexibility is convenient, but the declared type still shapes what queries are practical. A column you will filter, sort, or aggregate by needs the type that supports the operation; the choice you make at CREATE TABLE time is hard to undo later without a migration.

Before choosing a type, ask what the application will do with the value:

Filter by it? Store it consistently so WHERE can compare it reliably.
Sort by it? Pick a representation where the stored order matches the order you mean.
Aggregate it? Use a numeric type if you'll compute averages, totals, minimums, or maximums.
Display only? TEXT is often enough if the value is never used for calculation or lookup.

The four types from the schema basics page (INTEGER, REAL, TEXT, DATETIME) cover ~95% of API data. The cases worth knowing:

Counts and ids: INTEGER. If a value is genuinely a whole number and you'll do arithmetic on it, INTEGER is the right choice. SQLite stores small integers in 1 byte and grows up to 8 bytes only as needed; storage is cheap.
Measurements: REAL; money: smallest-unit INTEGER. Temperatures, weights, and similar measurements are fine as REAL. Money is different: store it in the smallest unit (1250 cents, as INTEGER) so floating-point rounding never bites you when values are summed or compared.
Booleans: INTEGER 0 or 1. SQLite has no BOOLEAN type. Use INTEGER with the convention 0 = false, 1 = true. Add a CHECK constraint if you want the database to enforce the range.
Timestamps: use SQLite-friendly UTC text. SQLite has no dedicated datetime storage class; DATETIME is a useful declaration for human readers, while SQLite stores the value according to its normal affinity rules.
The cleanest path is to declare the column with DEFAULT CURRENT_TIMESTAMP and let SQLite fill it. The stored value comes out as "2026-05-06 14:22:31": space separator, no timezone suffix, UTC. That format sorts correctly as text, which means freshness checks can compare timestamp strings safely. (See the caution below for what happens when you don't follow this rule.)
JSON blobs: TEXT. If you're storing raw API responses for later, save the whole JSON as a TEXT column. SQLite has a JSON1 extension that lets you query into JSON columns (json_extract(raw_response, '$.main.temp')), useful for "the schema changed last week, the old rows have a different shape" cases.

The isoformat() lexical-comparison trap

Python's datetime.isoformat() produces 2026-05-06T14:22:31+00:00. SQLite's datetime('now') produces 2026-05-06 14:22:31. Same instant, different strings, and SQLite compares timestamp strings lexically.

The formats diverge at character position 10, where SQLite uses a space and Python uses T:

SQLite uses a space: ASCII 32.
Python uses T: ASCII 84.

Because T sorts after a space, a Python-formatted timestamp can look "newer" than a SQLite-formatted threshold on the same UTC day. A 5-hour-old row can sail through a 60-minute freshness check.

Do not mix isoformat() strings with SQLite's datetime('now') format in comparisons. Use DEFAULT CURRENT_TIMESTAMP on the column, or if you must build the value in Python: format it with datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S").

Let the database protect important rules

Beyond type, columns can carry rules. Use them; the closer the rule lives to the data, the harder it is for application code to violate it accidentally. The schema below is illustrative: a richer weather_readings than the one the chapter's WeatherCache actually uses, designed to show what every common constraint shape looks like in one place.

You do not need all of these constraints on every table. The point is to recognise the kinds of rules the database can enforce for you.

constraints_demo.sql (illustrative)

CREATE TABLE IF NOT EXISTS weather_readings_constraints_demo (
    id           INTEGER PRIMARY KEY,
    location     TEXT NOT NULL,
    temperature  REAL CHECK (temperature BETWEEN -100 AND 100),
    conditions   TEXT,
    humidity     INTEGER CHECK (humidity BETWEEN 0 AND 100),
    is_forecast  INTEGER NOT NULL DEFAULT 0 CHECK (is_forecast IN (0, 1)),
    source       TEXT NOT NULL DEFAULT 'api' CHECK (source IN ('api', 'cache')),
    recorded_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Six constraints are doing real work in that schema:

NOT NULL on location and recorded_at: a row missing either is rejected.
CHECK (temperature BETWEEN -100 AND 100): a sanity check that catches obvious errors; a temperature of 500 means something's wrong upstream.
CHECK (humidity BETWEEN 0 AND 100): the same idea for percentages.
DEFAULT 0 on is_forecast: callers who don't specify a value get false, not NULL.
CHECK (is_forecast IN (0, 1)): the boolean discipline made explicit.
CHECK (source IN ('api', 'cache')): only the two known source labels are allowed.

Constraints come at a small write-time cost because the database checks each rule before accepting the row. For the kinds of constraints shown here, they do not make ordinary reads more complicated. For any value that has a knowable invariant, encoding the rule in the schema means you do not have to remember to check it in every code path that writes the table.

Be careful with uniqueness constraints on timestamps. A rule like UNIQUE (location, recorded_at) sounds like a neat duplicate guard, but CURRENT_TIMESTAMP records only whole seconds. Two legitimate writes for the same location inside one second would collide. Add that constraint only if "one reading per location per second" is truly the policy you want.

Split tables when one thing becomes two

Foreign keys and JOINs were introduced earlier with a two-table example. The principle behind it has a name, normalisation, and the question worth asking is "when does this pay off?"

Four signals that a column wants to move into its own table:

The same value repeats across many rows. If "Dublin, Ireland" appears 1000 times in weather_readings, that's redundant storage and a renaming hazard. Move it to a locations table; reference by id.
The thing has its own attributes you'd want to query. A city has a country, a timezone, a country code, an OpenWeatherMap city id. Keeping all of those on every weather row means updating thousands of rows when a city's metadata changes. A locations table holds them once.
The relationship is many-to-one. A track has many plays; a user has many sessions; a city has many weather readings. The "many" side stores a foreign key, while the "one" side lives in its own table.
The relationship is many-to-many. A track can have many artists, and an artist can appear on many tracks. That needs a junction table such as track_artists(track_id, artist_id); a single foreign key on either main table would lose part of the relationship.

The cost of normalisation is JOINs at query time. SQLite handles them well, but each JOIN adds complexity to the SQL and a small runtime cost. The trade-off is usually worth it when the separate thing has its own attributes or is reused by other tables.

The cache in this chapter is the exception. Each location has no other queryable attributes that the code needs: no country, timezone, API id, or related table. Because of that, the location string can live directly on each weather_readings row.

The Spotify project in Chapter 16 uses a more selective design. Track metadata lives once, listening snapshots reference those tracks over time, and the full Spotify track payload is cached as JSON for fields you didn't normalise into columns. Artist and album names stay with the track metadata rather than getting their own tables.

Change schemas without losing rows

A migration is any change to the schema of a database that already has data in it: adding a column for atmospheric pressure to existing weather rows, renaming a field, changing a column type. The question is how to apply the change without losing or corrupting the existing rows. SQLite's ALTER TABLE supports a few kinds of change directly; for anything else, the technique is "create a new table, copy data across, swap the old one out."

The easy case: add a column

Adding a column with a default is a one-line change that works on any existing data:

migration_add_column.sql

ALTER TABLE weather_readings
    ADD COLUMN pressure REAL DEFAULT NULL;

Every existing row gets NULL in the new column; new inserts can populate it. SQLite raises an error if the column already exists, so either check PRAGMA table_info(weather_readings) before adding it or track completed migrations in a schema_migrations table.

Always test a migration against a copy of the database before running it on the real file. A migration is code that edits your data, not just your schema.

The harder case: rebuild the table

Some schema changes are easy to express with ALTER TABLE. Adding a column is one example. Renaming a column is also supported in modern SQLite, and dropping a column is supported in newer versions.

Changing the meaning or type of an existing column is different. In that case, the safest pattern is to build a new table with the schema you want, copy the existing rows across, then replace the old table with the new one.

That sounds heavy, but the steps are predictable:

migration_change_type.sql

BEGIN;

-- 1. New table with the desired schema.
CREATE TABLE weather_readings_v2 (
    id           INTEGER PRIMARY KEY,
    location     TEXT NOT NULL,
    temperature  REAL,
    conditions   TEXT,
    humidity     INTEGER,
    wind_speed   REAL,
    pressure     INTEGER,                -- changed from REAL to INTEGER
    recorded_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- 2. Copy existing data, casting as needed.
INSERT INTO weather_readings_v2
    (id, location, temperature, conditions, humidity, wind_speed, pressure, recorded_at)
SELECT
    id, location, temperature, conditions, humidity, wind_speed,
    CAST(pressure AS INTEGER), recorded_at
FROM weather_readings;

-- 3. Drop the old table and rename the new one.
DROP TABLE weather_readings;
ALTER TABLE weather_readings_v2 RENAME TO weather_readings;

-- 4. Re-create any indexes the old table had.
CREATE INDEX idx_weather_location_recorded_at
    ON weather_readings(location, recorded_at DESC);

COMMIT;

The pattern is the same each time: create the replacement table, copy the old data into it, rename it into place, then rebuild anything the old table depended on. The important part is that the whole migration runs inside one explicit transaction. Either every step succeeds, or SQLite rolls the database back to its previous state.

If other tables reference this table with foreign keys, rebuild migrations need extra care. In small projects, the safest habit is to test the migration on a copy of the database and verify that foreign key checks still pass before touching the real file.

Do not assume executescript() makes a multi-statement migration atomic by itself. Put BEGIN and COMMIT in the SQL file, as shown above. Then Python can apply the migration as one deliberate unit:

apply_migration.py

import sqlite3

with sqlite3.connect("weather.db") as conn:
    with open("migration_change_type.sql") as f:
        # The SQL script contains BEGIN and COMMIT, so the migration
        # is atomic even though executescript() runs multiple statements.
        conn.executescript(f.read())
print("Migration applied.")

Track migrations as part of the database

Once you have more than one migration, you need to know which have already run. The standard pattern is a schema_migrations table:

migrations_table.sql

CREATE TABLE IF NOT EXISTS schema_migrations (
    version    INTEGER PRIMARY KEY,
    applied_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    note       TEXT
);

Each migration runner should check whether the migration version is already in the table before applying the script. If that version is present, the runner skips the script. After a migration succeeds, the runner inserts a row into schema_migrations inside the same transaction. That makes migrations idempotent (running the same migration twice is a no-op the second time) and discoverable (a quick SELECT * FROM schema_migrations tells you which migrations have run).

Real frameworks (Alembic for SQLAlchemy, the Django migrations system) automate this whole loop. For SQLite-only projects of the size you'll build in this book, hand-rolled scripts are fine; the table-and-transaction discipline is the load-bearing part, not the framework.

A schema design checklist

When you are designing a new table, ask these questions before writing CREATE TABLE:

Will I filter, sort, or aggregate this value? Pick a type and format that makes those operations reliable.
Does this value repeat with its own attributes? If yes, it may deserve its own table and a foreign key.
Should the database reject invalid rows? Add NOT NULL, CHECK, UNIQUE, or foreign key constraints where the rule is real.
Will this schema need to change later? Keep migrations small, test them on a copy, and track which ones have run.

Next, you will look at the performance side of SQLite: which indexes help, which indexes waste space, how to read a query plan, why batched transactions matter for write-heavy code, and when it is time to move from SQLite to PostgreSQL.