Indexes

As weather_readings grows, a quiet problem appears: SQLite has more rows to inspect. With ten rows, checking each one is invisible. With a million rows, a query like WHERE location = ? can become part of your app's waiting time. An index gives SQLite a shortcut.

A weather_readings table illustrated growing taller. Three rows of weather data at the top (Dublin, London, Dublin), a row of ellipsis dots indicating continuation, and a final row at row 500000. Vertical arrows on the right indicate earlier rows at the top and later rows at the bottom.

Databases grow vertically as tables collect more rows. The bigger the table, the more painful an unindexed lookup becomes. To see the vertical row growth in practice, here is weather_readings after the app has been collecting readings for a while:

**`weather_readings`**
id	location	temperature	conditions	recorded_at
1	Dublin	12.5	cloudy	2026-05-05 14:30:22
2	London	15.2	rain	2026-05-05 14:35:10
3	Dublin	13.1	partly cloudy	2026-05-06 09:15:44
...	...	...	...	...
500000	Dublin	11.9	windy	2026-12-18 08:10:05

The question that matters: how does SQLite find the Dublin rows quickly without checking half a million rows one by one? That's the work of an index. Everything on this page runs against the weather.db database you've been building throughout the chapter.

Why lookups slow down

Every WHERE location = ? against an unindexed column starts as a scan. SQLite checks row 1, then row 2, then row 3, and keeps going until it has inspected the table. With ten rows, that is invisible. With a million rows, the wait becomes part of your application's behaviour.

A side-by-side comparison showing a database search without an index scanning rows one by one, and a search with an index jumping directly to the matching Dublin row. — Indexes turn slow row-by-row searches into direct lookups.

When you create an index in SQLite, it builds and maintains a second hidden structure alongside the table, stored inside the database file itself. This structure acts as a sorted lookup map for one or more columns, allowing SQLite to find matching rows without scanning every row in the table.

The rule of thumb: create an index for columns you filter on (WHERE) or sort by (ORDER BY) repeatedly. Indexes can speed up reads dramatically; the trade-off is extra disk space and slightly slower writes, because SQLite must keep the index updated as rows change.

CREATE INDEX is the SQL that adds one of these lookup structures to an existing table.

Your Python code for the SELECT query doesn't change either way. You still write the same WHERE location = ? query; SQLite automatically decides whether to use an index when it plans the query.

Adding an index

The fastest way to see what an index changes is to compare SQLite's query plan before and after the index exists.

Check the plan with `EXPLAIN QUERY PLAN`

The SQL statement EXPLAIN QUERY PLAN asks SQLite to show how it intends to execute a query before running it. It is both a debugging tool and a learning tool. Instead of returning your actual data rows, SQLite returns a description of the strategy it plans to use, including:

whether it scans the table row by row
whether it uses an index
which index it picks if multiple are eligible

Run these individual SQL statements against weather.db so you can see the plan change for yourself.

Step 1. Run EXPLAIN QUERY PLAN while no index exists on location.

SQL: query plan

EXPLAIN QUERY PLAN
SELECT temperature, conditions, recorded_at
FROM weather_readings
WHERE location = 'Dublin';

The output shows that SQLite has to walk the table top to bottom. SCAN means row-by-row read:

Terminal (before the index)

QUERY PLAN
`--SCAN weather_readings

Step 2. Create the index. Your application only needs to do this once as setup; it does not recreate the index before every query.

SQL: index on location

CREATE INDEX IF NOT EXISTS idx_weather_location
    ON weather_readings(location);

Step 3. Run the same EXPLAIN QUERY PLAN from Step 1 again:

Terminal (after the index)

QUERY PLAN
`--SEARCH weather_readings USING INDEX idx_weather_location (location=?)

SEARCH ... USING INDEX is the shortcut: SQLite looks the value up in the sorted index structure instead of inspecting every row. The exact wording varies between SQLite versions, but the distinction is the important part: SCAN means "walk the table"; SEARCH ... USING INDEX means "use the shortcut."

If you run the same statement through Python with conn.execute(), SQLite returns the plan as tuple rows instead of the tree-style output shown by the command-line tool. The key words to look for are still SCAN and SEARCH.

The trade-off is that the index takes up disk space and adds a little work to each INSERT, because SQLite has to update both the table and the index. That cost is usually tiny for application databases, but it is the reason you do not index every column by reflex. The final section returns to the rule of thumb.

Indexing the cache query

The single-column index above is a teaching step. The cache will use a better composite index because its real freshness query uses more than one column. It looks up the latest reading for one location. A single-column index on location helps SQLite find Dublin's rows, but SQLite may still need an extra sorting step on recorded_at (ORDER BY) to find the newest one.

The freshness query

SELECT temperature, conditions, recorded_at
FROM weather_readings
WHERE location = ?
ORDER BY recorded_at DESC
LIMIT 1;

A composite index stores entries in the order this query wants: grouped by location, then newest-first within each location.

A composite index for the freshness query

CREATE INDEX IF NOT EXISTS idx_weather_location_recorded_at
    ON weather_readings(location, recorded_at DESC);

Read (location, recorded_at DESC) from left to right. SQLite first groups rows by location, then sorts each by recorded_at in descending order.

Conceptually, the index entries are ordered like this:

Dublin   2026-05-06 09:15:44
Dublin   2026-05-05 14:30:22
London   2026-05-05 14:35:10
Paris    2026-05-04 11:20:01

That means SQLite can jump straight to Dublin's section of the index and immediately encounter the newest reading first, which matches: WHERE location = ? ORDER BY recorded_at DESC LIMIT 1. The final cache class uses this shape for its freshness check, so the index above is the key to keeping that lookup fast as the table grows.

This composite index also handles location-only lookups because location is the leading column. In a real application you would usually keep the composite index and skip the separate idx_weather_location index unless a measured query plan showed both were useful.

When an index earns its keep

The rule-of-thumb from earlier plays out concretely for weather_readings. The composite index on (location, recorded_at DESC) earns its keep because the application's hot query is "the latest row for this location". It supports the final cache class's freshness check and the daily-averages query, both of which filter on location and care about recorded_at. Other columns (temperature, conditions, anything you add later) may show up in occasional reports but are not the main lookup path, so they don't earn an index.

If a query starts feeling slow as the table grows, the first thing to check is whether its WHERE and ORDER BY columns match an index. If they don't, run EXPLAIN QUERY PLAN from the demo above to see whether SQLite is doing a SCAN when it could be using a SEARCH instead.

Indexes solve the speed problem that comes with vertical row growth. They don't touch the repetition problem: the same location text living on every row, with no canonical record of what that location actually is. That's the other half of the toolkit, and it's the subject of the next page: splitting related data into separate tables and stitching them back together with foreign keys and JOINs.