3. Unit testing pure functions

Pure functions are the cheapest tests to write and the most reliable to run: no database, no network, no Flask app. The Music Time Machine has three good candidates: _format_month from the dashboard, _build_playlist_query from the Playlist Manager, and normalise_track from Chapter 16's snapshot pipeline. Each one teaches a different pytest pattern: simple assertion, tuple-return inspection, and payload-shape testing.

What makes a good unit test

Unit tests verify individual functions in isolation, without external dependencies. They're the foundation of your test suite because they're fast, focused, and easy to maintain.

Fast. Runs in milliseconds. No network calls, no database queries, no file I/O.
Isolated. Tests one function and does not depend on other tests.
Repeatable. Produces identical results every time.
Clear. The test name and assertions make the expected behaviour obvious.

The best candidates are pure functions: take inputs, do a calculation, return outputs, no side effects. _format_month formats a SQL date string for display. _build_playlist_query maps a whitelisted playlist source to SQL. normalise_track turns a Spotify track object into stable database columns plus raw_json. None touches the live Spotify API.

The Arrange-Act-Assert pattern

Professional tests follow the Arrange-Act-Assert pattern. Arrange sets up inputs, Act calls the function, Assert verifies the output. Reading the test top-to-bottom tells you exactly what failed.

tests/test_helpers.py

from app import _format_month


def test_format_month_returns_short_month_and_year():
    """_format_month turns SQLite YYYY-MM output into 'MMM YYYY'."""
    raw = "2026-03"

    formatted = _format_month(raw)

    assert formatted == "Mar 2026"

Test target: _build_playlist_query

_build_playlist_query sits inside app.py and turns a whitelisted source into a parameterised SQL query plus its parameter list. It is pure: takes a string, returns a (sql, params) tuple, and does not open a database connection. The route validates the submitted source before calling it; the helper owns the query shape.

app.py (excerpt)

def _build_playlist_query(source):
    """Return a parameterised query for a whitelisted playlist source."""
    if source == 'forgotten_gems':
        return """
            SELECT DISTINCT t.track_id, t.name, t.artist_name,
                   MAX(old.snapshot_date) AS last_seen
            FROM tracks t
            JOIN snapshots old ON old.track_id = t.track_id
            WHERE old.time_range = 'short_term'
              AND old.snapshot_date BETWEEN date('now', '-365 days')
                                        AND date('now', '-90 days')
              AND NOT EXISTS (
                  SELECT 1 FROM snapshots recent
                  WHERE recent.track_id = t.track_id
                    AND recent.time_range = 'short_term'
                    AND recent.snapshot_date >= date('now', '-30 days')
              )
            GROUP BY t.track_id
            ORDER BY last_seen DESC
            LIMIT 30
        """, []

    if source == 'recent_discoveries':
        return """
            WITH latest AS (
                SELECT MAX(snapshot_date) AS snapshot_date
                FROM snapshots
                WHERE time_range = 'short_term'
            )
            SELECT t.track_id, t.name, t.artist_name,
                   s.snapshot_date AS last_seen
            FROM snapshots s
            JOIN latest ON latest.snapshot_date = s.snapshot_date
            JOIN tracks t ON t.track_id = s.track_id
            WHERE s.time_range = 'short_term'
              AND NOT EXISTS (
                  SELECT 1 FROM snapshots earlier
                  WHERE earlier.track_id = s.track_id
                    AND earlier.time_range = 'short_term'
                    AND earlier.snapshot_date < s.snapshot_date
              )
            ORDER BY s.rank
            LIMIT 30
        """, []

    if source == 'current_rotation':
        return """
            WITH latest AS (
                SELECT MAX(snapshot_date) AS snapshot_date
                FROM snapshots
                WHERE time_range = 'short_term'
            )
            SELECT t.track_id, t.name, t.artist_name,
                   s.snapshot_date AS last_seen
            FROM snapshots s
            JOIN latest ON latest.snapshot_date = s.snapshot_date
            JOIN tracks t ON t.track_id = s.track_id
            WHERE s.time_range = 'short_term'
            ORDER BY s.rank
            LIMIT 30
        """, []

    raise ValueError(f'Unknown playlist source: {source}')

The tests pin three contracts: every playlist source caps results, Forgotten Gems uses the date-window exclusion, and unknown sources fail closed.

tests/test_helpers.py (continued)

import pytest

from app import _build_playlist_query


def test_build_playlist_query_forgotten_gems_uses_date_windows():
    sql, params = _build_playlist_query('forgotten_gems')

    assert "date('now', '-365 days')" in sql
    assert "date('now', '-90 days')" in sql
    assert "date('now', '-30 days')" in sql
    assert "NOT EXISTS" in sql
    assert "LIMIT 30" in sql
    assert params == []


def test_build_playlist_query_current_rotation_uses_latest_snapshot():
    sql, params = _build_playlist_query('current_rotation')

    assert "WITH latest AS" in sql
    assert "MAX(snapshot_date)" in sql
    assert "ORDER BY s.rank" in sql
    assert "LIMIT 30" in sql
    assert params == []


def test_build_playlist_query_unknown_source_raises_value_error():
    with pytest.raises(ValueError):
        _build_playlist_query('banana')

Test target: normalise_track

Chapter 16 stores the common track fields as normal columns and preserves the complete Spotify track object in raw_json. That transformation is small, but important: if it drops the raw payload, Chapter 25's JSONB lesson loses its bridge.

tests/test_helpers.py (continued)

import json

from monthly_snapshots import normalise_track


def test_normalise_track_keeps_columns_and_raw_json():
    track = {
        'id': 'abc123',
        'name': 'A Song',
        'artists': [{'name': 'An Artist'}],
        'album': {
            'name': 'An Album',
            'images': [{'url': 'https://i.scdn.co/image/abc123'}],
        },
        'duration_ms': 210000,
        'external_urls': {'spotify': 'https://open.spotify.com/track/abc123'},
        'disc_number': 1,
    }

    row = normalise_track(track)

    assert row['track_id'] == 'abc123'
    assert row['name'] == 'A Song'
    assert row['artist_name'] == 'An Artist'
    assert row['album_name'] == 'An Album'
    assert row['duration_ms'] == 210000
    assert json.loads(row['raw_json'])['disc_number'] == 1

Why edge cases matter

Tests should reveal bugs before users do. Edge cases are where bugs hide: empty inputs, missing keys, duplicates, unknown sources, and malformed provider payloads. _build_playlist_query's unknown-source test pins the security boundary. normalise_track's raw JSON assertion pins the data-retention boundary.

The next page covers the boundary that pure functions don't touch: external dependencies. As soon as you need to call Spotify or query a database, that changes, and that's where mocking earns its keep.