3. Unit testing pure functions

Pure functions are the cheapest tests to write and the most reliable to run: no database, no network, no Flask app. The Music Time Machine has three good candidates: _format_month from the dashboard, _build_playlist_query from the Playlist Manager, and normalise_track from Chapter 16's snapshot pipeline. Each one teaches a different pytest pattern: simple assertion, tuple-return inspection, and payload-shape testing.

What makes a good unit test

Unit tests verify individual functions in isolation, without external dependencies. They're the foundation of your test suite because they're fast, focused, and easy to maintain.

  • Fast. Runs in milliseconds. No network calls, no database queries, no file I/O.
  • Isolated. Tests one function and does not depend on other tests.
  • Repeatable. Produces identical results every time.
  • Clear. The test name and assertions make the expected behaviour obvious.

The best candidates are pure functions: take inputs, do a calculation, return outputs, no side effects. _format_month formats a SQL date string for display. _build_playlist_query maps a whitelisted playlist source to SQL. normalise_track turns a Spotify track object into stable database columns plus raw_json. None touches the live Spotify API.

The Arrange-Act-Assert pattern

Professional tests follow the Arrange-Act-Assert pattern. Arrange sets up inputs, Act calls the function, Assert verifies the output. Reading the test top-to-bottom tells you exactly what failed.

tests/test_helpers.py
from app import _format_month


def test_format_month_returns_short_month_and_year():
    """_format_month turns SQLite YYYY-MM output into 'MMM YYYY'."""
    raw = "2026-03"

    formatted = _format_month(raw)

    assert formatted == "Mar 2026"

Test target: _build_playlist_query

_build_playlist_query sits inside app.py and turns a whitelisted source into a parameterised SQL query plus its parameter list. It is pure: takes a string, returns a (sql, params) tuple, and does not open a database connection. The route validates the submitted source before calling it; the helper owns the query shape.

app.py (excerpt)
def _build_playlist_query(source):
    """Return a parameterised query for a whitelisted playlist source."""
    if source == 'forgotten_gems':
        return """
            SELECT DISTINCT t.track_id, t.name, t.artist_name,
                   MAX(old.snapshot_date) AS last_seen
            FROM tracks t
            JOIN snapshots old ON old.track_id = t.track_id
            WHERE old.time_range = 'short_term'
              AND old.snapshot_date BETWEEN date('now', '-365 days')
                                        AND date('now', '-90 days')
              AND NOT EXISTS (
                  SELECT 1 FROM snapshots recent
                  WHERE recent.track_id = t.track_id
                    AND recent.time_range = 'short_term'
                    AND recent.snapshot_date >= date('now', '-30 days')
              )
            GROUP BY t.track_id
            ORDER BY last_seen DESC
            LIMIT 30
        """, []

    if source == 'recent_discoveries':
        return """
            WITH latest AS (
                SELECT MAX(snapshot_date) AS snapshot_date
                FROM snapshots
                WHERE time_range = 'short_term'
            )
            SELECT t.track_id, t.name, t.artist_name,
                   s.snapshot_date AS last_seen
            FROM snapshots s
            JOIN latest ON latest.snapshot_date = s.snapshot_date
            JOIN tracks t ON t.track_id = s.track_id
            WHERE s.time_range = 'short_term'
              AND NOT EXISTS (
                  SELECT 1 FROM snapshots earlier
                  WHERE earlier.track_id = s.track_id
                    AND earlier.time_range = 'short_term'
                    AND earlier.snapshot_date < s.snapshot_date
              )
            ORDER BY s.rank
            LIMIT 30
        """, []

    if source == 'current_rotation':
        return """
            WITH latest AS (
                SELECT MAX(snapshot_date) AS snapshot_date
                FROM snapshots
                WHERE time_range = 'short_term'
            )
            SELECT t.track_id, t.name, t.artist_name,
                   s.snapshot_date AS last_seen
            FROM snapshots s
            JOIN latest ON latest.snapshot_date = s.snapshot_date
            JOIN tracks t ON t.track_id = s.track_id
            WHERE s.time_range = 'short_term'
            ORDER BY s.rank
            LIMIT 30
        """, []

    raise ValueError(f'Unknown playlist source: {source}')

The tests pin three contracts: every playlist source caps results, Forgotten Gems uses the date-window exclusion, and unknown sources fail closed.

tests/test_helpers.py (continued)
import pytest

from app import _build_playlist_query


def test_build_playlist_query_forgotten_gems_uses_date_windows():
    sql, params = _build_playlist_query('forgotten_gems')

    assert "date('now', '-365 days')" in sql
    assert "date('now', '-90 days')" in sql
    assert "date('now', '-30 days')" in sql
    assert "NOT EXISTS" in sql
    assert "LIMIT 30" in sql
    assert params == []


def test_build_playlist_query_current_rotation_uses_latest_snapshot():
    sql, params = _build_playlist_query('current_rotation')

    assert "WITH latest AS" in sql
    assert "MAX(snapshot_date)" in sql
    assert "ORDER BY s.rank" in sql
    assert "LIMIT 30" in sql
    assert params == []


def test_build_playlist_query_unknown_source_raises_value_error():
    with pytest.raises(ValueError):
        _build_playlist_query('banana')

Test target: normalise_track

Chapter 16 stores the common track fields as normal columns and preserves the complete Spotify track object in raw_json. That transformation is small, but important: if it drops the raw payload, Chapter 25's JSONB lesson loses its bridge.

tests/test_helpers.py (continued)
import json

from monthly_snapshots import normalise_track


def test_normalise_track_keeps_columns_and_raw_json():
    track = {
        'id': 'abc123',
        'name': 'A Song',
        'artists': [{'name': 'An Artist'}],
        'album': {
            'name': 'An Album',
            'images': [{'url': 'https://i.scdn.co/image/abc123'}],
        },
        'duration_ms': 210000,
        'external_urls': {'spotify': 'https://open.spotify.com/track/abc123'},
        'disc_number': 1,
    }

    row = normalise_track(track)

    assert row['track_id'] == 'abc123'
    assert row['name'] == 'A Song'
    assert row['artist_name'] == 'An Artist'
    assert row['album_name'] == 'An Album'
    assert row['duration_ms'] == 210000
    assert json.loads(row['raw_json'])['disc_number'] == 1

Why edge cases matter

Tests should reveal bugs before users do. Edge cases are where bugs hide: empty inputs, missing keys, duplicates, unknown sources, and malformed provider payloads. _build_playlist_query's unknown-source test pins the security boundary. normalise_track's raw JSON assertion pins the data-retention boundary.

The next page covers the boundary that pure functions don't touch: external dependencies. As soon as you need to call Spotify or query a database, that changes, and that's where mocking earns its keep.