8. Chapter review

The Music Time Machine is the first project in the book that ships every layer at once: OAuth in front, SQLite in the middle, three user-visible features pulling them together, error handling and tests wrapping the whole thing. This page consolidates the architectural calls you've practised defending and tests them with a six-question quiz modelled on the follow-up questions an interviewer asks after "tell me about a project."

Use the project. Take a snapshot every month or two. After six months, Forgotten Gems starts surfacing tracks you genuinely forgot. After a year, the analytics show patterns in your taste you'd never have spotted by hand. The longer the database lives, the more value you compound, and the more material you have to talk about in interviews.

Figure: Music Time Machine project. Chapter 16 recap: OAuth, data model, schema, features, error handling, and tests form one finished command-line project.

Key skills mastered

Seven capabilities lock in across this chapter, each one transferable to any other API + database project you build next:

  • OAuth integration with real-world APIs. You applied Chapter 14's authorisation code flow to Spotify through Spotipy, understood what the library hides (token caching, refresh logic) and which security checks still belong to your application, and read the actual HTTP traffic between client and provider. The flow generalises to any provider that follows the standard.
  • Database schema design for time-series data. You separated entities (tracks) from events (snapshots), justified the denormalisation of artist and album names, cached the original track payload in raw_json, and chose a composite primary key on snapshots that gives you idempotency for free. The same separation pattern fits any "current state plus history" application.
  • Multi-feature application architecture. Features that share infrastructure rather than re-implementing it: Forgotten Gems uses set operations on the snapshot history, Currently Obsessed writes new rows, and Analytics aggregates across the snapshot table. One auth client, one database, three feature surfaces.
  • Raw payload caching. Normalised columns make common queries simple, while raw_json preserves the complete provider response for future analysis. That gives you a practical bridge from SQLite text storage here to PostgreSQL JSONB queries later in the book.
  • Production error handling patterns. The Chapter 9 categoriser, exponential backoff with jitter, and partial-success patterns applied against a real provider. The categoriser hides the provider-specific error shapes; the retry harness wraps the Spotify calls that need resilience; partial success means one malformed track record doesn't abort a whole snapshot.
  • Testing API-dependent code with mocks. A mock-based unittest suite that runs in milliseconds, covers edge cases the live API can't reliably reproduce, and runs in CI without OAuth credentials. The same techniques apply to any external dependency: payment processors, email providers, third-party search APIs.
  • Working with third-party API libraries. Use Spotipy for the heavy lifting, but understand it well enough to debug when it fails. The skill is recognising which problems the library has solved (OAuth, request retry plumbing) and which the library leaves to you (your application's data model, business logic, error categorisation).
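
The partial-success pattern from the error-handling bullet above can be sketched in a few lines. This is a minimal illustration, not the chapter's actual helper: the function name normalise_tracks and the output field names are assumptions, and the input is assumed to be a list of dicts in the general shape of Spotify track objects.

```python
def normalise_tracks(items):
    """Split raw items into (rows, failures) instead of raising on the first bad one.

    Hypothetical helper: the name and the output keys are illustrative only.
    """
    rows, failures = [], []
    for index, item in enumerate(items):
        try:
            rows.append({
                "track_id": item["id"],
                "name": item["name"],
                "artist_name": item["artists"][0]["name"],
            })
        except (KeyError, IndexError, TypeError) as exc:
            # Record what went wrong and keep going with the next item.
            failures.append((index, repr(exc)))
    return rows, failures
```

The caller writes the good rows and reports the failures afterwards, so one malformed record costs you one row rather than the whole snapshot.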

Chapter review quiz

Six questions on the architectural calls in this chapter. Answer each in your head before reading the answer, and try to answer in interview language: not "because the book said so," but "because the alternative would have caused X."

Why does the schema use tracks plus snapshots instead of one wide history table?

The two-table design prevents duplication and lets each query touch only the data it needs. Tracks are entities that exist independently of when you listened to them. Snapshots are events recording when a track appeared in your top 50 on a given date.

Mixing them into one table duplicates track metadata once per snapshot ("Karma Police" stored twelve times for twelve months) and forces every query to read through repeated metadata it doesn't need. Forgotten Gems and Analytics need stable track metadata joined to dated snapshot rows. The composite primary key on snapshots makes repeated same-day runs safe, while the track row remains the single place to refresh names, URLs, and cached raw payloads.
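
To make the duplication argument concrete, here is a minimal sketch of the two-table layout in an in-memory SQLite database. Only raw_json and the composite key (track_id, snapshot_date, time_range) come from the chapter; the other column names (name, artist_name, album_name, track_rank) are assumptions for illustration.

```python
import sqlite3

# Hypothetical column names, apart from raw_json and the composite key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    track_id    TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    artist_name TEXT NOT NULL,   -- denormalised on purpose
    album_name  TEXT,
    raw_json    TEXT             -- original provider payload
);
CREATE TABLE IF NOT EXISTS snapshots (
    track_id      TEXT NOT NULL REFERENCES tracks(track_id),
    snapshot_date TEXT NOT NULL, -- ISO date string
    time_range    TEXT NOT NULL,
    track_rank    INTEGER,
    PRIMARY KEY (track_id, snapshot_date, time_range)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One metadata row, however many months the track appears in.
conn.execute(
    "INSERT INTO tracks (track_id, name, artist_name) VALUES (?, ?, ?)",
    ("t1", "Karma Police", "Radiohead"),
)
months = [f"2024-{m:02d}-01" for m in range(1, 13)]
conn.executemany(
    "INSERT INTO snapshots (track_id, snapshot_date, time_range, track_rank) "
    "VALUES (?, ?, 'short_term', 1)",
    [("t1", d) for d in months],
)

track_rows = conn.execute("SELECT COUNT(*) FROM tracks").fetchone()[0]
snapshot_rows = conn.execute("SELECT COUNT(*) FROM snapshots").fetchone()[0]

# Metadata is joined in only when a query actually needs it.
history = conn.execute(
    """SELECT t.name, s.snapshot_date
       FROM snapshots AS s JOIN tracks AS t USING (track_id)
       ORDER BY s.snapshot_date"""
).fetchall()
```

Twelve monthly appearances produce twelve narrow snapshot rows and a single tracks row, which is the whole point of the split.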

Why does Forgotten Gems use a 90-365-day window with a 30-day exclusion buffer?

The windows balance "you used to play this enough that it counted" against "you haven't heard it lately enough to be surprised by it." 90-365 days catches songs you loved over a sustained period (an indie-rock summer, a focus playlist that lasted a quarter), not brief obsessions you'd remember anyway. The 30-day exclusion blocks anything still in your current rotation. The 30-90-day buffer between the two prevents tracks that just dropped off this month from masquerading as "forgotten."

These bounds are configurable. If your listening volume is much higher you might shorten everything (60-180 days, 14-day buffer); if you listen sporadically you might widen it. The defaults express a particular aesthetic of rediscovery, and the parameters give you a way to tune that aesthetic without changing the algorithm.
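
One way to picture the windows is as a set difference over (track_id, date) pairs. The sketch below is an assumption about how such a function could look, not the chapter's implementation: the function name and parameter names are hypothetical, and it works on plain Python tuples rather than database rows.

```python
from datetime import date, timedelta


def forgotten_gems(snapshots, today, window_start=365, window_end=90,
                   exclude_recent=30):
    """Return track ids seen in the old window but not in the recent one.

    snapshots: iterable of (track_id, snapshot_date) pairs.
    All parameter names are illustrative.
    """
    rows = list(snapshots)  # allow generators; we iterate twice
    oldest = today - timedelta(days=window_start)
    newest = today - timedelta(days=window_end)
    recent_cutoff = today - timedelta(days=exclude_recent)

    loved_before = {tid for tid, d in rows if oldest <= d <= newest}
    still_playing = {tid for tid, d in rows if d >= recent_cutoff}
    return loved_before - still_playing
```

A track that only ever appeared 60 days ago lands in neither set, so it is not reported as forgotten. Tuning for a heavy listener is then a matter of arguments, for example forgotten_gems(rows, today, window_start=180, window_end=60, exclude_recent=14), with the set logic untouched.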

Why store the original Spotify track object in raw_json if the schema already has normalised columns?

Normalised columns are for the fields you know you will query constantly: track name, artist name, album name, duration, URL. raw_json is for everything else in the provider response that may become useful later: nested album metadata, external identifiers, images, and fields you did not design reports around yet.

The architectural call is "normalise the stable reporting surface, archive the original payload for future questions." In SQLite, raw_json is just text you can inspect or export. In Chapter 25, the same idea becomes PostgreSQL JSONB, where you can index and query nested provider data without pretending every possible field deserved a first-class column on day one.
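
As a rough illustration of "normalise the stable surface, archive the rest", the sketch below stores a payload as text and later answers a question the columns were never designed for. The payload is a trimmed, made-up dict in the general shape of a track object, and the table has fewer columns than the chapter's schema; both are assumptions.

```python
import json
import sqlite3

# Made-up payload: placeholder values, not real Spotify data.
payload = {
    "id": "t1",
    "name": "Example Track",
    "artists": [{"name": "Example Artist"}],
    "album": {"name": "Example Album", "release_date": "2001-02-03"},
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks ("
    "track_id TEXT PRIMARY KEY, name TEXT, artist_name TEXT, raw_json TEXT)"
)
conn.execute(
    "INSERT INTO tracks VALUES (?, ?, ?, ?)",
    (
        payload["id"],
        payload["name"],
        payload["artists"][0]["name"],
        json.dumps(payload),  # archive the full payload as text
    ),
)

# Later: a field that never got its own column is still recoverable.
(raw,) = conn.execute(
    "SELECT raw_json FROM tracks WHERE track_id = ?", ("t1",)
).fetchone()
release_year = json.loads(raw)["album"]["release_date"][:4]
```

Parsing in Python with json.loads keeps the example independent of which JSON functions your SQLite build includes.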

Why does retry logic use exponential backoff with jitter rather than fixed delays?

Two failure modes drive the design. The first is server overload: hammering a struggling service with immediate retries slows its recovery, while doubling the wait between retries gives it progressively more breathing room. With max_attempts=3, this chapter's helper sleeps roughly 1s and 2s, plus jitter, before giving up and treating the failure as non-transient. The second is synchronised retry waves: if every client uses the same fixed delay, they all retry at the same moment, repeatedly overwhelming the recovering server in lockstep. Adding 0-10% jitter staggers the retries so requests arrive as a steady trickle instead of a synchronised pulse.

AWS and Google Cloud both recommend this combination in their client guidance because it keeps client wait times short while limiting server-side load during recovery. It's also the right default for any client you build against an external API where you don't control the server.
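
A minimal sketch of such a retry helper, under stated assumptions: the name retry_with_backoff, the is_transient hook (standing in for the Chapter 9 categoriser), and the injectable sleep parameter are all hypothetical. Only max_attempts=3, the 1s and 2s base delays, and the 0-10% jitter come from the text above.

```python
import random
import time


def retry_with_backoff(call, max_attempts=3, base_delay=1.0, jitter=0.10,
                       is_transient=lambda exc: True, sleep=time.sleep):
    """Call `call()` and retry transient failures with exponential backoff.

    Hypothetical helper: names and defaults are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on a non-transient error.
            if attempt == max_attempts or not is_transient(exc):
                raise
            delay = base_delay * 2 ** (attempt - 1)     # 1s, 2s, 4s, ...
            delay += delay * random.uniform(0, jitter)  # add 0-10% jitter
            sleep(delay)
```

With the defaults, a call that fails three times sleeps about 1s and 2s and then re-raises the last exception. Passing sleep as a parameter is a design choice that lets tests record the delays instead of actually waiting.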

Why do tests mock Spotify's API instead of making real calls?

Three reasons, in order of importance. Determinism: your top tracks change daily; tests need controlled inputs to verify "what happens with empty results?" or "what happens when a track payload is missing a field?" Mocks give you total control over what the API "returns." Isolation: tests should pass or fail based on your code's correctness, not Spotify's uptime, your network connection, or your OAuth token's freshness. Speed: a real call takes 100-500ms; a mock returns instantly. A 50-test suite runs in under a second with mocks instead of roughly 5 to 25 seconds against the live API, so you can run it on every save.

The same logic applies to any external dependency. Mock the boundary where your code calls something you don't control (payment APIs, email providers, third-party search) and write deterministic tests of your code in isolation. Integration tests against the real provider belong in a separate, slower suite that runs less often.
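
Here is a small sketch of the two edge cases named above, empty results and a payload missing a field, tested against a mock. The function fetch_top_track_names is a hypothetical stand-in for your own code; the only Spotipy call it assumes is current_user_top_tracks, and the mock replaces the client entirely, so no credentials or network are involved.

```python
import unittest
from unittest.mock import Mock


def fetch_top_track_names(sp, limit=50):
    """Return track names from the client's top-tracks response.

    Hypothetical function under test; `sp` is a Spotipy-style client.
    """
    response = sp.current_user_top_tracks(limit=limit, time_range="short_term")
    return [item["name"] for item in response.get("items", []) if "name" in item]


class FetchTopTrackNamesTests(unittest.TestCase):
    def test_empty_results(self):
        sp = Mock()
        sp.current_user_top_tracks.return_value = {"items": []}
        self.assertEqual(fetch_top_track_names(sp), [])

    def test_skips_track_missing_a_field(self):
        sp = Mock()
        sp.current_user_top_tracks.return_value = {
            "items": [{"name": "A"}, {"id": "no-name"}]
        }
        self.assertEqual(fetch_top_track_names(sp), ["A"])
        # The mock also records how the boundary was called.
        sp.current_user_top_tracks.assert_called_once_with(
            limit=50, time_range="short_term"
        )
```

Saved in a test module, this runs with python -m unittest. Both cases are hard to provoke against a live account, which is exactly why they belong in the mocked suite.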

Why does the snapshot feature check for an existing snapshot today before fetching new tracks?

Idempotency. Running the script twice in one day, accidentally double-clicking it, or starting it from two terminals should produce the same effect as running it once. The early-exit check is the UX-and-efficiency win: it prints "you already snapshotted today" and skips the redundant Spotify fetch. The real backstop is the data model: the snapshot insert uses INSERT OR IGNORE against the composite primary key (track_id, snapshot_date, time_range), so even if the check is bypassed, a duplicate is silently ignored rather than written twice or raised as an error.

The semantics also reflect the feature's intent: monthly snapshots capture identity at monthly resolution. Spotify's top 50 doesn't change meaningfully in hours, so multiple same-day snapshots wouldn't capture anything new. If you ever want intraday granularity (snapshots every six hours), the change is mechanical: add a time component to snapshot_date and update the primary key. It would, however, change the feature's meaning, not just its precision.
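
The two layers of protection described above, the early-exit check and the INSERT OR IGNORE backstop, can be sketched together. The function names and the fetch_track_ids callable are assumptions for illustration; the composite key and the INSERT OR IGNORE statement are the parts taken from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE snapshots (
           track_id      TEXT NOT NULL,
           snapshot_date TEXT NOT NULL,
           time_range    TEXT NOT NULL,
           PRIMARY KEY (track_id, snapshot_date, time_range))"""
)


def snapshot_exists(conn, snapshot_date, time_range):
    row = conn.execute(
        "SELECT 1 FROM snapshots WHERE snapshot_date = ? AND time_range = ? LIMIT 1",
        (snapshot_date, time_range),
    ).fetchone()
    return row is not None


def write_snapshot(conn, track_ids, snapshot_date, time_range):
    """Insert rows, ignoring duplicates; return how many were actually added."""
    count_sql = "SELECT COUNT(*) FROM snapshots"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO snapshots VALUES (?, ?, ?)",
        [(tid, snapshot_date, time_range) for tid in track_ids],
    )
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before


def take_snapshot(conn, fetch_track_ids, snapshot_date, time_range="short_term"):
    """Hypothetical entry point: fetch_track_ids stands in for the Spotify call."""
    if snapshot_exists(conn, snapshot_date, time_range):
        return 0  # early exit: skip the redundant fetch
    return write_snapshot(conn, fetch_track_ids(), snapshot_date, time_range)
```

Calling take_snapshot twice for the same date fetches once and writes once. Calling write_snapshot directly with overlapping ids, which bypasses the check, adds only the rows that are genuinely new.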

Looking forward

The Music Time Machine works as a command-line application. You run it, it does its job, the database grows. To show it off, you've been screenshotting terminal output. That's fine for a personal tool but underwhelming for a portfolio. The next two chapters add a face to the project.

Chapter 17 introduces Flask on its own terms: routing, Jinja2 templates, sessions, and error handling, finishing with a capstone page that reads your accumulated snapshots from music_time_machine.db. It teaches the web foundation first, before the full dashboard arrives in Chapter 18.

Chapter 18 builds the full Music Time Machine dashboard: a home page with a Chart.js timeline, browser-based OAuth, analytics with date filtering, a playlist manager with form handling, and a settings page with database export, applying the Chapter 17 patterns until they're muscle memory.

Chapter 19 takes the mock-based testing you started in Section 7 and grows it into a full pytest suite: integration tests against an in-memory SQLite database, end-to-end tests of full feature workflows, and coverage measurement aimed at 95%. For now, the unittest examples in this chapter are enough to keep regressions out of the core features.

Chapter 20 deploys the whole stack (Flask web server, SQLite database, OAuth configuration) to a public URL. It covers environment variables, database migrations, HTTPS, and the production-security considerations that turn a localhost demo into a recruiter-shareable link.

The foundation you built here (OAuth, schema, three features, error handling, tests) supports everything that follows. The web interface, additional features, fuller test suite, and deployment are additions, not rewrites. That's the point of layered architecture: each layer earns the right to grow without forcing the others to change.