3. Spotify's data model
Before we design the database, we need to understand the shape of the data Spotify gives us. The Music Time Machine does not need every field in every response, but it does need to know which values identify a track, how Spotify describes listening history, and which parts of the response are worth keeping for later.
This section turns Spotify's JSON into design decisions. You will see why the app stores compact track IDs, when it builds Spotify URIs, how time ranges support features like Forgotten Gems, and why the full track object is still worth caching even when only a few fields become database columns.
Start with the track object
Here is a shortened version of a Spotify track object. The real response can contain more fields, but this example shows the parts that matter for the Music Time Machine schema.
{
"id": "63OQupATfueTdZMWTxW03A",
"uri": "spotify:track:63OQupATfueTdZMWTxW03A",
"name": "Karma Police",
"artists": [
{
"id": "4Z8W4fKeB5YxbusRsdQVPb",
"name": "Radiohead"
}
],
"album": {
"id": "6400dnyeDyD2mIFHfkwHXN",
"name": "OK Computer",
"release_date": "1997-06-16",
"images": [
{
"url": "https://i.scdn.co/image/ab67616d0000b273c8b444df094279e70d0ed856",
"height": 640,
"width": 640
}
]
},
"duration_ms": 263066,
"explicit": false,
"external_urls": {
"spotify": "https://open.spotify.com/track/63OQupATfueTdZMWTxW03A"
}
}
This object carries three kinds of value, and each kind leads to a different schema decision. Keep this split in mind, because the rest of the section works through the three kinds in order.
- Identity. The
id,uri, andexternal_urls.spotifyvalues all point to the same track, but they do different jobs. One of them becomes the database key. - Display. Values such as
name,artists,album.name,album.images, andduration_msare what the dashboard shows the user. These become ordinary columns. - Everything else. Values such as
explicit,album.release_date, artist IDs, and album IDs may be useful later, but they do not each need a column today. The app keeps them, just not as columns.
Identity: Spotify's identifier system
Identity is the first decision. The track object hands you three identifier-shaped values, and the app needs one clear rule for each.
- Use
idfor storage. This is the compact Spotify track ID, for example63OQupATfueTdZMWTxW03A. It is short, stable, and the natural primary key. - Use
urifor Spotify playlist methods. This wraps the same ID in Spotify's resource format, for examplespotify:track:63OQupATfueTdZMWTxW03A. Thespotify:track:prefix tells Spotify the ID belongs to a track. - Use
external_urls.spotifyfor display. This is the public web link that opens the track in a browser. It is for the user, not for the database -- so of the three identifier-shaped values, this one is really a display value.
The rule is simple: store the compact id, build the uri when a playlist method needs it, and use the web URL only as a link for the user.
Because the Spotify URI is just the track ID with spotify:track: added to the front, your database does not need to store both values. Store the ID once, then build the URI later when your code needs to send tracks to a Spotify playlist endpoint.
The example below shows the difference in code. It gets one track from Spotify, reads the compact ID, reads the Spotify URI, converts between the two formats, and then passes the URI to a playlist method.
# Example: using the right Spotify identifier for the right job
track = sp.current_user_top_tracks(limit=1)['items'][0]
# The compact ID is the value you would store in your database.
track_id = track['id'] # "63OQupATfueTdZMWTxW03A"
# The URI identifies the same track, but in the format Spotify playlist methods expect.
track_uri = track['uri'] # "spotify:track:63OQupATfueTdZMWTxW03A"
# If your database only stores the ID, you can build the URI when you need it.
uri_from_id = f"spotify:track:{track_id}"
# You can also extract the compact ID from a URI by taking the final part.
id_from_uri = track_uri.split(':')[-1]
# When adding tracks to a playlist, pass URIs, not plain IDs.
sp.playlist_add_items(playlist_id, [track_uri])
Storage decision
Store Spotify IDs as your primary keys: they are short, stable, and easy to compare. Build a URI on demand with f"spotify:track:{track_id}" rather than storing it, and save the web URL only when the dashboard needs a link back to Spotify -- never as the value that joins records together.
Display: choosing columns from the track object
The previous section decided the track key. Now you can decide which track fields deserve their own columns. A field earns a column when the app will search it, sort by it, show it often, or use it in a query. For the first version of the Music Time Machine, that means storing a small, practical subset of the Spotify track object.
- Store the track identity. Keep
idastrack_id. This becomes the value other tables refer to. - Store the common display values. Keep the track name, first artist name, album name, duration, album artwork URL, and Spotify web URL.
- Do not design columns for everything. Fields such as
disc_number,track_number,is_local, and many nested IDs do not affect the first version of the app. Leaving them out keeps the database easy to understand.
Database design hint
Artists and albums arrive as nested objects. A fully relational design would split tracks, artists, and albums into separate tables. That removes repetition, but it also adds joins and complexity the first version does not need.
The Music Time Machine stores artist_name and album_name directly on the tracks table instead. A name like "Radiohead" may repeat across rows, but the queries stay readable and the schema stays focused on the project you are building. Section 4 builds the schema in full.
Everything else: keep the full response without overbuilding
Not every Spotify field needs its own database column. Your app only needs columns for the values it will query, filter, display, or reuse often. But the fields you do not turn into columns still belong to the original Spotify response.
If you drop those extra fields completely, you may regret it later. Today you may only need the track name, artist, album, artwork, and Spotify URL. Later, you may want release dates, artist IDs, album IDs, popularity scores, explicit flags, or other nested fields that were already present in the original response.
The solution is to store each track in two forms at the same time. First, store the fields your app uses often as normal database columns. Second, keep the complete Spotify track object as JSON text in a raw_json column.
The normalise_track function does that split. It takes one raw Spotify track object and returns a dictionary with both parts: named values for database columns, plus the full original Spotify response saved under raw_json.
The code below is not the final version of the function, but it shows the basic idea you will use later.
import json
def normalise_track(track):
"""Split a Spotify track object into columns plus cached raw JSON."""
return {
'track_id': track['id'],
'name': track['name'],
'artist_name': track['artists'][0]['name'],
'album_name': track['album']['name'],
'duration_ms': track['duration_ms'],
'album_image_url': track['album']['images'][0]['url'] if track['album']['images'] else None,
'spotify_url': track['external_urls']['spotify'],
'raw_json': json.dumps(track)
}
This design keeps the schema practical without throwing information away. The columns make common queries simple; the raw JSON cache preserves the original response for future features.
SQLite stores raw_json as text for now. Later, when the project moves to PostgreSQL, the same idea becomes more flexible because PostgreSQL can store and query JSON documents with JSONB.
Time ranges: which tracks, and from when
The track object tells you what Spotify knows about one track: its ID, name, artist, album, artwork, duration, and links. The next question is different: which tracks should the Music Time Machine ask Spotify for?
The app is not only interested in the user's current favourites. It needs to compare recent listening with longer-term taste, so it can build features such as monthly snapshots, sustained favourites, and forgotten gems.
Spotify offers this choice through the time_range parameter, which both current_user_top_tracks() and current_user_top_artists() accept. It tells Spotify which listening window to analyse before returning the user's top tracks or artists: what they are playing now, what they have liked over the last few months, or the tracks that have stayed with them for years.
| Time range | Approximate period | Use case |
|---|---|---|
short_term |
Last ~4 weeks | Current obsessions, recent discoveries, monthly snapshots |
medium_term |
Last ~6 months | Seasonal preferences, sustained favourites, trend analysis |
long_term |
Several years | All-time favourites, lifetime patterns, forgotten gems discovery |
Treat these ranges as listening-history windows, not exact calendar guarantees.
Use time_range when you want Spotify to choose the list: short_term for recent favourites, medium_term for tracks that have stayed popular over several months, and long_term for long-running favourites.
Use your own database dates when the app needs exact history. Later, when the Music Time Machine saves results, it will store when those results were collected. That is what lets the app compare one month with another, or show how the user's taste changes over time.
# Fetch top tracks for each time range
short_term = sp.current_user_top_tracks(limit=50, time_range='short_term')
medium_term = sp.current_user_top_tracks(limit=50, time_range='medium_term')
long_term = sp.current_user_top_tracks(limit=50, time_range='long_term')
# Extract track IDs from each range
short_ids = {track['id'] for track in short_term['items']}
medium_ids = {track['id'] for track in medium_term['items']}
long_ids = {track['id'] for track in long_term['items']}
# Find forgotten gems: in long-term but not in short-term
forgotten = long_ids - short_ids
print(f"You loved {len(forgotten)} tracks long-term but haven't heard them recently")
# Find new discoveries: in short-term but not in medium-term
discoveries = short_ids - medium_ids
print(f"You discovered {len(discoveries)} new tracks in the last month")
The database will not store Spotify's time range labels as the app's history. Instead, it will store the date each result was collected. Spotify's ranges help you fetch different lists; your own dates let you build long-term features from those saved results, such as comparing one month with another or showing how the user's taste changes over time.
How this enables the features
The choices in this section now explain how the Music Time Machine can work. Each feature below depends on one of the data decisions you have just made.
- Forgotten Gems. Compare long-term favourites with recent favourites. Tracks that appear in the long-term results but not in the recent results are good candidates for rediscovery.
- Currently Obsessed. Fetch
short_termtop tracks once per month and save them with a timestamp. Over time, those snapshots show how the user's taste changes. - Possible extension: Mood Playlists. Use stored track IDs to select matching tracks from the database, then build Spotify URIs when the playlist API needs them.
- Musical Evolution Analytics. Query snapshots by date to measure discovery counts, artist variety, track turnover, and long-running favourites.
- Future metadata features. Keep the complete Spotify track object in
raw_jsonso later chapters can reuse fields that were not important enough to become columns today.
Everything in this section now becomes a schema decision. Track IDs become primary keys. Track objects become a small set of useful columns plus a raw JSON cache. Spotify's time ranges become the starting point for the app's timestamp-based listening history. In the next section, you will turn those decisions into the database schema for the Music Time Machine.