2. Discovering API response structures
You can't normalize a shape you haven't seen, so before writing extraction code against an unfamiliar API, the first job is to reveal what the response actually contains: where the list of records lives, how deep the nesting goes, which keys show up and which don't. This page builds the diagnostic helper you'll reach for on every new endpoint from here on.
The toolkit this chapter builds uses GitHub's public API for teaching. It's real, queryable without authentication, and the two endpoints we hit have dramatically different shapes -- perfect for showing the patterns. Once you've internalised the moves on GitHub, you'll reuse them on the vendor orders challenge when we build the complete normalizer at the end of the chapter.
Guessing an API's structure is one of the fastest ways to ship brittle code. In Chapter 6, the Random User API wrapped records in a results array, but that's just one convention. GitHub's single-repository endpoint returns the object directly at the root. Many search endpoints wrap results in an items array. Other APIs use data or payload. The professional rule: verify the shape first, write extraction code second.
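To see why guessing is risky, here is a small offline sketch. The three payloads are hand-written stand-ins for the conventions just described (they are not real API output), and the function names are hypothetical, chosen only for this illustration. The extractor deliberately assumes a results wrapper, the way code written after Chapter 6 might.

```python
# Hand-written stand-ins for three wrapping conventions (not real API output).
SAMPLES = {
    "results wrapper": {"results": [{"name": "Ada"}]},
    "items wrapper": {"total_count": 1, "items": [{"name": "Hello-World"}]},
    "bare object": {"id": 1, "name": "Hello-World"},
}


def first_record_assuming_results(payload):
    """Deliberately brittle: assumes every API wraps records in 'results'."""
    return payload["results"][0]


def try_extract(payload):
    """Return (ok, detail) instead of raising, so the loop can report each case."""
    try:
        return True, first_record_assuming_results(payload)
    except KeyError:
        return False, f"no 'results' key; top-level keys are {list(payload)}"


for label, payload in SAMPLES.items():
    ok, detail = try_extract(payload)
    print(f"{label}: {'OK' if ok else 'FAILED'} -> {detail}")
```

Only the first payload survives. The other two fail on a missing key, which is the kind of breakage a quick look at the real shape would have prevented.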
A systematic exploration approach
Rather than manually inspecting JSON responses and getting lost in nested complexity, professional developers build diagnostic tools that automate the exploration process. Those tools do the tedious work -- traversing structures, counting elements, identifying container patterns, truncating sprawling output into a readable summary -- so you can see the shape at a glance and write the right extraction code the first time.
The helper below combines four techniques: response-type detection for objects and arrays, pattern recognition for common container keys, intelligent truncation so deeply nested responses stay readable, and structured analysis that reveals the key characteristics without dumping every field. Save it as explore_api_structure.py at the project root:
import json

import requests


def explore_api_structure(url, max_depth=2):
    """
    Systematically explore an API response structure.
    This should be your first step with any new API.
    """
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        data = response.json()

        print(f"API Response Analysis for: {url}")
        print("=" * 60)
        print(f"Response type: {type(data).__name__}")

        if isinstance(data, dict):
            keys = list(data.keys())
            preview = keys[:6] + ['...'] if len(keys) > 6 else keys
            print(f"Top-level keys: {preview}")
            print(f"Total keys: {len(data)}")

            # Look for common data container patterns
            common_containers = ['items', 'results', 'data', 'content', 'entries', 'records']
            found_containers = [key for key in common_containers if key in data]
            if found_containers:
                print(f"Possible data containers found: {found_containers}")

            # Show structure of first few fields
            print("\nFirst few fields (with truncated values):")
            for key, value in list(data.items())[:5]:
                value_type = type(value).__name__
                if isinstance(value, (dict, list)):
                    # Containers: report type and size rather than contents
                    print(f" {key}: {value_type} (length: {len(value)})")
                else:
                    str_value = str(value)
                    display_value = str_value[:50] + "..." if len(str_value) > 50 else str_value
                    print(f" {key}: {display_value}")

        elif isinstance(data, list):
            print(f"Array with {len(data)} items")
            if data:
                first_item = data[0]
                print(f"First item type: {type(first_item).__name__}")
                if isinstance(first_item, dict):
                    print(f"First item keys: {list(first_item.keys())}")

        print("\nSample structure (truncated for readability):")
        print(json.dumps(truncate_for_display(data, max_depth), indent=2))

    # Checked first: in recent versions of requests the JSON decoding error is
    # also a RequestException, so the broader handler would otherwise catch it.
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
    except requests.RequestException as e:
        print(f"Error fetching data: {e}")


def truncate_for_display(obj, max_depth=2, current_depth=0):
    """Helper function to truncate nested data for readable display."""
    if current_depth >= max_depth:
        return "..."

    if isinstance(obj, dict):
        # Keep the first 5 keys and summarise the rest
        truncated = {}
        for i, (key, value) in enumerate(obj.items()):
            if i >= 5:
                truncated["..."] = f"({len(obj) - 5} more keys)"
                break
            truncated[key] = truncate_for_display(value, max_depth, current_depth + 1)
        return truncated
    elif isinstance(obj, list):
        # Keep the first 3 items and summarise the rest
        truncated = []
        for item in obj[:3]:
            truncated.append(truncate_for_display(item, max_depth, current_depth + 1))
        if len(obj) > 3:
            truncated.append(f"... ({len(obj) - 3} more items)")
        return truncated
    else:
        # Scalars pass through; long strings are cut to 50 characters
        if isinstance(obj, str) and len(obj) > 50:
            return obj[:50] + "..."
        return obj
You don't need to memorise how this helper works internally. Keep it in your toolkit and adapt it as you go -- the value is in the exploration strategy it demonstrates, not the specific implementation. What matters is the workflow: run the helper against a new endpoint, read what it prints, then write extraction code that matches the shape you actually see.
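As one example of adapting the helper, the sketch below swaps the fixed list of container names for a check on value types. The function name find_list_containers and the sample payload are hypothetical, introduced only for illustration. The idea is that any top-level key holding a list is a candidate container, whatever it is called.

```python
def find_list_containers(data):
    """Return {key: length} for every top-level key whose value is a list.

    Hypothetical adaptation: instead of matching a fixed list of names,
    it flags any key holding a list, whatever the key is called.
    """
    if not isinstance(data, dict):
        return {}
    return {key: len(value) for key, value in data.items() if isinstance(value, list)}


# Hand-written sample; 'payload' is not among the helper's common container names.
sample = {"status": "ok", "payload": [{"id": 1}, {"id": 2}], "tags": []}
print(find_list_containers(sample))
```

The trade-off is noise: this version also reports incidental lists such as tags, so it is best read as a list of candidates rather than a verdict.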
Testing the helper against two different GitHub endpoints
GitHub is ideal for showing how the same API can return dramatically different shapes from different endpoints. Save the call script below as test_discovery.py at the project root:
from explore_api_structure import explore_api_structure
print("=== Comparing Different API Response Structures ===\n")
# GitHub API: direct object response
explore_api_structure("https://api.github.com/repos/octocat/Hello-World")
print("\n")
# GitHub Search API: uses an 'items' array
explore_api_structure("https://api.github.com/search/repositories?q=python&per_page=2")
Run it from the project root:
python test_discovery.py
You should see output along these lines. GitHub's live counts change over time, so your exact key counts and search totals may differ; the important signal is the response shape.
=== Comparing Different API Response Structures ===

API Response Analysis for: https://api.github.com/repos/octocat/Hello-World
============================================================
Response type: dict
Top-level keys: ['id', 'node_id', 'name', 'full_name', 'private', 'owner', '...']
Total keys: 79

First few fields (with truncated values):
 id: 1296269
 node_id: MDEwOlJlcG9zaXRvcnkxMjk2MjY5
 name: Hello-World
 full_name: octocat/Hello-World
 private: False

Sample structure (truncated for readability):
{
  "id": 1296269,
  "node_id": "MDEwOlJlcG9zaXRvcnkxMjk2MjY5",
  "name": "Hello-World",
  "full_name": "octocat/Hello-World",
  "private": false,
  "...": "(74 more keys)"
}


API Response Analysis for: https://api.github.com/search/repositories?q=python&per_page=2
============================================================
Response type: dict
Top-level keys: ['total_count', 'incomplete_results', 'items']
Total keys: 3
Possible data containers found: ['items']

First few fields (with truncated values):
 total_count: 8937004
 incomplete_results: False
 items: list (length: 2)

Sample structure (truncated for readability):
{
  "total_count": 8937004,
  "incomplete_results": false,
  "items": [
    "...",
    "..."
  ]
}
The helper immediately reveals the structural difference between these two endpoints. The single-repository endpoint returns many fields directly at the root level -- no wrapper, no metadata, just the repository object. The search endpoint takes a completely different approach: it wraps the repository data in an items array and adds metadata like total_count for pagination.
This is why structural assumptions are dangerous. Code that expected a results array (the Chapter 6 Random User convention) would fail on both endpoints -- one has no wrapper at all, the other uses items. The helper's container-detection step spots these patterns for you, so you know exactly where the data lives before you write a single line of access code. Even within one API, different endpoints follow different shapes. That's normal. The exploration step is what makes it manageable.
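Once the shapes are known, the matching access code is short. The sketch below uses hand-written stand-ins shaped like the two responses above, so it runs offline. The values inside items are placeholders, since the helper's output truncated them, and the sketch assumes each search item carries a full_name field like the single-repository response does. You would confirm that by rerunning the helper with a larger max_depth.

```python
# Hand-written stand-ins shaped like the two responses above.
# Values inside 'items' are placeholders; the helper's output truncated them.
repo_response = {
    "id": 1296269,
    "name": "Hello-World",
    "full_name": "octocat/Hello-World",
}
search_response = {
    "total_count": 2,
    "incomplete_results": False,
    "items": [
        {"full_name": "example/one"},  # placeholder record
        {"full_name": "example/two"},  # placeholder record
    ],
}

# Single-repository shape: the record is the root object itself.
repo_names = [repo_response["full_name"]]

# Search shape: the records live inside the 'items' list.
search_names = [item["full_name"] for item in search_response["items"]]

print(repo_names)
print(search_names)
```

Each access path is written for one shape only and works because that shape was checked first. Neither would survive being pointed at the other endpoint.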
Discovery is half the job. The other half is writing access code that survives the variation it reveals -- a single utility that works whether the data sits at the root, in an items array, or wrapped under data. That's the next page.