2. Discovering API response structures

You can't normalize a shape you haven't seen, so before writing extraction code against an unfamiliar API, the first job is to reveal what the response actually contains: where the list of records lives, how deep the nesting goes, which keys show up and which don't. This page builds the diagnostic helper you'll reach for on every new endpoint from here on.

The toolkit this chapter builds uses GitHub's public API for teaching. It's real, queryable without authentication (subject to GitHub's stricter rate limits for unauthenticated requests), and the two endpoints we hit have dramatically different shapes, which makes them ideal for showing the patterns. Once you've internalised the moves on GitHub, you'll reuse them on the vendor orders challenge when we build the complete normalizer at the end of the chapter.

Guessing an API's structure is one of the fastest ways to ship brittle code. In Chapter 6, the Random User API wrapped records in a results array, but that's just one convention. GitHub's single-repository endpoint returns the object directly at the root. Many search endpoints wrap results in an items array. Other APIs use data or payload. The professional rule: verify the shape first, write extraction code second.
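To make the brittleness concrete, here is a minimal sketch that runs one hard-coded lookup against four payload shapes. The payloads are invented for illustration: their keys echo the conventions named above (a Random User style results wrapper, a GitHub search style items wrapper, a data wrapper, and a bare object), but the values are made up, and the function names are hypothetical.

```python
# Hypothetical payloads mimicking the conventions described above.
# Keys echo real shapes (results, items, data, bare object); values are invented.
SAMPLE_RESPONSES = {
    "results wrapper": {"results": [{"id": 1}], "info": {"page": 1}},
    "items wrapper": {"total_count": 1, "incomplete_results": False, "items": [{"id": 1}]},
    "data wrapper": {"data": [{"id": 1}]},
    "bare object": {"id": 1, "name": "Hello-World"},
}


def extract_assuming_results(payload):
    """Brittle extraction: hard-codes the 'results' convention."""
    return payload["results"]


def check_assumption(responses):
    """Return, per shape, 'ok' or the KeyError the hard-coded lookup raised."""
    outcome = {}
    for label, payload in responses.items():
        try:
            extract_assuming_results(payload)
            outcome[label] = "ok"
        except KeyError as missing:
            outcome[label] = f"KeyError: {missing}"
    return outcome


for label, result in check_assumption(SAMPLE_RESPONSES).items():
    print(f"{label:16} -> {result}")
```

Only the results-wrapped payload survives; the other three raise KeyError, which is exactly the failure that checking the shape first is meant to prevent.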

A systematic exploration approach

Rather than manually inspecting JSON responses and getting lost in nested complexity, professional developers build diagnostic tools that automate the exploration process. Those tools do the tedious work -- traversing structures, counting elements, identifying container patterns, truncating sprawling output into a readable summary -- so you can see the shape at a glance and write the right extraction code the first time.

The helper below combines four techniques: response-type detection for objects and arrays, pattern recognition for common container keys, truncation by depth and length so deeply nested responses stay readable, and a structured summary that reveals the key characteristics without dumping every field. Save it as explore_api_structure.py at the project root:

explore_api_structure.py
import requests
import json

def explore_api_structure(url, max_depth=2):
    """
    Systematically explore an API response structure.
    This should be your first step with any new API.
    """
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        data = response.json()

        print(f"API Response Analysis for: {url}")
        print("=" * 60)
        print(f"Response type: {type(data).__name__}")

        if isinstance(data, dict):
            keys = list(data.keys())
            preview = keys[:6] + ['...'] if len(keys) > 6 else keys
            print(f"Top-level keys: {preview}")
            print(f"Total keys: {len(data)}")

            # Look for common data container patterns. Print the result even when
            # nothing matches: an empty list means none of the usual wrapper keys
            # is present at the top level.
            common_containers = ['items', 'results', 'data', 'content', 'entries', 'records']
            found_containers = [key for key in common_containers if key in data]
            print(f"Possible data containers found: {found_containers}")

            # Show structure of first few fields
            print("\nFirst few fields (with truncated values):")
            for key, value in list(data.items())[:5]:
                if isinstance(value, (dict, list)):
                    # Containers: report type and size rather than contents
                    print(f"  {key}: {type(value).__name__} (length: {len(value)})")
                else:
                    # Scalars: show the value, cut to 50 characters
                    str_value = str(value)
                    display_value = str_value[:50] + "..." if len(str_value) > 50 else str_value
                    print(f"  {key}: {display_value}")

        elif isinstance(data, list):
            print(f"Array with {len(data)} items")
            if data:
                first_item = data[0]
                print(f"First item type: {type(first_item).__name__}")
                if isinstance(first_item, dict):
                    print(f"First item keys: {list(first_item.keys())}")

        print("\nSample structure (truncated for readability):")
        print(json.dumps(truncate_for_display(data, max_depth), indent=2))

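    except json.JSONDecodeError as e:
        # In recent versions of requests, the error response.json() raises on a
        # non-JSON body subclasses both json.JSONDecodeError (when simplejson is
        # not installed) and requests.RequestException, so this more specific
        # handler must come first or parse errors would be reported as fetch errors.
        print(f"Error parsing JSON: {e}")
    except requests.RequestException as e:
        print(f"Error fetching data: {e}")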

def truncate_for_display(obj, max_depth=2, current_depth=0):
    """Helper function to truncate nested data for readable display."""
    if current_depth >= max_depth:
        return "..."

    if isinstance(obj, dict):
        truncated = {}
        for i, (key, value) in enumerate(obj.items()):
            if i >= 5:
                truncated["..."] = f"({len(obj) - 5} more keys)"
                break
            truncated[key] = truncate_for_display(value, max_depth, current_depth + 1)
        return truncated

    elif isinstance(obj, list):
        truncated = []
        for item in obj[:3]:
            truncated.append(truncate_for_display(item, max_depth, current_depth + 1))
        if len(obj) > 3:
            truncated.append(f"... ({len(obj) - 3} more items)")
        return truncated

    else:
        if isinstance(obj, str) and len(obj) > 50:
            return obj[:50] + "..."
        return obj

You don't need to memorise how this helper works internally. Keep it in your toolkit and adapt it as you go -- the value is in the exploration strategy it demonstrates, not the specific implementation. What matters is the workflow: run the helper against a new endpoint, read what it prints, then write extraction code that matches the shape you actually see.
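One adaptation worth considering: the list branch of the helper reports only the keys of the first item, and a single record can't tell you which keys show up in every record and which don't. The sketch below (a hypothetical key_presence helper, not part of the file above) counts how many records in a list carry each key. The sample records are invented; against the search endpoint you would pass the list found under items.

```python
from collections import Counter


def key_presence(records):
    """Count how many dict records contain each key.

    Non-dict entries are skipped so a mixed list won't crash the survey.
    Returns (number of dict records, Counter mapping key -> records containing it).
    """
    counts = Counter()
    total = 0
    for record in records:
        if not isinstance(record, dict):
            continue
        total += 1
        counts.update(record.keys())
    return total, counts


def report_presence(records):
    """Print each key with its coverage, flagging keys missing from some records."""
    total, counts = key_presence(records)
    print(f"Surveyed {total} records")
    for key, seen in counts.most_common():
        marker = "" if seen == total else "  <- optional?"
        print(f"  {key}: {seen}/{total}{marker}")


# Invented sample records: 'license' is absent from one of them.
sample_items = [
    {"id": 1, "name": "alpha", "license": {"key": "mit"}},
    {"id": 2, "name": "beta"},
    {"id": 3, "name": "gamma", "license": None},
]
report_presence(sample_items)
```

Note that a key present with a null value counts as present here; if nulls matter for your extraction, survey the values as well. With per_page=2 the sample is tiny, so a larger page gives a more trustworthy picture of which keys are optional.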

Testing the helper against two different GitHub endpoints

GitHub is ideal for showing how the same API can return dramatically different shapes from different endpoints. Save the script below as test_discovery.py at the project root:

test_discovery.py
from explore_api_structure import explore_api_structure

print("=== Comparing Different API Response Structures ===\n")

# GitHub API: direct object response
explore_api_structure("https://api.github.com/repos/octocat/Hello-World")
print("\n")

# GitHub Search API: uses an 'items' array
explore_api_structure("https://api.github.com/search/repositories?q=python&per_page=2")

Run it from the project root:

Terminal
python test_discovery.py

You should see output along these lines. GitHub's live counts change over time, so your exact key counts and search totals may differ; the important signal is the response shape.

Terminal
=== Comparing Different API Response Structures ===

API Response Analysis for: https://api.github.com/repos/octocat/Hello-World
============================================================
Response type: dict
Top-level keys: ['id', 'node_id', 'name', 'full_name', 'private', 'owner', '...']
Total keys: 79
Possible data containers found: []

First few fields (with truncated values):
  id: 1296269
  node_id: MDEwOlJlcG9zaXRvcnkxMjk2MjY5
  name: Hello-World
  full_name: octocat/Hello-World
  private: False

Sample structure (truncated for readability):
{
  "id": 1296269,
  "node_id": "MDEwOlJlcG9zaXRvcnkxMjk2MjY5",
  "name": "Hello-World",
  "full_name": "octocat/Hello-World",
  "private": false,
  "...": "(74 more keys)"
}

API Response Analysis for: https://api.github.com/search/repositories?q=python&per_page=2
============================================================
Response type: dict
Top-level keys: ['total_count', 'incomplete_results', 'items']
Total keys: 3
Possible data containers found: ['items']

First few fields (with truncated values):
  total_count: 8937004
  incomplete_results: False
  items: list (length: 2)

Sample structure (truncated for readability):
{
  "total_count": 8937004,
  "incomplete_results": false,
  "items": [
    "...",
    "..."
  ]
}

The helper immediately reveals the structural difference between these two endpoints. The single-repository endpoint returns many fields directly at the root level -- no wrapper, no metadata, just the repository object. The search endpoint takes a completely different approach: it wraps the repository data in an items array and adds metadata like total_count for pagination.
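With the default max_depth=2, the records inside items collapse to "...", so the sample structure shows where the records live but not what they contain. Calling explore_api_structure(url, max_depth=3) would expand one more level and list the first five keys of each record, with their values printed as "...". An alternative, sketched below under a hypothetical name, replaces every value with its type name and summarises each list by its first element, so a whole record's layout fits on screen. The payload is invented and only mimics the search response's shape.

```python
import json


def type_skeleton(obj, max_depth=5, depth=0):
    """Replace every value with its type name while keeping every key.

    Lists are summarised by the skeleton of their first element plus a count,
    which assumes the elements share one shape.
    """
    if depth >= max_depth:
        return "..."
    if isinstance(obj, dict):
        return {key: type_skeleton(value, max_depth, depth + 1) for key, value in obj.items()}
    if isinstance(obj, list):
        if not obj:
            return []
        return [type_skeleton(obj[0], max_depth, depth + 1), f"({len(obj)} items total)"]
    # Scalars: str, int, float, bool, or NoneType for JSON null
    return type(obj).__name__


# Invented payload shaped like the search response above; values are made up.
search_like = {
    "total_count": 2,
    "incomplete_results": False,
    "items": [
        {"id": 10, "full_name": "example/one", "owner": {"login": "example"}, "license": None},
        {"id": 11, "full_name": "example/two", "owner": {"login": "example"}, "license": None},
    ],
}

print(json.dumps(type_skeleton(search_like), indent=2))
```

The trade-off is the reverse of truncate_for_display: every key is kept, so a 79-key repository object produces long output, but nothing inside the records is hidden.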

These differences are why structural assumptions are dangerous. Code that expected a results array (the Chapter 6 Random User convention) would raise a KeyError on both endpoints: one has no wrapper at all, the other uses items. The helper's container-detection step spots the common wrapper names for you, so you know where the data lives before you write a single line of access code. Even within one API, different endpoints follow different shapes. That's normal. The exploration step is what makes it manageable.
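The container check in the helper looks only at top-level keys from a fixed list (items, results, data, content, entries, records), so it would miss records nested one level deeper or stored under an unlisted name such as payload. The sketch below, using a hypothetical find_record_lists function, walks the whole response and reports every non-empty list whose elements are all objects, whatever the key is called. The "$.key[0]" path strings are a display notation invented for this sketch, and the nested payload is made up.

```python
def find_record_lists(obj, path="$"):
    """Yield (path, length) for every non-empty list whose elements are all dicts.

    Only the first element of each list is descended into, which keeps the
    output short but assumes list elements share a shape.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_record_lists(value, f"{path}.{key}")
    elif isinstance(obj, list) and obj:
        if all(isinstance(item, dict) for item in obj):
            yield path, len(obj)
        yield from find_record_lists(obj[0], f"{path}[0]")


# Invented payload: records sit under an unlisted key, two levels down.
nested = {
    "status": "ok",
    "payload": {
        "meta": {"page": 1},
        "orders": [
            {"id": "A1", "lines": [{"sku": "X-1"}]},
            {"id": "A2", "lines": []},
        ],
    },
}

for found_path, length in find_record_lists(nested):
    print(f"{found_path}: list of {length} objects")
```

Treat the results as candidates rather than answers: a list of link objects or settings would be reported too, while a list of plain strings or an empty list would not.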

Discovery is half the job. The other half is writing access code that survives the variation it reveals -- a single utility that works whether the data sits at the root, in an items array, or wrapped under data. That's the next page.