5. Debugging JSON parsing problems

Defensive extraction prevents crashes, but it doesn't tell you why the data looked wrong. When KeyError: 'name' shows up in a stack trace or your extractor keeps returning None, the first instinct is usually to guess: "maybe they renamed the field?" A faster instinct is to look. This section builds two reusable inspection tools that answer "what did the API actually return?" and "exactly where along this path did I lose it?", then ties them together with a short workflow.

A reusable inspection toolkit

The first tool is a function that takes a URL, fetches it, and prints a structured breakdown of the response: HTTP status, headers, raw text, top-level type, keys, and a pretty-printed first result. It's verbose on purpose -- the whole point is to see everything in one place instead of stepping through five print() calls. Save this as debug_toolkit.py:

debug_toolkit.py
import requests
import json

def debug_json_response(url):
    """
    Comprehensive JSON response debugging.
    Shows exactly what the API returned.
    """
    print(f"=== Debugging Response from {url} ===\n")
    
    try:
        response = requests.get(url, timeout=10)
        
        # Step 1: Check HTTP basics
        print("1. HTTP Response Status:")
        print(f"   Status Code: {response.status_code}")
        print(f"   Reason: {response.reason}")
        print()
        
        # Step 2: Check headers
        print("2. Important Headers:")
        print(f"   Content-Type: {response.headers.get('Content-Type', 'Not specified')}")
        print(f"   Content-Length: {response.headers.get('Content-Length', 'Not specified')}")
        print()
        
        # Step 3: Show raw text (first 500 chars)
        print("3. Raw Response (first 500 characters):")
        print(f"   {response.text[:500]}")
        if len(response.text) > 500:
            print(f"   ... ({len(response.text) - 500} more characters)")
        print()
        
        # Step 4: Try parsing as JSON
        print("4. JSON Parsing:")
        try:
            data = response.json()
            print("   OK: Successfully parsed as JSON")
            print()
            
            # Step 5: Show structure
            print("5. Top-Level Structure:")
            print(f"   Type: {type(data).__name__}")
            
            if isinstance(data, dict):
                print(f"   Keys: {list(data.keys())}")
                print()
                
                # Show details about each key
                print("6. Key Details:")
                for key, value in data.items():
                    value_type = type(value).__name__
                    
                    if isinstance(value, list):
                        print(f"   '{key}': list with {len(value)} items")
                        if len(value) > 0:
                            print(f"           First item type: {type(value[0]).__name__}")
                    elif isinstance(value, dict):
                        print(f"   '{key}': dict with keys {list(value.keys())}")
                    else:
                        # Show value for primitives
                        value_str = str(value)[:50]
                        print(f"   '{key}': {value_type} = {value_str}")
            
            elif isinstance(data, list):
                print(f"   Array with {len(data)} items")
                if len(data) > 0:
                    print(f"   First item type: {type(data[0]).__name__}")
                    if isinstance(data[0], dict):
                        print(f"   First item keys: {list(data[0].keys())}")
            
            print()
            
            # Step 7: Pretty-print the first entry when "results" is a non-empty list
            if isinstance(data, dict) and "results" in data:
                results = data.get("results", [])
                if isinstance(results, list) and len(results) > 0:
                    print("7. First Result (Pretty-Printed):")
                    print(json.dumps(results[0], indent=2))
        
        except ValueError as e:
            print(f"   ERROR: Failed to parse as JSON: {e}")
            print("   Response might not be valid JSON")
    
    except requests.exceptions.RequestException as e:
        print(f"ERROR: Request failed: {e}")

# Debug a real API (runs only when the file is executed directly, not when imported)
if __name__ == "__main__":
    debug_json_response("https://randomuser.me/api/")

Run it from the project root against the Random User API. The sample person and content length may differ, but the structural summary should have the same shape:

Terminal
$ python debug_toolkit.py
=== Debugging Response from https://randomuser.me/api/ ===

1. HTTP Response Status:
   Status Code: 200
   Reason: OK

2. Important Headers:
   Content-Type: application/json; charset=utf-8
   Content-Length: 1847

3. Raw Response (first 500 characters):
   {"results":[{"gender":"female","name":{"title":"Ms","first":"Emma","last":"Johnson"},"location":{"street":{"number":1234,"name":"Main St"},"city":"Auckland","state":"Auckland","country":"New Zealand","postcode":"1010","coordinates":{"latitude":"-36.8485","longitude":"174.7633"},"timezone":{"offset":"+12:00","description":"Auckland, Wellington"}},"email":"emma.johnson@example.com","login":{"uuid":"a1b2c3d4","username":"bluefox123","password":"password123","salt":"xyz"
   ... (1347 more characters)

4. JSON Parsing:
   OK: Successfully parsed as JSON

5. Top-Level Structure:
   Type: dict
   Keys: ['results', 'info']

6. Key Details:
   'results': list with 1 items
           First item type: dict
   'info': dict with keys ['seed', 'results', 'page', 'version']

7. First Result (Pretty-Printed):
{
  "gender": "female",
  "name": {
    "title": "Ms",
    "first": "Emma",
    "last": "Johnson"
  },
  "location": {
    "street": {
      "number": 1234,
      "name": "Main St"
    },
    "city": "Auckland",
    "state": "Auckland",
    "country": "New Zealand",
    "postcode": "1010",
    "coordinates": {
      "latitude": "-36.8485",
      "longitude": "174.7633"
    },
    "timezone": {
      "offset": "+12:00",
      "description": "Auckland, Wellington"
    }
  },
  "email": "emma.johnson@example.com",
  "login": { ... },
  "dob": {
    "date": "1989-12-15T10:30:00.000Z",
    "age": 34
  },
  ... ("registered", "phone", "cell", "id", "picture", "nat" follow)
}

The output covers the five things you actually need when debugging: the HTTP status (to distinguish "the request failed" from "the response is wrong"), the Content-Type header (to catch the case where a JSON endpoint returned an HTML error page), the raw text (to spot encoding issues), the top-level structure map (to see actual keys against your mental model), and a pretty-printed sample (so you can read nested shapes directly). The sample above is trimmed for the page; json.dumps(results[0], indent=2) prints every key, so your own run shows the full login, registered, phone, and other fields in place of the elisions. Drop debug_toolkit.py into a utilities folder and reuse it whenever a new API refuses to behave.
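The first checks in that list (status, Content-Type, non-empty body) can also run before you attempt to parse anything. The following is a minimal sketch, not part of debug_toolkit.py: the function name response_problems and its messages are hypothetical, and it takes plain values instead of a response object so you can try it without a network call.

```python
def response_problems(status_code, content_type, body):
    """Return a list of reasons the response may not be usable JSON.

    Hypothetical helper for illustration; name and messages are assumptions.
    """
    problems = []
    if status_code != 200:
        problems.append(f"unexpected status code {status_code}")
    # Content-Type may carry parameters such as "; charset=utf-8"
    media_type = (content_type or "").split(";")[0].strip().lower()
    if "json" not in media_type:
        problems.append(f"Content-Type is {content_type!r}, not JSON")
    if not body.strip():
        problems.append("response body is empty")
    return problems


# A healthy JSON response produces an empty list
print(response_problems(200, "application/json; charset=utf-8", '{"results": []}'))

# An HTML error page served from a JSON endpoint is flagged on two counts
print(response_problems(500, "text/html", "<html>Server Error</html>"))
```

With requests, the three arguments would come from response.status_code, response.headers.get("Content-Type"), and response.text.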

A cheat sheet for common crashes

Most JSON parsing problems fall into a small number of shapes, and each has a diagnostic signature and a specific fix. The Layer column ties each crash type back to the three-layer pattern from page 3: every failure is an existence, type, or content failure, which tells you which defensive technique to reach for:

| Problem | Layer | Symptom | Diagnosis | Solution |
| --- | --- | --- | --- | --- |
| KeyError | Existence | KeyError: 'results' | Key doesn't exist in the response | Use .get("results", []) with a default |
| IndexError | Existence | list index out of range | Array is empty or shorter than expected | Check len(array) > 0 before accessing |
| TypeError on None | Type | 'NoneType' object is not subscriptable | Field is null, not a nested object | Check if value and isinstance(value, dict) |
| AttributeError | Type | 'NoneType' object has no attribute 'upper' | Calling a string method on None | Check isinstance(value, str) before string operations |
| ValueError | Content | invalid literal for int() | Converting a non-numeric string to int | Check .isdigit() before int(), or catch the ValueError |
| Wrong data type | Type | Silent failures or unexpected results | Field is a string when you expected an int, or vice versa | Add isinstance() type checking |
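As a rough illustration of the Solution column, here is one helper per layer. This is a sketch under assumptions: the helper names first_result, safe_upper, and safe_int are hypothetical, and your own extractor may combine these checks differently.

```python
def first_result(data):
    """Existence layer: tolerate a missing 'results' key and an empty list."""
    if not isinstance(data, dict):
        return None
    results = data.get("results", [])
    if isinstance(results, list) and len(results) > 0:
        return results[0]
    return None


def safe_upper(value):
    """Type layer: only call string methods on actual strings."""
    return value.upper() if isinstance(value, str) else None


def safe_int(value):
    """Content layer: accept ints and strings made of ASCII digits only."""
    if isinstance(value, bool):  # bool is a subclass of int; treat it as wrong type
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        text = value.strip()
        if text.isascii() and text.isdigit():
            return int(text)
    return None


print(first_result({}))                 # missing key, no KeyError
print(first_result({"results": []}))    # empty list, no IndexError
print(safe_upper(None))                 # null field, no AttributeError
print(safe_int("42"), safe_int("abc"))  # non-numeric string, no ValueError
```

Note that .isdigit() rejects signed values such as "-5"; if negative numbers are valid in your data, wrapping int() in try/except ValueError is the more permissive choice.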

Walking a failing path, step by step

The cheat sheet tells you what went wrong; this next tool tells you where. It takes a list of keys and indices, a path like ["results", 0, "name", "first"], and walks through the response one step at a time, reporting the type and availability at each hop. When a step fails, it names the failing key and lists what is available at that level. That one feature, "available keys at the failure site", saves more debugging time than any other single technique.

Save this as debug_workflow.py:

debug_workflow.py
import requests

def diagnose_extraction_problem(url, path_to_value):
    """
    Debug why extracting a specific value fails.
    
    Args:
        url: API endpoint to test
        path_to_value: List of keys to traverse, e.g., ["results", 0, "name", "first"]
    """
    print(f"=== Diagnosing Path: {' -> '.join(str(p) for p in path_to_value)} ===\n")
    
    try:
        # Fetch data
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        
        data = response.json()
        
        # Traverse path step by step
        current = data
        
        for i, key in enumerate(path_to_value):
            print(f"Step {i+1}: Accessing '{key}'")
            print(f"  Current type: {type(current).__name__}")
            
            # Check if we can access this key
            if isinstance(current, dict):
                if key in current:
                    current = current[key]
                    print(f"  OK: Key '{key}' exists")
                    print(f"  Value type: {type(current).__name__}")
                    
                    # Show value if it's simple
                    if not isinstance(current, (dict, list)):
                        value_str = str(current)[:100]
                        print(f"  Value: {value_str}")
                else:
                    print(f"  ERROR: Key '{key}' DOES NOT EXIST")
                    print(f"  Available keys: {list(current.keys())}")
                    return
            
            elif isinstance(current, list):
                if isinstance(key, int):
                    list_length = len(current)
                    if 0 <= key < list_length:
                        current = current[key]
                        print(f"  OK: Index {key} exists (list has {list_length} items)")
                        print(f"  Value type: {type(current).__name__}")
                    else:
                        print(f"  ERROR: Index {key} OUT OF RANGE")
                        print(f"  List length: {list_length}")
                        return
                else:
                    print(f"  ERROR: Cannot use key '{key}' on list")
                    print("  Use numeric index instead")
                    return
            
            else:
                print(f"  ERROR: Cannot access '{key}' on {type(current).__name__}")
                print(f"  Expected dict or list, got {type(current).__name__}")
                return
            
            print()
        
        print("OK: Successfully reached final value:")
        print(f"   Type: {type(current).__name__}")
        print(f"   Value: {current}")
    
    except requests.exceptions.RequestException as e:
        print(f"ERROR: Request failed: {e}")
    except ValueError as e:
        print(f"ERROR: JSON parsing failed: {e}")

# Test various paths (runs only when the file is executed directly, not when imported)
if __name__ == "__main__":
    print("TEST 1: Valid path")
    diagnose_extraction_problem(
        "https://randomuser.me/api/",
        ["results", 0, "name", "first"]
    )

    print("\n" + "="*60 + "\n")

    print("TEST 2: Invalid key")
    diagnose_extraction_problem(
        "https://randomuser.me/api/",
        ["results", 0, "fullname"]  # Wrong key name
    )

    print("\n" + "="*60 + "\n")

    print("TEST 3: Out of bounds index")
    diagnose_extraction_problem(
        "https://randomuser.me/api/",
        ["results", 5, "name"]  # Index too large
    )

Run it from the project root. Random User returns a different person each time, so the final first name may differ; the path checks and list lengths should match this shape:

Terminal
$ python debug_workflow.py
TEST 1: Valid path
=== Diagnosing Path: results -> 0 -> name -> first ===

Step 1: Accessing 'results'
  Current type: dict
  OK: Key 'results' exists
  Value type: list

Step 2: Accessing '0'
  Current type: list
  OK: Index 0 exists (list has 1 items)
  Value type: dict

Step 3: Accessing 'name'
  Current type: dict
  OK: Key 'name' exists
  Value type: dict

Step 4: Accessing 'first'
  Current type: dict
  OK: Key 'first' exists
  Value type: str
  Value: Emma

OK: Successfully reached final value:
   Type: str
   Value: Emma

============================================================

TEST 2: Invalid key
=== Diagnosing Path: results -> 0 -> fullname ===

Step 1: Accessing 'results'
  Current type: dict
  OK: Key 'results' exists
  Value type: list

Step 2: Accessing '0'
  Current type: list
  OK: Index 0 exists (list has 1 items)
  Value type: dict

Step 3: Accessing 'fullname'
  Current type: dict
  ERROR: Key 'fullname' DOES NOT EXIST
  Available keys: ['gender', 'name', 'location', 'email', 'login', 'dob', 'registered', 'phone', 'cell', 'id', 'picture', 'nat']

============================================================

TEST 3: Out of bounds index
=== Diagnosing Path: results -> 5 -> name ===

Step 1: Accessing 'results'
  Current type: dict
  OK: Key 'results' exists
  Value type: list

Step 2: Accessing '5'
  Current type: list
  ERROR: Index 5 OUT OF RANGE
  List length: 1

Notice what Test 2 reports when it fails: Key 'fullname' DOES NOT EXIST, followed by Available keys: ['gender', 'name', 'location', ...]. That pair of lines resolves most "my code worked yesterday" bugs: either the field has a different name than you thought, or the API renamed it. Compare your path against the available keys and the fix is usually obvious.
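If the list of available keys is long, the comparison can be partly automated. The sketch below is an optional addition, not something debug_workflow.py does: the helper name suggest_keys is hypothetical, and it relies on the standard library's difflib.get_close_matches to rank keys by similarity to the one that was not found.

```python
import difflib


def suggest_keys(missing_key, available_keys, limit=3):
    """Return available keys that look similar to the key that was not found.

    Hypothetical helper for illustration; not part of debug_workflow.py.
    """
    # get_close_matches compares strings, so skip any non-string keys
    candidates = [k for k in available_keys if isinstance(k, str)]
    return difflib.get_close_matches(str(missing_key), candidates, n=limit)


# Keys taken from the Test 2 output above
available = ['gender', 'name', 'location', 'email', 'login', 'dob',
             'registered', 'phone', 'cell', 'id', 'picture', 'nat']
print(suggest_keys("fullname", available))
```

A suggestion is only a hint: a similar-looking key may still hold a different shape (here, name is a nested dict, not a single string), so walk the corrected path again before trusting it.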

A six-step workflow for when extraction breaks

When a stack trace lands and you need to get unstuck quickly, run these steps in order rather than guessing. Each step narrows the problem down, so by the time you hit step 5 you know exactly which defensive technique to add.

  1. Verify the response arrives. Check the status code is 200, the Content-Type is JSON, and the response isn't empty. print(response.status_code, response.headers) in one line catches "the request is returning HTML" and "the server is returning 500" without digging further.
  2. Inspect the structure. Run debug_toolkit.py against the URL. Compare the actual keys against what your code expects.
  3. Identify the mismatch. Look for renamed keys, different nesting levels, arrays where you expected objects, null values where you expected nested dicts.
  4. Test the path. Use debug_workflow.py to walk the exact access path that's failing. It'll name the failing step and list what's available at that level.
  5. Add the defensive fix. Match the crash type to a technique: .get() with default for missing keys, isinstance() for wrong types, length check for array access, explicit is None for null values.
  6. Test the edge cases. Empty arrays, null values, missing keys, wrong types. Run the test cases from earlier sections through your extractor to make sure it handles all four.
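Step 6 can be sketched as a small table of malformed payloads fed through the extractor. This is a minimal illustration under assumptions: extract_first_name is a hypothetical stand-in for your own extractor from earlier sections, and the payloads are hand-written to mimic the Random User shape.

```python
def extract_first_name(data):
    """Return results[0]['name']['first'] as a string, or None if any step is unusable."""
    if not isinstance(data, dict):
        return None
    results = data.get("results")
    if not isinstance(results, list) or len(results) == 0:
        return None
    person = results[0]
    if not isinstance(person, dict):
        return None
    name = person.get("name")
    if not isinstance(name, dict):
        return None
    first = name.get("first")
    return first if isinstance(first, str) else None


# One payload per edge case named in step 6
EDGE_CASES = {
    "empty array": {"results": []},
    "null value": {"results": [{"name": None}]},
    "missing key": {"results": [{"gender": "female"}]},
    "wrong type": {"results": [{"name": "Emma Johnson"}]},
}

for label, payload in EDGE_CASES.items():
    # Each case should yield None instead of raising
    print(f"{label}: {extract_first_name(payload)!r}")

# A well-formed payload still returns the value
print(extract_first_name({"results": [{"name": {"first": "Emma"}}]}))
```

If any of the four cases raises instead of returning None, the traceback points at the layer (existence, type, or content) that still needs a check.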

The meta-habit is simpler: don't guess, inspect. Keep your debugging functions in a utilities module and run them first when you're working with a new API, before writing any extraction code. That one change, inspect before you extract, prevents most of the "documentation says X but the response is Y" bugs this chapter has been about.