5. Debugging JSON parsing problems
Defensive extraction prevents crashes, but it doesn't tell you why the data looked wrong. When KeyError: 'name' shows up in a stack trace or your extractor keeps returning None, the first instinct is usually to guess, "maybe they renamed the field?". A faster instinct is to look. This section builds two reusable inspection tools that answer "what did the API actually return?" and "exactly where along this path did I lose it?", plus a short workflow to pull it all together.
A reusable inspection toolkit
The first tool is a function that takes a URL, fetches it, and prints a structured breakdown of the response: HTTP status, headers, raw text, top-level type, keys, and a pretty-printed first result. It's verbose on purpose -- the whole point is to see everything in one place instead of stepping through five print() calls. Save this as debug_toolkit.py:
import json

import requests


def debug_json_response(url):
    """
    Comprehensive JSON response debugging.
    Shows exactly what the API returned.
    """
    print(f"=== Debugging Response from {url} ===\n")

    try:
        response = requests.get(url, timeout=10)

        # Step 1: Check HTTP basics
        print("1. HTTP Response Status:")
        print(f" Status Code: {response.status_code}")
        print(f" Reason: {response.reason}")
        print()

        # Step 2: Check headers
        print("2. Important Headers:")
        print(f" Content-Type: {response.headers.get('Content-Type', 'Not specified')}")
        print(f" Content-Length: {response.headers.get('Content-Length', 'Not specified')}")
        print()

        # Step 3: Show raw text (first 500 chars)
        print("3. Raw Response (first 500 characters):")
        print(f" {response.text[:500]}")
        if len(response.text) > 500:
            print(f" ... ({len(response.text) - 500} more characters)")
        print()

        # Step 4: Try parsing as JSON
        print("4. JSON Parsing:")
        try:
            data = response.json()
            print(" OK: Successfully parsed as JSON")
            print()

            # Step 5: Show structure
            print("5. Top-Level Structure:")
            print(f" Type: {type(data).__name__}")

            if isinstance(data, dict):
                print(f" Keys: {list(data.keys())}")
                print()

                # Step 6: Show details about each key
                print("6. Key Details:")
                for key, value in data.items():
                    value_type = type(value).__name__
                    if isinstance(value, list):
                        print(f" '{key}': list with {len(value)} items")
                        if len(value) > 0:
                            print(f" First item type: {type(value[0]).__name__}")
                    elif isinstance(value, dict):
                        print(f" '{key}': dict with keys {list(value.keys())}")
                    else:
                        # Show value for primitives
                        value_str = str(value)[:50]
                        print(f" '{key}': {value_type} = {value_str}")
            elif isinstance(data, list):
                print(f" Array with {len(data)} items")
                if len(data) > 0:
                    print(f" First item type: {type(data[0]).__name__}")
                    if isinstance(data[0], dict):
                        print(f" First item keys: {list(data[0].keys())}")
            print()

            # Step 7: Pretty-print the first result, if there is one
            if isinstance(data, dict) and "results" in data:
                results = data.get("results", [])
                if isinstance(results, list) and len(results) > 0:
                    print("7. First Result (Pretty-Printed):")
                    print(json.dumps(results[0], indent=2))

        except ValueError as e:
            print(f" ERROR: Failed to parse as JSON: {e}")
            print(" Response might not be valid JSON")

    except requests.exceptions.RequestException as e:
        print(f"ERROR: Request failed: {e}")


# Debug a real API (guarded so importing this module does not fire a request)
if __name__ == "__main__":
    debug_json_response("https://randomuser.me/api/")
Run it from the project root against the Random User API. The sample person and content length may differ, but the structural summary should have the same shape:
$ python debug_toolkit.py
=== Debugging Response from https://randomuser.me/api/ ===

1. HTTP Response Status:
 Status Code: 200
 Reason: OK

2. Important Headers:
 Content-Type: application/json; charset=utf-8
 Content-Length: 1847

3. Raw Response (first 500 characters):
 {"results":[{"gender":"female","name":{"title":"Ms","first":"Emma","last":"Johnson"},"location":{"street":{"number":1234,"name":"Main St"},"city":"Auckland","state":"Auckland","country":"New Zealand","postcode":"1010","coordinates":{"latitude":"-36.8485","longitude":"174.7633"},"timezone":{"offset":"+12:00","description":"Auckland, Wellington"}},"email":"emma.johnson@example.com","login":{"uuid":"a1b2c3d4","username":"bluefox123","password":"password123","salt":"xyz"
 ... (1347 more characters)

4. JSON Parsing:
 OK: Successfully parsed as JSON

5. Top-Level Structure:
 Type: dict
 Keys: ['results', 'info']

6. Key Details:
 'results': list with 1 items
 First item type: dict
 'info': dict with keys ['seed', 'results', 'page', 'version']

7. First Result (Pretty-Printed):
{
  "gender": "female",
  "name": {
    "title": "Ms",
    "first": "Emma",
    "last": "Johnson"
  },
  "location": {
    "street": {
      "number": 1234,
      "name": "Main St"
    },
    "city": "Auckland",
    "state": "Auckland",
    "country": "New Zealand",
    "postcode": "1010",
    "coordinates": {
      "latitude": "-36.8485",
      "longitude": "174.7633"
    },
    "timezone": {
      "offset": "+12:00",
      "description": "Auckland, Wellington"
    }
  },
  "email": "emma.johnson@example.com",
  "login": { ... },
  "dob": {
    "date": "1989-12-15T10:30:00.000Z",
    "age": 34
  },
  ... ("registered", "phone", "cell", "id", "picture", "nat" follow)
}
The output covers the five things you actually need when debugging: the HTTP status (to distinguish "the request failed" from "the response is wrong"), the Content-Type header (to catch the case where a JSON endpoint returned an HTML error page), the raw text (to spot encoding issues), the top-level structure map (to see actual keys against your mental model), and a pretty-printed sample (so you can read nested shapes directly). The sample above is trimmed for the page; json.dumps(results[0], indent=2) prints every key, so your own run shows the full login, registered, phone, and other fields in place of the elisions. Drop debug_toolkit.py into a utilities folder and reuse it whenever a new API refuses to behave.
A cheat sheet for common crashes
Most JSON parsing problems fall into a small number of shapes, and each has a diagnostic signature and a specific fix. The Layer column ties each crash type back to the three-layer pattern from page 3: every failure is an existence, type, or content failure, which tells you which defensive technique to reach for:
| Problem | Layer | Symptom | Diagnosis | Solution |
|---|---|---|---|---|
| KeyError | Existence | KeyError: 'results' | Key doesn't exist in response | Use .get("results", []) with default |
| IndexError | Existence | list index out of range | Array is empty or shorter than expected | Check len(array) > 0 before accessing |
| TypeError on None | Type | 'NoneType' object is not subscriptable | Field is null, not nested object | Check if value and isinstance(value, dict) |
| AttributeError | Type | 'NoneType' object has no attribute 'upper' | Calling string method on None | Check isinstance(value, str) before string operations |
| ValueError | Content | invalid literal for int() | Converting non-numeric string to int | Wrap int() in try/except ValueError (an .isdigit() check only covers unsigned digit strings, so it rejects "-5") |
| Wrong data type | Type | Silent failures or unexpected results | Field is string when expecting int, or vice versa | Add isinstance() type checking |
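As a rough illustration of the Solution column, here is a minimal sketch with one small helper per crash type. The helper names (first_result, name_block, upper_or_none, to_int) are hypothetical, invented for this example, and the sample payloads only imitate the Random User shape shown earlier.

```python
# Hypothetical helpers: one defensive fix per row of the cheat sheet.

def first_result(data):
    """KeyError / IndexError: default for a missing key, length check before [0]."""
    results = data.get("results", []) if isinstance(data, dict) else []
    if isinstance(results, list) and len(results) > 0:
        return results[0]
    return None


def name_block(person):
    """TypeError on None: only descend if the field really is a dict."""
    name = person.get("name") if isinstance(person, dict) else None
    return name if isinstance(name, dict) else {}


def upper_or_none(value):
    """AttributeError: confirm the value is a str before calling str methods."""
    return value.upper() if isinstance(value, str) else None


def to_int(value, default=None):
    """ValueError: let int() decide, and fall back instead of crashing."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


if __name__ == "__main__":
    # Made-up payloads: one well-formed, one with a null "name" field.
    good = {"results": [{"name": {"first": "Emma"}, "dob": {"age": "34"}}]}
    broken = {"results": [{"name": None}]}

    print(upper_or_none(name_block(first_result(good)).get("first")))
    print(upper_or_none(name_block(first_result(broken)).get("first")))
    print(to_int("34"), to_int("thirty-four"), to_int(None))
```

Each helper returns a neutral value (None or an empty dict) rather than raising, so the calls can be chained without a crash at any layer. Whether a silent default or a logged warning is the better choice depends on how your application treats missing data.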
Walking a failing path, step by step
The cheat sheet tells you what went wrong; this next tool tells you where. It takes a list of keys and indices, a path like ["results", 0, "name", "first"], and walks through the response one step at a time, reporting the type and availability at each hop. When a step fails, it names the failing key and lists what is available at that level. That one feature, "available keys at the failure site", saves more debugging time than any other single technique.
Save this as debug_workflow.py:
import requests


def diagnose_extraction_problem(url, path_to_value):
    """
    Debug why extracting a specific value fails.

    Args:
        url: API endpoint to test
        path_to_value: List of keys to traverse, e.g., ["results", 0, "name", "first"]
    """
    print(f"=== Diagnosing Path: {' -> '.join(str(p) for p in path_to_value)} ===\n")

    # Fetch data
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"ERROR: Request failed: {e}")
        return

    # Parse in its own try block: in recent requests releases the JSON decode
    # error is also a RequestException, so a shared handler would report a
    # parsing failure as "Request failed"
    try:
        data = response.json()
    except ValueError as e:
        print(f"ERROR: JSON parsing failed: {e}")
        return

    # Traverse path step by step
    current = data
    for i, key in enumerate(path_to_value):
        print(f"Step {i+1}: Accessing '{key}'")
        print(f" Current type: {type(current).__name__}")

        # Check if we can access this key
        if isinstance(current, dict):
            if key in current:
                current = current[key]
                print(f" OK: Key '{key}' exists")
                print(f" Value type: {type(current).__name__}")
                # Show value if it's simple
                if not isinstance(current, (dict, list)):
                    value_str = str(current)[:100]
                    print(f" Value: {value_str}")
            else:
                print(f" ERROR: Key '{key}' DOES NOT EXIST")
                print(f" Available keys: {list(current.keys())}")
                return
        elif isinstance(current, list):
            if isinstance(key, int):
                list_length = len(current)
                if 0 <= key < list_length:
                    current = current[key]
                    print(f" OK: Index {key} exists (list has {list_length} items)")
                    print(f" Value type: {type(current).__name__}")
                else:
                    print(f" ERROR: Index {key} OUT OF RANGE")
                    print(f" List length: {list_length}")
                    return
            else:
                print(f" ERROR: Cannot use key '{key}' on list")
                print(" Use numeric index instead")
                return
        else:
            print(f" ERROR: Cannot access '{key}' on {type(current).__name__}")
            print(f" Expected dict or list, got {type(current).__name__}")
            return
        print()

    print("OK: Successfully reached final value:")
    print(f" Type: {type(current).__name__}")
    print(f" Value: {current}")


# Test various paths (guarded so importing this module does not fire requests)
if __name__ == "__main__":
    print("TEST 1: Valid path")
    diagnose_extraction_problem(
        "https://randomuser.me/api/",
        ["results", 0, "name", "first"]
    )
    print("\n" + "="*60 + "\n")

    print("TEST 2: Invalid key")
    diagnose_extraction_problem(
        "https://randomuser.me/api/",
        ["results", 0, "fullname"]  # Wrong key name
    )
    print("\n" + "="*60 + "\n")

    print("TEST 3: Out of bounds index")
    diagnose_extraction_problem(
        "https://randomuser.me/api/",
        ["results", 5, "name"]  # Index too large
    )
Run it from the project root. Random User returns a different person each time, so the final first name may differ; the path checks and list lengths should match this shape:
$ python debug_workflow.py
TEST 1: Valid path
=== Diagnosing Path: results -> 0 -> name -> first ===

Step 1: Accessing 'results'
 Current type: dict
 OK: Key 'results' exists
 Value type: list

Step 2: Accessing '0'
 Current type: list
 OK: Index 0 exists (list has 1 items)
 Value type: dict

Step 3: Accessing 'name'
 Current type: dict
 OK: Key 'name' exists
 Value type: dict

Step 4: Accessing 'first'
 Current type: dict
 OK: Key 'first' exists
 Value type: str
 Value: Emma

OK: Successfully reached final value:
 Type: str
 Value: Emma

============================================================

TEST 2: Invalid key
=== Diagnosing Path: results -> 0 -> fullname ===

Step 1: Accessing 'results'
 Current type: dict
 OK: Key 'results' exists
 Value type: list

Step 2: Accessing '0'
 Current type: list
 OK: Index 0 exists (list has 1 items)
 Value type: dict

Step 3: Accessing 'fullname'
 Current type: dict
 ERROR: Key 'fullname' DOES NOT EXIST
 Available keys: ['gender', 'name', 'location', 'email', 'login', 'dob', 'registered', 'phone', 'cell', 'id', 'picture', 'nat']

============================================================

TEST 3: Out of bounds index
=== Diagnosing Path: results -> 5 -> name ===

Step 1: Accessing 'results'
 Current type: dict
 OK: Key 'results' exists
 Value type: list

Step 2: Accessing '5'
 Current type: list
 ERROR: Index 5 OUT OF RANGE
 List length: 1
Notice what Test 2 reports when it fails: Key 'fullname' DOES NOT EXIST, followed by Available keys: ['gender', 'name', 'location', ...]. Those two lines resolve most "my code worked yesterday" bugs: either the field has a different name than you thought, or the API actually renamed it. Compare your path against the available keys and the fix becomes obvious.
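If you want the comparison against the available keys done for you, one option is the standard library's difflib module. The sketch below is an optional add-on, not part of debug_workflow.py; suggest_keys is a hypothetical helper name, and the key list is copied from the Test 2 output above.

```python
import difflib


def suggest_keys(missing_key, available_keys, limit=3):
    """Return available keys that look similar to the one that was not found."""
    # Only string keys can be compared textually.
    candidates = [k for k in available_keys if isinstance(k, str)]
    # get_close_matches ranks candidates by similarity and drops weak matches
    # (its default cutoff is 0.6 on a 0-to-1 scale).
    return difflib.get_close_matches(missing_key, candidates, n=limit)


if __name__ == "__main__":
    available = ["gender", "name", "location", "email", "login", "dob",
                 "registered", "phone", "cell", "id", "picture", "nat"]
    print(suggest_keys("fullname", available))
    print(suggest_keys("e-mail", available))
```

This only helps when the wrong key resembles the right one textually. A rename with no shared letters returns an empty list, and you are back to reading the available keys yourself.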
A six-step workflow for when extraction breaks
When a stack trace lands and you need to get unstuck quickly, run these steps in order rather than guessing. Each step narrows the problem down, so by the time you hit step 5 you know exactly which defensive technique to add.
1. Verify the response arrives. Check the status code is 200, the Content-Type is JSON, and the response isn't empty. print(response.status_code, response.headers) in one line catches "the request is returning HTML" and "the server is returning 500" without digging further.
2. Inspect the structure. Run debug_toolkit.py against the URL. Compare the actual keys against what your code expects.
3. Identify the mismatch. Look for renamed keys, different nesting levels, arrays where you expected objects, null values where you expected nested dicts.
4. Test the path. Use debug_workflow.py to walk the exact access path that's failing. It'll name the failing step and list what's available at that level.
5. Add the defensive fix. Match the crash type to a technique: .get() with default for missing keys, isinstance() for wrong types, length check for array access, explicit is None for null values.
6. Test the edge cases. Empty arrays, null values, missing keys, wrong types. Run the test cases from earlier sections through your extractor to make sure it handles all four.
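The first step of the workflow is mechanical enough to wrap in a function. The following is a minimal sketch under stated assumptions: response_problems is a hypothetical name, and it takes plain values instead of a response object so it can be tried without a network call. With requests you would pass response.status_code, response.headers, and response.text.

```python
def response_problems(status_code, headers, text):
    """Return a list of problems found; an empty list means step 1 passes."""
    problems = []

    # "The server is returning 500" and similar failures.
    if status_code != 200:
        problems.append(f"status code is {status_code}, expected 200")

    # "The request is returning HTML": a JSON endpoint answering with a
    # non-JSON Content-Type, or with no Content-Type at all.
    content_type = headers.get("Content-Type", "")
    if "json" not in content_type.lower():
        problems.append(f"Content-Type is {content_type!r}, expected a JSON type")

    # An empty or whitespace-only body cannot be parsed as JSON.
    if not text.strip():
        problems.append("response body is empty")

    return problems


if __name__ == "__main__":
    # Made-up inputs imitating a healthy response and an HTML error page.
    print(response_problems(
        200, {"Content-Type": "application/json; charset=utf-8"}, '{"results": []}'
    ))
    print(response_problems(500, {"Content-Type": "text/html"}, "<html>Server error</html>"))
```

Returning a list instead of raising on the first problem means one call reports everything wrong with the response, which matches the "see everything in one place" approach of the inspection toolkit.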
The meta-habit is simpler: don't guess, inspect. Keep your debugging functions in a utilities module and run them first when you're working with a new API, before writing any extraction code. That one change, inspect before you extract, prevents most of the "documentation says X but the response is Y" bugs this chapter has been about.