4. Working with arrays defensively
Up to now you've been extracting data from single objects: one user, one profile, one row. Most real APIs don't stop there. GitHub returns arrays of repositories, Spotify returns arrays of tracks, and search endpoints of every kind return arrays of matches. Arrays add four new ways for your code to fail: the array can be empty, it can be null instead of a list, the field can be missing entirely, or it can come back as the wrong type (a string, which [0] happily indexes one character into). This section teaches the pattern that handles all four.
Here is the shape difference: a single-object response versus an array-of-objects response. Most of the chapter so far has dealt with the first shape; everything below deals with the second:
// Single object - what you've been working with:
{
  "user": {
    "name": "Alice",
    "email": "alice@example.com"
  }
}
// Array of objects - what most search/list APIs return:
{
  "results": [
    {
      "name": "Alice",
      "email": "alice@example.com"
    },
    {
      "name": "Bob",
      "email": "bob@example.com"
    },
    {
      "name": "Charlie",
      "email": "charlie@example.com"
    }
  ]
}
Any time you see /search, /list, or "get multiple X" in an API's docs, expect the response to contain an array. Arrays break in their own specific ways: the length varies (zero, one, thousands), an empty result is a valid non-error response, items in the same list can have different fields present, and the naive results[0] access raises IndexError the moment the list is empty.
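A small sketch of the middle two points, using a made-up response (the function name describe_results and the field names are illustrative, not from any particular API). It assumes results is already a list of dictionaries; the checks for when it isn't come next. An empty list produces a "no matches" answer rather than an error, and because items in the same list may not share the same keys, each lookup uses .get() with a default:

```python
def describe_results(data):
    """Summarize a list-style response without assuming every item has every field.

    Assumes data["results"] is a list of dicts (hypothetical example shape).
    """
    results = data.get("results", [])
    if len(results) == 0:
        # An empty list is a valid answer, not an error.
        return ["No matches"]
    lines = []
    for item in results:
        # Items in the same list may not share the same keys.
        name = item.get("name", "(no name)")
        email = item.get("email", "(no email)")
        lines.append(f"{name} <{email}>")
    return lines


print(describe_results({"results": []}))
print(describe_results({"results": [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob"},  # this item has no email field
]}))
```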
The four ways an array access crashes
The most common array-related crash is accessing an index that doesn't exist, but there are subtler failure modes too. Save this as array_crashes.py:
# Scenario 1: Empty array
response_data_1 = {"results": []}
try:
    first_user = response_data_1["results"][0]  # IndexError!
    print(first_user)
except IndexError as e:
    print(f"ERROR: Empty array: {e}")
# Scenario 2: Null instead of array
response_data_2 = {"results": None}
try:
    first_user = response_data_2["results"][0]  # TypeError!
    print(first_user)
except TypeError as e:
    print(f"ERROR: Null array: {e}")
# Scenario 3: Missing key
response_data_3 = {}
try:
    first_user = response_data_3["results"][0]  # KeyError: never even reaches the [0]
    print(first_user)
except KeyError as e:
    print(f"ERROR: Missing key: {e}")
# Scenario 4: Wrong type (string instead of array)
response_data_4 = {"results": "no results found"}
try:
    first_user = response_data_4["results"][0]  # Returns 'n' (first char)!
    print(f"Got: {first_user}")  # Silently wrong!
except Exception as e:
    print(f"ERROR: Unexpected error: {e}")
Run it from the project root:
$ python array_crashes.py
ERROR: Empty array: list index out of range
ERROR: Null array: 'NoneType' object is not subscriptable
ERROR: Missing key: 'results'
Got: n
The first three fail loudly. The fourth is the dangerous one: when results is a string instead of a list, [0] returns the first character without raising anything. "n" is now silently flowing downstream as if it were a user object, and every .get() call on it will explode later in a line of code that doesn't look obviously related. This is why the defensive pattern needs an explicit isinstance(..., list) check, not just a length check.
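A quick interpreter check shows why length alone isn't enough for the wrong-type case: the string from Scenario 4 passes a len() test but fails the isinstance() test.

```python
results = "no results found"  # a string where a list was expected

# A length check alone is fooled: the string has 16 characters.
print(len(results) > 0)           # True
print(results[0])                 # n

# The isinstance check catches it before any indexing happens.
print(isinstance(results, list))  # False
```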
Safe array access
Before accessing any index, three checks need to pass: (1) the field exists, (2) it's a list, (3) it has enough items. Skip any one of those and you'll hit one of the four crashes above. Here's the pattern wrapped in a function that returns a (success, result) tuple so the caller can react to failure without a try/except.
Save this as safe_arrays.py:
def get_first_user_safely(data):
    """Extract first user with complete validation."""
    # Step 1: Get the results field (None if the key is missing)
    results = data.get("results")
    # Step 2: Validate it's actually a list
    if not isinstance(results, list):
        return (False, "Results is not a list")
    # Step 3: Check list is not empty
    if len(results) == 0:
        return (False, "No results found")
    # Step 4: Safe to access first item
    first_user = results[0]
    # Step 5: Validate the item is a dict (expected structure)
    if not isinstance(first_user, dict):
        return (False, "First result is not a dictionary")
    return (True, first_user)
# Test with various response formats
test_responses = [
    {"results": [{"name": "Alice"}, {"name": "Bob"}]},  # Normal
    {"results": []},                                    # Empty array
    {"results": None},                                  # Null
    {},                                                 # Missing key
    {"results": "no data"},                             # Wrong type
    {"results": [None, {"name": "Bob"}]},               # First item is null
]
print("=== Safe Array Access Tests ===\n")
for i, test_data in enumerate(test_responses, 1):
    success, result = get_first_user_safely(test_data)
    if success:
        print(f"Test {i}: OK: Got user: {result}")
    else:
        print(f"Test {i}: ERROR: {result}")
Run it from the project root:
$ python safe_arrays.py
=== Safe Array Access Tests ===

Test 1: OK: Got user: {'name': 'Alice'}
Test 2: ERROR: No results found
Test 3: ERROR: Results is not a list
Test 4: ERROR: Results is not a list
Test 5: ERROR: Results is not a list
Test 6: ERROR: First result is not a dictionary
Those are the four inputs from the crash demo, plus a well-formed response and one whose first item is None, and all six get a predictable, readable result. Returning (False, reason) instead of raising lets the caller decide what to do: show a user-friendly message, retry with different parameters, or log and move on. Any time you find yourself writing try: users[0], reach for this shape instead.
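get_first_user_safely only ever reads index 0. The third check, "it has enough items," generalizes to any index: the index has to fall inside the list. A minimal sketch of that generalization follows; get_item_safely is a hypothetical helper, and rejecting negative indexes is a simplifying choice made here, not a rule from the pattern above.

```python
def get_item_safely(data, key, index):
    """Return (True, item) if data[key][index] exists and is a dict, else (False, reason).

    Hypothetical helper: a generalization of get_first_user_safely to any index.
    """
    if not isinstance(data, dict):
        return (False, "Response is not a dictionary")
    items = data.get(key)
    if not isinstance(items, list):
        return (False, f"{key!r} is not a list")
    # "Enough items" means the index must fall inside the list.
    # Negative indexes are rejected on purpose to keep the rule simple.
    if index < 0 or index >= len(items):
        return (False, f"No item at index {index} (list has {len(items)})")
    item = items[index]
    if not isinstance(item, dict):
        return (False, f"Item {index} is not a dictionary")
    return (True, item)


data = {"results": [{"name": "Alice"}, {"name": "Bob"}]}
print(get_item_safely(data, "results", 1))  # (True, {'name': 'Bob'})
print(get_item_safely(data, "results", 5))  # (False, 'No item at index 5 (list has 2)')
```

The extra isinstance(data, dict) check at the top covers one more case the single-index version skips: an API whose top-level JSON is itself a list, where data.get would raise AttributeError.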
Processing a whole array
One item is easy. A list of thirty users with one malformed record in the middle is where defensive loops earn their keep. The pattern applies the same checks at two levels: validate the array once before the loop, then validate each item inside it. Crucially, handle per-item failures with continue rather than raising, so one bad record doesn't crash the whole batch, and keep a running count of successes and failures so you can tell whether the response was healthy or garbage.
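Stripped of the network code, that loop can be sketched on its own so you can feed it a hand-made bad batch. The names process_batch and extract_name are hypothetical; extract_name is a stand-in for a real extractor such as extract_user_with_type_validation, and raising ValueError for a non-list array is one choice among several (the script below prints and returns instead).

```python
def process_batch(results, extract):
    """Validate the array once, then each item; return (successful, failed) counts.

    `extract` is any function that returns a dict or None for one item.
    """
    # Array-level check, done once before the loop.
    if not isinstance(results, list):
        raise ValueError("results is not a list")
    successful = 0
    failed = 0
    for i, item in enumerate(results, 1):
        # Item-level check, done for every element.
        if not isinstance(item, dict):
            print(f"Item {i}: ERROR: Invalid data structure")
            failed += 1
            continue  # skip this item, keep the batch going
        extracted = extract(item)
        if not extracted:
            print(f"Item {i}: ERROR: Could not extract data")
            failed += 1
            continue
        print(f"Item {i}: {extracted['name']}")
        successful += 1
    return successful, failed


def extract_name(item):
    """Hypothetical stand-in extractor: needs a non-empty string 'name'."""
    name = item.get("name")
    if isinstance(name, str) and name:
        return {"name": name}
    return None


mixed = [{"name": "Alice"}, None, {"name": 42}, {"name": "Bob"}]
print(process_batch(mixed, extract_name))  # prints four item lines, then (2, 2)
```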
Save this as batch_users.py. It imports the extract_user_with_type_validation function from type_safe_user.py, so keep that file in the same folder:
import requests
from type_safe_user import extract_user_with_type_validation
def fetch_and_process_users(count=10):
    """Fetch multiple users with defensive processing."""
    try:
        # Fetch data with timeout
        response = requests.get(
            f"https://randomuser.me/api/?results={count}",
            timeout=10
        )
        response.raise_for_status()
        # Validate content type
        content_type = response.headers.get("Content-Type", "")
        if "application/json" not in content_type:
            print(f"ERROR: Expected JSON but received {content_type}")
            return
        # Parse JSON
        try:
            data = response.json()
        except ValueError:
            print("ERROR: Invalid JSON in response")
            return
        # The top level must be an object before .get() is safe
        if not isinstance(data, dict):
            print("ERROR: Expected a JSON object at the top level")
            return
        # Extract results array safely
        results = data.get("results")
        if not isinstance(results, list):
            print("ERROR: Results is not a list")
            return
        if len(results) == 0:
            print("INFO: No users found")
            return
        print(f"=== Processing {len(results)} Users ===\n")
        # Process each user with defensive extraction
        successful = 0
        failed = 0
        for i, user in enumerate(results, 1):
            # Validate each item is a dictionary
            if not isinstance(user, dict):
                print(f"User {i}: ERROR: Invalid data structure")
                failed += 1
                continue
            # Extract with type validation (using our earlier function)
            user_info = extract_user_with_type_validation(user)
            if not user_info:
                print(f"User {i}: ERROR: Could not extract data")
                failed += 1
                continue
            # Display extracted info
            print(f"User {i}: {user_info['full_name']}")
            print(f"  Email: {user_info['email']}")
            if user_info['age'] is not None:
                print(f"  Age: {user_info['age']}")
            print(f"  Location: {user_info['location_full']}")
            print()
            successful += 1
        # Summary
        print(f"{'='*50}")
        print(f"OK: Successfully processed: {successful}")
        if failed > 0:
            print(f"ERROR: Failed to process: {failed}")
        print(f"{'='*50}")
    except requests.exceptions.Timeout:
        print("ERROR: Request timed out")
    except requests.exceptions.RequestException as e:
        print(f"ERROR: Network error: {e}")
# Process 10 users
fetch_and_process_users(10)
Run it from the project root. Random User returns different people each time, so your names, countries, and emails will differ; the count and summary shape are what to check:
$ python batch_users.py
=== Processing 10 Users ===

User 1: Emma Johnson
  Email: emma.johnson@example.com
  Age: 34
  Location: Auckland, New Zealand

User 2: Liam Smith
  Email: liam.smith@example.com
  Age: 28
  Location: Toronto, Canada

User 3: Sofia Garcia
  Email: sofia.garcia@example.com
  Age: 42
  Location: Madrid, Spain

User 4: Noah Muller
  Email: noah.muller@example.com
  Age: 31
  Location: Berlin, Germany

User 5: Mia O'Brien
  Email: mia.obrien@example.com
  Age: 26
  Location: Dublin, Ireland

User 6: Lucas Silva
  Email: lucas.silva@example.com
  Age: 38
  Location: Lisbon, Portugal

User 7: Olivia Brown
  Email: olivia.brown@example.com
  Age: 45
  Location: Sydney, Australia

User 8: Ethan Williams
  Email: ethan.williams@example.com
  Age: 29
  Location: Manchester, United Kingdom

User 9: Aria Patel
  Email: aria.patel@example.com
  Age: 33
  Location: Mumbai, India

User 10: Mateo Lopez
  Email: mateo.lopez@example.com
  Age: 51
  Location: Buenos Aires, Argentina

==================================================
OK: Successfully processed: 10
==================================================
Ten users in, ten users out. The runs that matter are the ones where a feed hands you one malformed user among the good ones, or where a whole batch turns out to be junk. Random User is too tidy to do that on demand, but plenty of APIs will, and the summary line is how you tell those cases apart. Reading it correctly is the next sharp edge.
Using continue lets one bad item skip past without crashing the whole batch, which is what you want. It also means the script exits with status 0 even when every item failed. The canonical failure mode: the API quietly renames a field overnight, extract_user_with_type_validation returns None for every record, the loop skips each one, and the cron job logs "OK: Successfully processed: 0" into the void until you discover the empty database the next morning. The diagnostic signature is zero rows inserted but no Python exception. Always track the failure count alongside the success count, and treat a high failure ratio (or a zero success count on a non-empty array) as an alert condition, not a clean run.
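One way to turn those counts into an alert condition is sketched below. The function names are hypothetical, and the 20% threshold is an arbitrary example; pick one that fits how noisy your data source normally is.

```python
def batch_is_healthy(successful, failed, max_failure_ratio=0.2):
    """Decide whether a batch run should count as clean.

    The 0.2 default is an illustrative threshold, not a standard value.
    """
    total = successful + failed
    if total == 0:
        return True  # an empty array is a valid, if boring, result
    if successful == 0:
        return False  # non-empty input, nothing extracted: likely a schema change
    return failed / total <= max_failure_ratio


def exit_code_for(successful, failed):
    """Map batch counts to a process exit status: 0 = clean, 1 = alert."""
    if batch_is_healthy(successful, failed):
        return 0
    print(f"ALERT: {failed} of {successful + failed} items failed")
    return 1


print(exit_code_for(10, 0))  # 0
print(exit_code_for(0, 10))  # prints the ALERT line, then 1
print(exit_code_for(7, 3))   # 3/10 = 0.3 exceeds 0.2: ALERT line, then 1
```

To wire this into batch_users.py, you would have fetch_and_process_users return its two counts and end the script with sys.exit(exit_code_for(successful, failed)). CI systems and most job schedulers treat a non-zero exit status as a failed run, which is exactly the signal the print-only version never sends.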
In a script you run manually, printing to the terminal is fine. For anything that runs headless (a cron job, a worker, a deployment pipeline), swap print for the logging module so failures land in a file or monitoring system where something can actually alert on them.
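A minimal sketch of that swap, using only the standard logging module. The logger name "batch_users", the log filename, and the choice of levels are assumptions for illustration: a zero-success batch logs at ERROR, partial failure at WARNING, and a clean run at INFO, so whatever watches the log can alert on level alone.

```python
import logging

logger = logging.getLogger("batch_users")


def log_summary(successful, failed):
    """Log the batch summary at a level that reflects how healthy it was."""
    total = successful + failed
    if total > 0 and successful == 0:
        logger.error("Processed 0 of %d users; possible schema change", total)
    elif failed > 0:
        logger.warning("Processed %d users, %d failed", successful, failed)
    else:
        logger.info("Processed %d users", successful)


if __name__ == "__main__":
    # Send records to a file instead of the terminal; the filename is an example.
    logging.basicConfig(
        filename="batch_users.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    log_summary(0, 10)
```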