4. Working with arrays defensively

Up to now you've been extracting data from single objects: one user, one profile, one row. Most real APIs don't stop there. GitHub returns arrays of repositories, Spotify returns arrays of tracks, and search endpoints of every kind return arrays of matches. Arrays add four new ways to fail. The array can be empty, it can be null instead of a list, the field can be missing entirely, or it can come back as the wrong type (a string, which [0] happily indexes one character into). This section teaches the pattern that handles all four.

Here is the difference between the two shapes: a single-object response versus an array-of-objects response. Most of the chapter so far has dealt with the first shape; everything below is about the second:

Response shapes

// Single object - what you've been working with:
{
  "user": {
    "name": "Alice",
    "email": "alice@example.com"
  }
}

// Array of objects - what most search/list APIs return:
{
  "results": [
    {
      "name": "Alice",
      "email": "alice@example.com"
    },
    {
      "name": "Bob",
      "email": "bob@example.com"
    },
    {
      "name": "Charlie",
      "email": "charlie@example.com"
    }
  ]
}

Any time you see /search or /list or "get multiple X" in an API's docs, expect an array somewhere in the response, usually wrapped in a key such as "results". Arrays break in their own specific ways. Their length varies (zero, one, or thousands of items), an empty result is a valid non-error response, items in the same list can have different fields present, and the naive results[0] access raises IndexError the moment the list is empty.
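Because the length varies and items don't all share the same fields, a loop with .get() on each item is the natural first tool: the same code handles zero, one, or thousands of items. A minimal sketch, assuming results is already known to be a list of dicts (the rest of this section is about making that assumption safe). collect_emails and the "(no email)" placeholder are hypothetical names for illustration:

```python
def collect_emails(results, placeholder="(no email)"):
    """Return one email per item; assumes results is a list of dicts."""
    emails = []
    for item in results:
        # Items in the same list don't always share the same fields,
        # so .get() supplies a placeholder instead of raising KeyError
        emails.append(item.get("email", placeholder))
    return emails


# An empty list is a valid response: the loop simply runs zero times
print(collect_emails([]))  # []

mixed = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob"},  # no email field on this item
]
print(collect_emails(mixed))  # ['alice@example.com', '(no email)']
```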

The four ways an array access goes wrong

The most common array-related crash is accessing an index that doesn't exist, but there are subtler failure modes too. Save this as array_crashes.py:

array_crashes.py

# Scenario 1: Empty array
response_data_1 = {"results": []}
try:
    first_user = response_data_1["results"][0]  # IndexError!
    print(first_user)
except IndexError as e:
    print(f"ERROR: Empty array: {e}")

# Scenario 2: Null instead of array
response_data_2 = {"results": None}
try:
    first_user = response_data_2["results"][0]  # TypeError!
    print(first_user)
except TypeError as e:
    print(f"ERROR: Null array: {e}")

# Scenario 3: Missing key
response_data_3 = {}
try:
    first_user = response_data_3["results"][0]  # KeyError! Never reaches the [0] index
    print(first_user)
except KeyError as e:
    print(f"ERROR: Missing key: {e}")

# Scenario 4: Wrong type (string instead of array)
response_data_4 = {"results": "no results found"}
try:
    first_user = response_data_4["results"][0]  # Returns 'n' (first char)!
    print(f"Got: {first_user}")  # Silently wrong!
except Exception as e:
    print(f"ERROR: Unexpected error: {e}")

Run it from the project root:

Terminal

$ python array_crashes.py
ERROR: Empty array: list index out of range
ERROR: Null array: 'NoneType' object is not subscriptable
ERROR: Missing key: 'results'
Got: n

The first three fail loudly. The fourth is the dangerous one: when results is a string instead of a list, [0] returns the first character without raising anything. "n" now flows silently downstream as if it were a user object, and the first .get() call on it raises AttributeError later, in a line of code that doesn't look obviously related. This is why the defensive pattern needs an explicit isinstance(..., list) check, not just a length check: len("no results found") is 16, so a length check alone waves the string straight through.
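Side by side, a length-only guard and a type-checked guard behave like this on the string from Scenario 4. Both functions are hypothetical helpers for illustration, not part of the chapter's scripts:

```python
def first_item_length_only(results):
    """Guards against None and empty input, but not against the wrong type."""
    if results is not None and len(results) > 0:
        return results[0]
    return None


def first_item_type_checked(results):
    """Rejects anything that isn't a list before checking its length."""
    if isinstance(results, list) and len(results) > 0:
        return results[0]
    return None


bad = "no results found"
print(len(bad))                            # 16, so the length check passes
print(repr(first_item_length_only(bad)))   # 'n'
print(repr(first_item_type_checked(bad)))  # None
```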

Safe array access

Before accessing any index, three checks need to pass: (1) the field exists, (2) it's a list, (3) it has enough items. Skip any one of them and you'll hit one of the four failures above. Here's the pattern wrapped in a function that returns a (success, result) tuple, so the caller can react to failure without a try/except. It also adds a fourth check on the item itself, since a list can hold None as easily as a dict.

Save this as safe_arrays.py:

safe_arrays.py

def get_first_user_safely(data):
    """Extract first user with complete validation."""
    
    # Step 1: .get() returns None instead of raising KeyError if "results" is missing
    results = data.get("results")
    
    # Step 2: Validate it's actually a list
    if not isinstance(results, list):
        return (False, "Results is not a list")
    
    # Step 3: Check list is not empty
    if len(results) == 0:
        return (False, "No results found")
    
    # Step 4: Safe to access first item
    first_user = results[0]
    
    # Step 5: Validate the item is a dict (expected structure)
    if not isinstance(first_user, dict):
        return (False, "First result is not a dictionary")
    
    return (True, first_user)

# Test with various response formats
test_responses = [
    {"results": [{"name": "Alice"}, {"name": "Bob"}]},  # Normal
    {"results": []},  # Empty array
    {"results": None},  # Null
    {},  # Missing key
    {"results": "no data"},  # Wrong type
    {"results": [None, {"name": "Bob"}]},  # First item is null
]

print("=== Safe Array Access Tests ===\n")
for i, test_data in enumerate(test_responses, 1):
    success, result = get_first_user_safely(test_data)
    if success:
        print(f"Test {i}: OK: Got user: {result}")
    else:
        print(f"Test {i}: ERROR: {result}")

Run it from the project root:

Terminal

$ python safe_arrays.py
=== Safe Array Access Tests ===

Test 1: OK: Got user: {'name': 'Alice'}
Test 2: ERROR: No results found
Test 3: ERROR: Results is not a list
Test 4: ERROR: Results is not a list
Test 5: ERROR: Results is not a list
Test 6: ERROR: First result is not a dictionary

That's the same four bad inputs as the crash demo, plus a well-formed response and one where the first item is None, and all six get a predictable, readable result. Returning (False, reason) instead of raising lets the caller choose what to do: show a user-friendly message, retry with different parameters, or log and move on. Any time you find yourself writing try: users[0], reach for this shape instead.
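The first-item function only needs an emptiness check because it always reads index 0. For any other position, "enough items" means the list must be longer than the index. Here's a sketch that generalizes the same (success, result) shape to any key and index. get_item_safely is a hypothetical helper, and unlike the .get() version it reports a missing key and a null value separately, which makes logs easier to read:

```python
def get_item_safely(data, key="results", index=0):
    """Return (True, item) or (False, reason) for data[key][index]."""
    if not isinstance(data, dict):
        return (False, "Response is not a dictionary")
    # Distinguish "key absent" from "key present but null"; .get() alone conflates them
    if key not in data:
        return (False, f"Missing key {key!r}")
    items = data[key]
    if items is None:
        return (False, f"{key!r} is null")
    if not isinstance(items, list):
        return (False, f"{key!r} is a {type(items).__name__}, not a list")
    # "Enough items" means the list must be longer than the index, not just non-empty
    if not 0 <= index < len(items):
        return (False, f"No item at index {index} (list has {len(items)})")
    item = items[index]
    if not isinstance(item, dict):
        return (False, f"Item {index} is not a dictionary")
    return (True, item)


# The caller decides what a failure means: here, fall back to a default name
success, result = get_item_safely({"results": [{"name": "Alice"}]}, index=1)
name = result["name"] if success else "Guest"
print(name)    # Guest
print(result)  # No item at index 1 (list has 1)
```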

Processing a whole array

One item is easy. A list of thirty users with one malformed record in the middle is where defensive loops earn their keep. The pattern is the same validation applied at two levels: check the array once before the loop, and check each item inside it. Crucially, handle per-item failures with continue rather than raising, so one bad record doesn't crash the whole batch, and keep a running count of successes and failures so you can tell whether the response was healthy or garbage.

Save this as batch_users.py. It imports the extract_user_with_type_validation function from type_safe_user.py, so keep that file in the same folder:

batch_users.py

import requests

from type_safe_user import extract_user_with_type_validation


def fetch_and_process_users(count=10):
    """Fetch multiple users with defensive processing."""
    
    try:

        # Fetch data with timeout
        response = requests.get(
            f"https://randomuser.me/api/?results={count}",
            timeout=10
        )
        response.raise_for_status()
        
        # Validate content type
        content_type = response.headers.get("Content-Type", "")
        if "application/json" not in content_type:
            print(f"ERROR: Expected JSON but received {content_type}")
            return
        
        # Parse JSON
        try:
            data = response.json()
        except ValueError:
            print("ERROR: Invalid JSON in response")
            return
        
        # Extract results array safely
        results = data.get("results")
        
        if not isinstance(results, list):
            print("ERROR: Results is not a list")
            return
        
        if len(results) == 0:
            print("INFO: No users found")
            return
        
        print(f"=== Processing {len(results)} Users ===\n")
        
        # Process each user with defensive extraction
        successful = 0
        failed = 0
        
        for i, user in enumerate(results, 1):

            # Validate each item is a dictionary
            if not isinstance(user, dict):
                print(f"User {i}: ERROR: Invalid data structure")
                failed += 1
                continue
            
            # Extract with type validation (using our earlier function)
            user_info = extract_user_with_type_validation(user)
            
            if not user_info:
                print(f"User {i}: ERROR: Could not extract data")
                failed += 1
                continue
            
            # Display extracted info
            print(f"User {i}: {user_info['full_name']}")
            print(f"  Email: {user_info['email']}")
            
            if user_info['age'] is not None:
                print(f"  Age: {user_info['age']}")
            
            print(f"  Location: {user_info['location_full']}")
            print()
            
            successful += 1
        
        # Summary
        print(f"{'='*50}")
        print(f"OK: Successfully processed: {successful}")
        if failed > 0:
            print(f"ERROR: Failed to process: {failed}")
        print(f"{'='*50}")
    
    except requests.exceptions.Timeout:
        print("ERROR: Request timed out")
    except requests.exceptions.RequestException as e:
        # Also catches the HTTPError raised by raise_for_status()
        print(f"ERROR: Request failed: {e}")

# Process 10 users
fetch_and_process_users(10)

Run it from the project root. Random User returns different people each time, so your names, countries, and emails will differ; the count and summary shape are what to check:

Terminal

$ python batch_users.py
=== Processing 10 Users ===

User 1: Emma Johnson
  Email: emma.johnson@example.com
  Age: 34
  Location: Auckland, New Zealand

User 2: Liam Smith
  Email: liam.smith@example.com
  Age: 28
  Location: Toronto, Canada

User 3: Sofia Garcia
  Email: sofia.garcia@example.com
  Age: 42
  Location: Madrid, Spain

User 4: Noah Muller
  Email: noah.muller@example.com
  Age: 31
  Location: Berlin, Germany

User 5: Mia O'Brien
  Email: mia.obrien@example.com
  Age: 26
  Location: Dublin, Ireland

User 6: Lucas Silva
  Email: lucas.silva@example.com
  Age: 38
  Location: Lisbon, Portugal

User 7: Olivia Brown
  Email: olivia.brown@example.com
  Age: 45
  Location: Sydney, Australia

User 8: Ethan Williams
  Email: ethan.williams@example.com
  Age: 29
  Location: Manchester, United Kingdom

User 9: Aria Patel
  Email: aria.patel@example.com
  Age: 33
  Location: Mumbai, India

User 10: Mateo Lopez
  Email: mateo.lopez@example.com
  Age: 51
  Location: Buenos Aires, Argentina

==================================================
OK: Successfully processed: 10
==================================================

Ten users in, ten users out. The interesting runs are the ones where a feed hands you one malformed user among good ones, or where a whole batch turns out to be junk; Random User is too tidy to do that on demand, but plenty of APIs will. Only the summary line tells you which of those you're looking at, and reading it correctly is the next sharp edge.
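You can still exercise those paths without a misbehaving API by feeding the same kind of loop a hand-built list. A minimal offline sketch: process_batch mirrors the loop in batch_users.py, and extract_full_name is a hypothetical stand-in for extract_user_with_type_validation that only pulls a name out of Random User's name.first/name.last shape:

```python
def extract_full_name(user):
    """Hypothetical stand-in for extract_user_with_type_validation."""
    name = user.get("name")
    if not isinstance(name, dict):
        return None
    first, last = name.get("first"), name.get("last")
    if not (isinstance(first, str) and isinstance(last, str)):
        return None
    return {"full_name": f"{first} {last}"}


def process_batch(results, extract):
    """Validate the array once and each item once; skip bad items with continue."""
    if not isinstance(results, list):
        return None  # the whole response is unusable, not just one item
    successful, failed, extracted = 0, 0, []
    for item in results:
        if not isinstance(item, dict):
            failed += 1
            continue
        info = extract(item)
        if not info:
            failed += 1
            continue
        extracted.append(info)
        successful += 1
    return successful, failed, extracted


# A hand-built feed: two good users and two malformed records
feed = [
    {"name": {"first": "Emma", "last": "Johnson"}},
    None,                    # item is not a dict
    {"name": "Liam Smith"},  # name is a string, not an object
    {"name": {"first": "Sofia", "last": "Garcia"}},
]
ok, bad, users = process_batch(feed, extract_full_name)
print(f"OK: {ok}  ERROR: {bad}")        # OK: 2  ERROR: 2
print([u["full_name"] for u in users])  # ['Emma Johnson', 'Sofia Garcia']
```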

Watch your success rate, not just your exit code

continue lets one bad item skip past without crashing the whole batch, which is what you want. It also means the script exits with status 0 even when every item failed. The canonical failure mode: the API quietly changes a field name overnight, extract_user_with_type_validation returns None for every record, the loop skips each one, and the cron job logs "Successfully processed: 0" into the void until you discover the empty database the next morning. Diagnostic signature: zero rows inserted but no Python exception. Always track the failure count alongside the success count, and treat a high failure ratio (or zero successes on a non-empty array) as an alert condition, not a clean run.
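One way to make that alert condition visible to a scheduler is to turn the two counts into an exit status. This is a sketch under stated assumptions: batch_exit_code is a hypothetical helper, and its 20% default threshold is a placeholder, not a recommendation:

```python
def batch_exit_code(successful, failed, max_failure_ratio=0.2):
    """Map batch counts to a process exit status: 0 healthy, 1 alert."""
    total = successful + failed
    if total == 0:
        return 0  # an empty array is a valid "nothing to do" response
    if successful == 0:
        return 1  # every item failed: likely a schema change upstream
    if failed / total > max_failure_ratio:
        return 1  # too many bad records to trust the run
    return 0


print(batch_exit_code(10, 0))  # 0
print(batch_exit_code(0, 10))  # 1
print(batch_exit_code(7, 3))   # 1 (30% failed, above the 20% threshold)
```

To wire it in, you could have fetch_and_process_users return (successful, failed) and end the script with sys.exit(batch_exit_code(successful, failed)), so cron and CI runners see a non-zero status when the batch is unhealthy. Pick a threshold that matches how noisy your feed normally is.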

In a script you run manually, printing to the terminal is fine. For anything that runs headless (a cron job, a worker, a deployment pipeline), swap print for the logging module so failures land in a file or monitoring system where something can actually alert on them.
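A minimal sketch of that swap using the standard logging module. The logger name, the batch_users.log path, and the level policy in summary_level are all assumptions for illustration; the point is that a run where nothing succeeded goes out at ERROR, a level that downstream tooling can filter on:

```python
import logging

logger = logging.getLogger("batch_users")


def configure_file_logging(path="batch_users.log"):
    """Send INFO and above to a file instead of the terminal (path is a placeholder)."""
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def summary_level(successful, failed):
    """Pick a log level so monitoring can alert on ERROR records."""
    if failed > 0 and successful == 0:
        return logging.ERROR    # nothing worked: the signature of a schema change
    if failed > 0:
        return logging.WARNING  # partial failure: worth a look
    return logging.INFO


def log_summary(successful, failed):
    """Replace the print-based summary with a single log record."""
    level = summary_level(successful, failed)
    logger.log(level, "Processed %d users, %d failed", successful, failed)
    return level
```

Call configure_file_logging() once at startup, before anything logs: basicConfig does nothing if the root logger already has handlers. The print calls inside the loop then become logger.info(...) or logger.warning(...) calls.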