Chapter 6: Working with Complex JSON Structures

1. Beyond basic JSON parsing

In this chapter you'll turn brittle JSON-extraction code into defensive code: the kind that survives missing fields, type surprises, and the five-deep nesting real APIs ship by default. By the end you'll have a .get()-with-defaults toolkit, type-safe conversion helpers, and an extraction function that returns a consistent shape no matter how messy the input is.

Chapters 3 and 4 taught you to make requests safely. The responses you worked with, though, were tutorial-clean: every field present, every type as documented. Real APIs aren't like that. Fields go missing, types vary, nesting runs five or six levels deep, and optional fields appear only sometimes.

You've already handled three of Chapter 4's validation layers: the network delivered a response, the status code said success, and the content type confirmed JSON. The fourth layer, the one we tackle here, is the one everyone forgets: the shape of the data itself. Does the key you're reaching for actually exist? Is the value the type you expected? Does the array have any items before you index into it? Every time you write data["results"][0]["name"]["first"], you're making four assumptions. In production, any of them can fail.

What you'll learn
  • Apply .get() with meaningful defaults to every JSON key access
  • Navigate deeply nested structures by chaining empty-dict fallbacks
  • Validate types with isinstance() before type-specific operations
  • Convert safely between strings, ints, and floats with range checks
  • Process arrays defensively with length checks and per-item error tolerance
  • Design extraction functions that return consistent shapes regardless of input quality
  • Inspect API responses systematically when debugging parsing problems

What you'll build
  • type_safe_user.py — the canonical extraction function: .get() defaults, isinstance() validation, range checks, guaranteed return shape
  • batch_users.py — processes ten users defensively, tracking success and failure counts
  • debug_toolkit.py — inspection helper that prints the structure of an unknown response
  • debug_workflow.py — path walker that tells you exactly where an extraction broke and what was available there

You'll also write smaller scratch scripts along the way (naive_user.py, defensive_user.py, extract_user.py, plus a few demos) that exist for the lesson and don't survive into the final toolkit. The four files above are the ones to keep.

What makes JSON complex

Difficulty doesn't come from size. A 50 kB response with a flat shape is easier to handle than a 2 kB response with five levels of optional nesting. The hard parts are inconsistency and structure variability, and they show up in five predictable ways:

  • Deep nesting. Real APIs nest data four to six levels deep. Accessing data["user"]["profile"]["contact"]["email"]["primary"] requires navigating five levels, and each level could be missing or null.
  • Optional fields. Fields in the documentation don't always appear in responses. The phone field exists for some users but not others. Code that assumes it exists crashes.
  • Variable types. A field documented as an integer sometimes arrives as a string. Age might be 25, "25", or null. Your code needs to handle all three.
  • Empty arrays. An API might return an empty list when no results match, or null, or omit the field entirely. Accessing [0] without checking crashes.
  • Schema variations. A single endpoint may return different shapes for different requests: a dob object that's present for one user can be absent for the next. You handle this the same way as a missing key: .get() with a default that has the right shape for the next step.
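
As a minimal sketch of what "a default that has the right shape" means, consider two hand-built records, one with a dob object and one without. The records and the helper name age_of are hypothetical, chosen for illustration only.

```python
# Hypothetical records: one with a dob object, one without.
user_with_dob = {"dob": {"age": 34}}
user_without_dob = {}


def age_of(user):
    """Return the age if present, else None (illustrative helper)."""
    # The default {} is a dict, so the chained .get() still works
    # when the "dob" key is missing.
    return user.get("dob", {}).get("age")


print(age_of(user_with_dob))     # 34
print(age_of(user_without_dob))  # None

# A wrong-shaped default only moves the crash one step along:
try:
    user_without_dob.get("dob", "").get("age")
except AttributeError as exc:
    print(type(exc).__name__)    # AttributeError
```

Note that dict.get() returns the default only when the key is absent, so a record carrying "dob": null still yields None at the first step and needs separate handling.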

Every one of these is solved in the same spirit as Chapter 4's validation layers: don't assume, check. The rest of the chapter is that idea applied five different ways, each tied to a concrete failure mode you'll recognise the next time you read a stack trace.

The problem with direct access

Before we fix anything, we need to see it break. The next few sections walk through one concrete example: a target endpoint, a naive extraction against it, and the specific scenarios that crash that extraction. That way every defensive technique you learn afterwards has a failure it's responding to.

The target: Random User API

We need a target API with enough nesting and variability to make defensive parsing feel real. The Random User Generator at https://randomuser.me/api/ fits well: it returns realistic user profiles, nests data four levels deep, and varies between requests.

Save this at the project root as explore_response.py so you can see the shape we'll be working with:

explore_response.py
import requests
import json

url = "https://randomuser.me/api/"
response = requests.get(url, timeout=10)
data = response.json()

# Pretty-print the full response structure
print(json.dumps(data, indent=2))

Response shape
{
  "results": [
    {
      "name": {
        "first": "Emma",
        "last": "Johnson"
      },
      "location": {
        "street": {
          "number": 1234,
          "name": "Queen St"
        },
        "city": "Auckland",
        "country": "New Zealand"
      },
      "email": "emma.johnson@example.com",
      "dob": {
        "age": 34
      }
    }
  ]
}

The shape above is simplified. The actual response carries more fields (gender, login, registered, phone, cell, picture, nationality, and a richer location with timezone and coordinates); we've shown the four we'll be extracting. Run explore_response.py yourself to see the full structure.

To reach the city name, you walk four levels: results (array) -> [0] (object) -> location (object) -> city (string). Every one of those steps is a potential crash site.
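
The walk can be made visible with a short loop that reports the type found at each step. This is a sketch against a hand-built sample that mirrors the simplified response shape, not live API data; the variable names are ours.

```python
# Hand-built sample mirroring the simplified response shape.
sample = {
    "results": [
        {"location": {"city": "Auckland", "country": "New Zealand"}}
    ]
}

# The four steps from the top of the response down to the city name.
path = ["results", 0, "location", "city"]

current = sample
steps = []
for step in path:
    # Each subscript is a place where a KeyError, IndexError,
    # or TypeError could be raised on a less tidy response.
    current = current[step]
    steps.append((step, type(current).__name__))
    print(f"{step!r:12} -> {type(current).__name__}")

print(current)  # Auckland
```

On this sample the loop reports list, dict, dict, and str in that order. On a real response, any step can hand back something else, which is where the crashes described in the following sections come from.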

When simple parsing breaks

Here's the extraction the way most tutorials write it. Save it as naive_user.py at the project root:

naive_user.py
import requests

# Fetch user data
response = requests.get("https://randomuser.me/api/", timeout=10)
data = response.json()

# Direct access - looks simple and clean
user = data["results"][0]
first_name = user["name"]["first"]
last_name = user["name"]["last"]
email = user["email"]
age = user["dob"]["age"]
city = user["location"]["city"]
country = user["location"]["country"]

# Use the data
print(f"{first_name} {last_name} ({age})")
print(f"{email}")
print(f"{city}, {country}")

Run it a few times against the live API and, on most runs, you'll see output like this (the names and places will differ, since each request returns a new random user):

Terminal
$ python naive_user.py
Emma Johnson (34)
emma.johnson@example.com
Auckland, New Zealand

Looks clean. So where's the problem? This code works only when every field is present, every type is as expected, and every nested object is a real dict. The moment one of those assumptions fails, the program crashes instead of degrading gracefully.

Six ways this code fails

Random User itself is tidy enough that you won't trigger most of these on demand, but across real APIs every one of them happens routinely, and defensive code has to assume any API can produce them. Note the crash type, not just the scenario: the type tells you what the defensive technique has to prevent.

  • Empty results array. An API returns {"results": []} when no data matches. data["results"][0] raises IndexError: list index out of range. The user sees a stack trace instead of "no results".
  • Missing name key. Imagine a profile that arrives without one. user["name"] raises KeyError: 'name'. The user sees a cryptic error instead of "name unavailable".
  • Null age value. A date of birth was never collected, so the payload carries "age": null. The code doesn't crash, but the first print() displays Emma Johnson (None), which is worse because the bad value ships to users unnoticed.
  • Type variation. Age arrives as the string "34" instead of the integer 34. Downstream, age + 1 raises TypeError: can only concatenate str (not "int") to str.
  • Nested null. A record ships "location": null instead of a nested object. user["location"]["city"] raises TypeError: 'NoneType' object is not subscriptable, which looks alarming but just means the step before returned None.
  • Schema variation. A dob object that's present for most records is absent for one. user["dob"]["age"] raises KeyError: 'dob' the moment you hit the user without it.
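
The six scenarios can be reproduced offline with hand-built payloads, without waiting for a live API to misbehave. The sketch below assumes a trimmed copy of the extraction in naive_user.py with the network call removed; the helper names (naive_extract, with_user, without_key, outcome) and the payloads are ours, for illustration only.

```python
GOOD_USER = {
    "name": {"first": "Emma", "last": "Johnson"},
    "location": {"city": "Auckland", "country": "New Zealand"},
    "email": "emma.johnson@example.com",
    "dob": {"age": 34},
}


def naive_extract(data):
    """Direct access, as in naive_user.py, minus the request."""
    user = data["results"][0]
    return {
        "first_name": user["name"]["first"],
        "age": user["dob"]["age"],
        "city": user["location"]["city"],
    }


def with_user(**overrides):
    """Wrap a copy of GOOD_USER, with some fields replaced."""
    return {"results": [{**GOOD_USER, **overrides}]}


def without_key(key):
    """Wrap a copy of GOOD_USER with one top-level key removed."""
    return {"results": [{k: v for k, v in GOOD_USER.items() if k != key}]}


SCENARIOS = {
    "empty results": {"results": []},
    "missing name": without_key("name"),
    "null age": with_user(dob={"age": None}),
    "string age": with_user(dob={"age": "34"}),
    "null location": with_user(location=None),
    "missing dob": without_key("dob"),
}


def outcome(data):
    """Name the exception raised, or report the age that got through."""
    try:
        result = naive_extract(data)
    except (IndexError, KeyError, TypeError) as exc:
        return type(exc).__name__
    return f"no crash, age={result['age']!r}"


for label, payload in SCENARIOS.items():
    print(f"{label:15} -> {outcome(payload)}")
```

Four scenarios raise during extraction (IndexError, KeyError, TypeError, KeyError). The null and string ages pass through untouched, which is why those two only surface later, when the value is displayed or used in arithmetic.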

The fix for every one of these is the same in spirit: validate before you use. The specific technique differs depending on whether the issue is a missing key, a wrong type, a null value, or an empty array. The next section starts with the most common one: missing keys, solved with .get().