Chapter 10: Advanced JSON Processing

1. From clean JSON to production reality

In this chapter you'll build the toolkit that copes when JSON shows up nested seven levels deep, named under three different fields, and missing half the optional sections (the kind of variation real APIs ship by default). By the end you'll have a diagnostic that maps any response shape, a container normalizer, a dot-path accessor that refuses to crash on missing keys, and a field-policy system that decides per-field whether missing data is fatal or fixable.

Chapter 6 taught JSON fundamentals against the Random User API: one consistent shape, one endpoint, predictable fields. Chapter 9 taught you to keep a request alive when the network misbehaves. This chapter is what happens once the request succeeds but the JSON you get back doesn't match what you expected.

Here is a realistic Orders API response. See if you can find where city lives.

orders_nested.json

{
  "order": {
    "id": "ord_9f2c",
    "customer": {
      "profile": {
        "name": {
          "first": "Ava",
          "last": "Murphy"
        },
        "contact": {
          "email": "ava@example.com",
          "address": {
            "shipping": {
              "line1": "12 Harbour St",
              "city": "Dublin",
              "country": "IE"
            }
          }
        }
      }
    },
    "items": [
      {
        "sku": "SKU-001",
        "product": {
          "details": {
            "title": "Stainless Water Bottle",
            "pricing": {
              "currency": "EUR",
              "amount": 24.99
            }
          }
        }
      }
    ],
    "payment": {
      "provider": "stripe",
      "transaction": {
        "status": "succeeded",
        "risk": {
          "score": 12,
          "flags": ["ip_mismatch"]
        }
      }
    }
  }
}

The full path is order.customer.profile.contact.address.shipping.city. Seven levels deep. Every dot is one more step into the object; every step is one more place the data could be missing, null, or a different shape than you expected. The .get() habits from Chapter 6 stop scaling once nesting goes past two or three levels, and the patterns in this chapter are how professional code reaches that data without crashing.

Four ways production JSON breaks naive code

Fields nest several levels deep

The orders response above is typical, not a worst case. The moment you write order["customer"]["profile"]["contact"]["address"]["shipping"]["city"], a single missing intermediate dictionary raises KeyError and takes the whole request down. Chapter 6's dictionary navigation handles two or three levels; past that, the brittleness compounds.
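To make the brittleness concrete, here is a minimal sketch using a hypothetical, trimmed-down order whose address block is absent. The variable names are illustrative and not part of the chapter's toolkit. It shows the bracket chain failing on the first missing key, and the chained .get() workaround that survives but already reads poorly at six steps:

```python
# Hypothetical trimmed-down order: the "address" block is absent.
order = {
    "customer": {
        "profile": {
            "contact": {"email": "ava@example.com"}
        }
    }
}

# Bracket chain: the first missing key raises KeyError and stops everything.
try:
    city = order["customer"]["profile"]["contact"]["address"]["shipping"]["city"]
except KeyError as exc:
    missing_key = exc.args[0]
    city = None

print(missing_key)  # address

# Chained .get() with {} defaults survives the missing block,
# but every extra level adds another link to the chain.
city_via_get = (
    order.get("customer", {})
    .get("profile", {})
    .get("contact", {})
    .get("address", {})
    .get("shipping", {})
    .get("city")
)
print(city_via_get)  # None
```

Note that the .get() chain only tolerates absent keys. If an intermediate value is present but null, the {} default is never used and the chain still fails.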

Optional sections appear and disappear

Real APIs often omit sections that don't apply. If a customer hasn't saved a shipping address, the address block is missing entirely. The path order.customer.profile.contact.address.shipping.city no longer exists:

order_address_missing.json

{
  "order": {
    "id": "ord_9f2c",
    "customer": {
      "profile": {
        "name": { "first": "Ava", "last": "Murphy" },
        "contact": {
          "email": "ava@example.com"
        }
      }
    }
  }
}

Code that assumes address.shipping.city is always present crashes with a KeyError (missing key) or a TypeError (you expected a dictionary and got None). The two failure modes from Chapter 9 show up right here, and the fix isn't categorisation or retry -- it's structural navigation that tolerates missing intermediates.
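The sketch below illustrates what "tolerates missing intermediates" can mean in practice. It assumes a hypothetical payload where address is present but null, and a hypothetical helper named walk_keys. This is a stopgap to show the idea, not the safe_get.py accessor the chapter builds later:

```python
# Hypothetical payload: "address" exists but is explicitly null.
contact = {"email": "ava@example.com", "address": None}

errors = []

# Bracket access into None raises TypeError
# ("'NoneType' object is not subscriptable").
try:
    contact["address"]["shipping"]
except TypeError as exc:
    errors.append(type(exc).__name__)

# A {} default does not help: the key exists, so the default is ignored
# and calling .get() on None raises AttributeError.
try:
    contact.get("address", {}).get("shipping")
except AttributeError as exc:
    errors.append(type(exc).__name__)

print(errors)  # ['TypeError', 'AttributeError']


def walk_keys(data, *keys):
    """Hypothetical helper: follow keys one step at a time.

    Returns None as soon as a step is not a dictionary or lacks the key,
    instead of raising.
    """
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


print(walk_keys(contact, "address", "shipping", "city"))  # None
```

The design choice is to check each step before taking it, so an absent key and a null value are handled by the same branch.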

Arrays multiply the problem

Production APIs often return arrays containing hundreds of objects, and those objects don't always share the same shape. One endpoint wraps records in a results array, another uses items, and a third returns data directly at the root with no wrapper at all. Within those arrays, individual objects may vary: some records include optional fields, others omit them; some fields are strings in one object and numbers in another; some objects nest data deeply while others keep it flat.
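As a rough illustration of the wrapper problem, the sketch below pulls a list of records out of the three container shapes just described. The function name records_from and the sample payloads are assumptions made for this example; the chapter's own normalize_collection.py may well differ:

```python
def records_from(payload):
    """Hypothetical helper: return a list of records from any container shape."""
    # Shape 1: the array sits directly at the root.
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        # Shape 2: the array is wrapped under a known key.
        for wrapper in ("results", "items"):
            value = payload.get(wrapper)
            if isinstance(value, list):
                return value
        # Shape 3: a lone object, treated as a one-item collection.
        return [payload]
    # Anything else (None, a string, a number) yields no records.
    return []


print(records_from({"results": [{"id": 1}, {"id": 2}]}))  # [{'id': 1}, {'id': 2}]
print(records_from({"items": [{"id": 3}]}))               # [{'id': 3}]
print(records_from([{"id": 4}]))                          # [{'id': 4}]
print(records_from({"id": 5}))                            # [{'id': 5}]
```

Downstream code can then loop over one list regardless of which endpoint produced it. Variation inside the individual records still has to be handled separately.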

The same concept arrives in different shapes

Deep nesting and missing sections happen inside a single endpoint. Shape drift happens between endpoints: the same business object, serialised differently by a modern endpoint and a legacy partner feed. Here are two responses describing the same order by the same customer. Scan both and count the differences you can spot: field names, data types, nesting, wrappers.

order_variant_a.json

{
  "id": "ORD-91352",
  "created_at": "2025-06-18T14:22:31Z",
  "total": "129.50",
  "currency": "EUR",
  "status": "shipped",
  "customer": {
    "id": 7712,
    "email": "alice@example.com"
  },
  "items": [
    { "sku": "P-001", "qty": 2, "price": 39.75 },
    { "sku": "P-009", "qty": 1, "price": 50.00 }
  ],
  "discount": null
}

The business concepts are identical in the legacy feed below, but the field names, types, and structure have all drifted.

order_variant_b.json

{
  "order_id": 91352,
  "ts": 1750256551,
  "amount": 129.5,
  "currency_code": "EUR",
  "state": "Shipped",
  "customer_id": "7712",
  "line_items": [
    { "product": { "sku": "P-001" }, "quantity": "2", "unit_price": "39.75" },
    { "product": { "sku": "P-009" }, "quantity": 1,   "unit_price": 50 }
  ],
  "promo": { "code": "SUMMER", "value": "10%" }
}

Same order, completely different shape. Here's what drifted:

Field naming: id vs order_id, total vs amount, status vs state
Type differences: total as a string vs amount as a number; id as a string vs an integer
Date format: ISO 8601 string ("2025-06-18T14:22:31Z") vs Unix timestamp (1750256551)
Customer data: embedded object with id and email vs a bare customer_id reference string
Items array: flat fields (sku, qty, price) vs a nested product object with different key names
Discount: explicit discount: null vs a promo object with a percentage string
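You can confirm the drift mechanically. The sketch below loads both variants as Python dictionaries (values copied from the listings above, with JSON null written as None) and compares their top-level keys and a few value types. The variable names are illustrative only:

```python
variant_a = {
    "id": "ORD-91352",
    "created_at": "2025-06-18T14:22:31Z",
    "total": "129.50",
    "currency": "EUR",
    "status": "shipped",
    "customer": {"id": 7712, "email": "alice@example.com"},
    "items": [
        {"sku": "P-001", "qty": 2, "price": 39.75},
        {"sku": "P-009", "qty": 1, "price": 50.00},
    ],
    "discount": None,
}

variant_b = {
    "order_id": 91352,
    "ts": 1750256551,
    "amount": 129.5,
    "currency_code": "EUR",
    "state": "Shipped",
    "customer_id": "7712",
    "line_items": [
        {"product": {"sku": "P-001"}, "quantity": "2", "unit_price": "39.75"},
        {"product": {"sku": "P-009"}, "quantity": 1, "unit_price": 50},
    ],
    "promo": {"code": "SUMMER", "value": "10%"},
}

# Top-level keys the two variants have in common.
shared_keys = sorted(variant_a.keys() & variant_b.keys())
print(shared_keys)  # []

# Same concept, different key and different Python type.
type_drift = [
    (a_key, type(variant_a[a_key]).__name__, b_key, type(variant_b[b_key]).__name__)
    for a_key, b_key in [("id", "order_id"), ("total", "amount"), ("created_at", "ts")]
]
for row in type_drift:
    print(row)
```

The two payloads share no top-level key at all, which is why application code that reads one variant directly cannot read the other.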

This is the central challenge the chapter addresses. Later on this page you'll see both variants in detail and design a target shape that both must produce. On the normalizer page you'll build the function that makes it happen, using the tools and patterns the intermediate pages teach.

What you'll learn

Recognise when API variation justifies a normalization layer, and when direct .get() access is still the right call
Map an unfamiliar API's shape with a diagnostic helper before writing any extraction code
Write access utilities that work across direct arrays, wrapped arrays, and single-object responses
Navigate deeply nested data with a safe accessor that refuses to crash on missing keys
Classify fields as required, recommended, or optional -- and pick a fail-fast or fail-soft policy for each
Apply the eight transformation patterns that collapse wildly different API shapes into one canonical structure

What you'll build

explore_api_structure.py -- a diagnostic function that prints the shape of any API response
normalize_collection.py -- a container normalizer that returns items from direct arrays, wrapped arrays, and singletons
extract_items_and_meta.py -- pagination-preserving extraction that surfaces cursors and next-page links alongside the items
first_item.py -- a convenience wrapper for single-resource detail endpoints that returns just the first item or None
safe_get.py -- a dot-path accessor (with a try_fields() helper) for deep navigation and alternative field names
field_policies.py -- require() / default() guards plus a MissingRequired exception for failed requires
order_normalizer.py -- the complete normalizer that turns both vendor variants into the same canonical shape, all eight patterns in one file

When you need these techniques

Not every API integration needs this toolkit. A single stable endpoint that returns predictable JSON is perfectly fine with Chapter 6's direct .get() calls. Build the normalization layer when the situation warrants it:

Integrating multiple API variants: different versions, different providers, or different endpoints with incompatible structures
API structure changes frequently: the provider updates field names, nesting, or types regularly
Isolating business logic from API changes: you want your application code stable even when external APIs evolve
Building libraries or SDKs: you're wrapping external APIs for others to use

For a one-time extraction script against a stable API, direct dictionary access is appropriate. Don't build infrastructure you don't need: the techniques in this chapter are tools, not mandates. That said, even "simple" integrations tend to evolve into complex ones, and knowing the patterns means you'll recognise the moment to apply them.

Examples assume Python 3.10+ and the requests library from earlier chapters. You should be comfortable with .json(), dictionary navigation with bracket notation and .get(), and looping through arrays -- all from Chapter 6. This chapter extends those basics into professional-grade handling; Chapter 12 then adds schema validation on top.

Canonicalization: one shape to rule them all

The professional answer to shape drift is canonicalization -- the process of transforming varying API formats into a single, standardised internal representation. Instead of teaching every piece of application code to tolerate every variant, you build one transformation layer that accepts any shape and outputs the same canonical structure every time.

That transformation layer is called a normalizer. It sits between the raw API response and your application logic. No matter which variant arrives, the normalizer produces identical output. Your application never sees the chaos.

Building a normalizer isn't about memorising dozens of special cases. It's about understanding eight core transformation patterns that handle the structural variations you'll encounter. Learn them once, apply them to any API. The rest of this page sets up the target shape and the eight patterns; the intermediate pages teach the tools that make each pattern practical; the normalizer page assembles the complete function that turns both order variants into identical canonical output.

The target shape

Before you can build a normalizer, you need to know what it's producing. A canonical shape is a deliberate design decision -- you pick one name per concept, one type per field, one nesting pattern per structure, and every inbound variant transforms to match.

For the vendor orders, here's the shape we'll target -- a synthetic example combining Variant A's identity fields with Variant B's discount, so every canonical field is populated in one place. One canonical name per field (id, not order_id). Numbers as numbers, not strings. Timestamps as ISO 8601. The customer embedded as an object, even when the source feed only had an ID:

order_canonical.json

{
  "id": "ORD-91352",
  "created_at": "2025-06-18T14:22:31Z",
  "total": 129.50,
  "currency": "EUR",
  "customer": { "id": 7712, "email": "alice@example.com" },
  "items": [
    {"sku": "P-001", "qty": 2, "price": 39.75},
    {"sku": "P-009", "qty": 1, "price": 50.00}
  ],
  "discount": {"type": "percent", "value": 10},
  "status": "shipped"
}

Four rules of thumb for designing a canonical shape:

One name per concept: pick a single canonical field name (id over order_id, items over line_items)
Standardise types: decide on one type per field (numbers as numbers, not strings; timestamps as ISO 8601)
Consistent nesting: choose one structure (customer as embedded object, items as array, discount as structured object)
Most expressive wins: when variants conflict, prefer the representation that's clearest for downstream code
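One way to pin these rules down is to write the canonical shape as types. The sketch below uses typing.TypedDict from the standard library; the class names are hypothetical, and the optional email reflects the assumption that a feed supplying only a customer ID produces a customer object without one:

```python
from typing import Optional, TypedDict


class _CustomerRequired(TypedDict):
    id: int


class CanonicalCustomer(_CustomerRequired, total=False):
    # Assumption: email may be absent when the source only had an ID.
    email: str


class CanonicalItem(TypedDict):
    sku: str
    qty: int
    price: float


class CanonicalDiscount(TypedDict):
    type: str
    value: float


class CanonicalOrder(TypedDict):
    id: str                      # one name per concept: id, not order_id
    created_at: str              # ISO 8601 string
    total: float                 # numbers as numbers
    currency: str
    customer: CanonicalCustomer  # always an embedded object
    items: list[CanonicalItem]
    discount: Optional[CanonicalDiscount]
    status: str


# The values from order_canonical.json, written as a typed dictionary.
example: CanonicalOrder = {
    "id": "ORD-91352",
    "created_at": "2025-06-18T14:22:31Z",
    "total": 129.50,
    "currency": "EUR",
    "customer": {"id": 7712, "email": "alice@example.com"},
    "items": [
        {"sku": "P-001", "qty": 2, "price": 39.75},
        {"sku": "P-009", "qty": 1, "price": 50.00},
    ],
    "discount": {"type": "percent", "value": 10},
    "status": "shipped",
}
```

TypedDict adds no runtime checks; it documents the decision and lets a type checker flag code that drifts from it.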

The eight transformation patterns

Reaching that canonical shape from two different variants takes systematic transformation. Professional normalizers combine eight patterns that work together to turn messy input into clean, predictable output:

Pattern	What it does	Example
1. Field mapping	Rename fields to canonical names	`order_id` → `id`
2. Type coercion	Convert strings to expected types	`"129.50"` → `129.5`
3. Time unification	Normalize timestamp formats	`1750256551` → ISO 8601
4. Enum harmonization	Standardize enumerated values	`"Shipped"` → `"shipped"`
5. Join strategy	Handle embedded vs referenced data	`customer_id` → customer object
6. Pagination adapter	Unify pagination approaches	`meta.cursor` / `nextPage` → `next_token`
7. Optional field handling	Provide safe defaults for missing data	missing `discount` → `null`
8. Array processing	Normalize nested collections	`line_items` → canonical `items`

Not every API needs all eight. A simple integration might only require field mapping and type coercion. Understanding the complete toolkit prepares you for any integration challenge, and recognising which patterns apply to a specific API is how you assess new integrations quickly.

Most transformations fall into three categories: rename (order_id → id), convert (Unix → ISO), or restructure (flatten or nest). Once you recognise these categories, any API becomes tractable.
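To show what a "convert" transformation looks like, here is a small sketch of patterns 3 and 4 using only the standard library. The function names are hypothetical, and the set of allowed statuses is a placeholder invented for this example, since the source does not list the real enum values:

```python
from datetime import datetime, timezone

# Assumption: placeholder enum values for illustration only.
ALLOWED_STATUSES = {"pending", "shipped", "delivered", "cancelled"}


def to_iso8601(value):
    """Time unification: Unix seconds become an ISO 8601 UTC string.

    Strings are assumed to be ISO 8601 already and pass through unchanged.
    """
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return value


def normalize_status(value):
    """Enum harmonization: lowercase, then validate against the allowed set."""
    status = str(value).strip().lower()
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {value!r}")
    return status


print(to_iso8601(1750256551))              # 2025-06-18T14:22:31Z
print(to_iso8601("2025-06-18T14:22:31Z"))  # 2025-06-18T14:22:31Z
print(normalize_status("Shipped"))         # shipped
```

Both variants' timestamps come out as the same string, which is the point of the pattern: downstream code compares one format, not two.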

Complete field mapping for the vendor orders

Here's every field transformation needed to turn both variants into the canonical shape. The left column shows what arrives from each variant, the middle shows the canonical field name, and the right names the transformation:

Incoming field	Canonical	Transformation
id / order_id	id	Stringify numeric IDs
created_at (ISO) / ts (unix)	created_at	Convert unix → ISO 8601
total / amount	total	Coerce to number
currency / currency_code	currency	Uppercase 3-letter code
customer / customer_id	customer	Embed object or keep {id}
items / line_items	items	Normalize to {sku, qty, price}
discount / promo	discount	Normalize percent/value
status / state	status	Lowercase + enum validate
meta.cursor / nextPage	next_token	Cursor or link adapter
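Two rows of this table involve restructuring rather than renaming. The sketch below shows one possible approach for the items and customer rows. All function names are hypothetical, it assumes every item carries a quantity and a price, and it assumes customer IDs are numeric as in order_canonical.json:

```python
def first_present(record, *names):
    """Return the first value whose key exists and is not None."""
    for name in names:
        if record.get(name) is not None:
            return record[name]
    return None


def normalize_item(raw):
    """Array processing: flatten either item shape to {sku, qty, price}."""
    product = raw.get("product") or {}
    sku = first_present(raw, "sku") or product.get("sku")
    qty = first_present(raw, "qty", "quantity")
    price = first_present(raw, "price", "unit_price")
    # Assumption: qty and price are always present, so coercion cannot hit None.
    return {"sku": sku, "qty": int(qty), "price": float(price)}


def normalize_customer(order):
    """Join strategy: keep an embedded object, or wrap a bare ID as {id}."""
    customer = order.get("customer")
    if isinstance(customer, dict):
        return customer
    customer_id = order.get("customer_id")
    if customer_id is None:
        return None
    return {"id": int(customer_id)}  # assumption: IDs are numeric


legacy_items = [
    {"product": {"sku": "P-001"}, "quantity": "2", "unit_price": "39.75"},
    {"product": {"sku": "P-009"}, "quantity": 1, "unit_price": 50},
]
print([normalize_item(item) for item in legacy_items])
print(normalize_customer({"customer_id": "7712"}))  # {'id': 7712}
```

Run against Variant B's line_items, this produces the same item list that Variant A already carries, with quantities as integers and prices as floats.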

You'll build the normalizer that implements every row of this table on the normalizer page, after the intermediate pages teach the diagnostic and extraction tools that make it practical. The preview below shows what a normalizer fragment looks like in code -- just two of the eight patterns, to anchor the shape before we build it properly:

normalize_preview.py

def normalize_order_preview(order):
    """
    Preview: the core concept with just two of the eight patterns.
    The complete normalizer on the normalizer page handles all eight.
    """
    # 1. FIELD MAPPING: try both possible field names
    order_id = order.get("id") or order.get("order_id")
    total_amount = order.get("total") or order.get("amount")

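    # 2. TYPE COERCION: ensure consistent types
    # Guard the missing case: str(None) would yield the string "None"
    order_id = str(order_id) if order_id is not None else None
    total_amount = float(total_amount or 0)  # always a number; missing becomes 0.0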

    return {
        "id": order_id,
        "total": total_amount,
    }

This fragment demonstrates the two most fundamental patterns: field mapping (id or order_id → id) and type coercion (string or number → float). The preview uses or short-circuiting for brevity, which falls through on every falsy value: a legitimate integer 0 would be skipped in favour of the next field. The production version on the normalizer page reaches for the try_fields() helper instead, which falls through only on missing or empty-string values, so a 0 survives. You'll build out the remaining six patterns step by step on the normalizer page, not as a single dense function but as a composition of small helpers that each handle one transformation.
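The difference between the two fall-through rules is easy to see with a zero total. The sketch below is a guess at the behaviour described, not the chapter's actual helper: the name try_fields_sketch, its signature, and the decision to treat None the same as a missing key are all assumptions made for illustration:

```python
def try_fields_sketch(data, *names, default=None):
    """Hypothetical stand-in: return the first field that is present and non-empty.

    Skips only missing keys, None, and empty strings. Other falsy values
    such as 0 or False are kept.
    """
    for name in names:
        value = data.get(name)
        if value is None or value == "":
            continue
        return value
    return default


# A free order: the real total is 0, and a stale "amount" is also present.
order = {"total": 0, "amount": 129.5}

via_or = order.get("total") or order.get("amount")
via_helper = try_fields_sketch(order, "total", "amount")

print(via_or)      # 129.5 (0 is falsy, so `or` moved on to "amount")
print(via_helper)  # 0
```

With or, a genuine zero is silently replaced by whatever the next field holds. Checking explicitly for "missing" avoids that class of bug.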

The immediate next step is the diagnostic tool that reveals an unknown API's shape before you write a single extraction line. If you can't see the structure, you can't normalize it.