Chapter 10: Advanced JSON Processing
1. From clean JSON to production reality
In this chapter you'll build the toolkit that copes when JSON shows up nested seven levels deep, with the same field named three different ways, and missing half the optional sections (the kind of variation real APIs ship by default). By the end you'll have a diagnostic that maps any response shape, a container normalizer, a dot-path accessor that refuses to crash on missing keys, and a field-policy system that decides per field whether missing data is fatal or fixable.
Chapter 6 taught JSON fundamentals against the Random User API: one consistent shape, one endpoint, predictable fields. Chapter 9 taught you to keep a request alive when the network misbehaves. This chapter is what happens once the request succeeds but the JSON you get back doesn't match what you expected.
Here is a realistic Orders API response. See if you can find where city lives.
{
  "order": {
    "id": "ord_9f2c",
    "customer": {
      "profile": {
        "name": {
          "first": "Ava",
          "last": "Murphy"
        },
        "contact": {
          "email": "ava@example.com",
          "address": {
            "shipping": {
              "line1": "12 Harbour St",
              "city": "Dublin",
              "country": "IE"
            }
          }
        }
      }
    },
    "items": [
      {
        "sku": "SKU-001",
        "product": {
          "details": {
            "title": "Stainless Water Bottle",
            "pricing": {
              "currency": "EUR",
              "amount": 24.99
            }
          }
        }
      }
    ],
    "payment": {
      "provider": "stripe",
      "transaction": {
        "status": "succeeded",
        "risk": {
          "score": 12,
          "flags": ["ip_mismatch"]
        }
      }
    }
  }
}
The full path is order.customer.profile.contact.address.shipping.city. Seven levels deep. Every dot is one more step into the object; every step is one more place the data could be missing, null, or a different shape than you expected. The .get() habits from Chapter 6 stop scaling once nesting goes past two or three levels, and the patterns in this chapter are how professional code reaches that data without crashing.
Four ways production JSON breaks naive code
Fields nest several levels deep
The orders response above is typical, not a worst case. The moment you write order["customer"]["profile"]["contact"]["address"]["shipping"]["city"], a single missing intermediate dictionary raises KeyError and takes the whole request down. Chapter 6's dictionary navigation handles two or three levels; past that, the brittleness compounds.
Optional sections appear and disappear
Real APIs often omit sections that don't apply. If a customer hasn't saved a shipping address, the address block is missing entirely. The path order.customer.profile.contact.address.shipping.city no longer exists:
{
  "order": {
    "id": "ord_9f2c",
    "customer": {
      "profile": {
        "name": { "first": "Ava", "last": "Murphy" },
        "contact": {
          "email": "ava@example.com"
        }
      }
    }
  }
}
Code that assumes address.shipping.city is always present crashes with a KeyError (missing key) or a TypeError (you expected a dictionary and got None). The two failure modes from Chapter 9 show up right here, and the fix isn't categorisation or retry -- it's structural navigation that tolerates missing intermediates.
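Both failure modes can be reproduced with trimmed copies of the response. The sketch below is illustrative only: `get_path` is a hypothetical stand-in, not the accessor this chapter builds later, and the second payload (an explicit `null` address) is an assumed variation that the source does not show.

```python
# Trimmed response where the customer never saved an address.
no_address = {"order": {"customer": {"profile": {"contact": {"email": "ava@example.com"}}}}}

# Assumed variation: the API sends an explicit null for the address block.
null_address = {"order": {"customer": {"profile": {"contact": {"address": None}}}}}


def city_naive(payload):
    """Chained bracket access: any missing or null step raises."""
    contact = payload["order"]["customer"]["profile"]["contact"]
    return contact["address"]["shipping"]["city"]


def get_path(payload, path, default=None):
    """Hypothetical helper: walk a dot-path, returning default on any gap."""
    current = payload
    for key in path.split("."):
        # Stop as soon as the current step is not a dict or lacks the key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


for label, payload in [("missing", no_address), ("null", null_address)]:
    try:
        city_naive(payload)
    except (KeyError, TypeError) as exc:
        print(label, "->", type(exc).__name__)

CITY_PATH = "order.customer.profile.contact.address.shipping.city"
print(get_path(no_address, CITY_PATH, default="unknown"))
```

The missing block raises `KeyError` and the null block raises `TypeError`, while the tolerant walk returns the default in both cases.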
Arrays multiply the problem
Production APIs often return arrays containing hundreds of objects, and those objects don't always share the same shape. One endpoint wraps records in a results array, another uses items, and a third returns data directly at the root with no wrapper at all. Within those arrays, individual objects may vary: some records include optional fields, others omit them; some fields are strings in one object and numbers in another; some objects nest data deeply while others keep it flat.
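As a rough sketch of what coping with the three wrapper styles might look like, the function below returns a list of records whether the payload is a bare array, a dict wrapping a `results` or `items` array, or a single object. The name `items_from` and its fallback rules are assumptions made for illustration, not the container normalizer built later in the chapter.

```python
def items_from(payload, wrapper_keys=("results", "items")):
    """Hypothetical sketch: return a list of records whatever the wrapper."""
    if isinstance(payload, list):
        # Direct array at the root: nothing to unwrap.
        return payload
    if isinstance(payload, dict):
        for key in wrapper_keys:
            value = payload.get(key)
            if isinstance(value, list):
                return value
        # No recognised wrapper: treat the dict as a single record.
        return [payload]
    # Anything else (None, a string, a number) yields no records.
    return []


print(items_from([{"id": 1}]))               # direct array
print(items_from({"results": [{"id": 1}]}))  # wrapped under "results"
print(items_from({"id": 3}))                 # single object
```

A key-based heuristic like this can misfire: a single order that carries its own `items` field would be unwrapped to its line items instead of being treated as one record. That is one reason to inspect a response's shape before choosing wrapper keys.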
The same concept arrives in different shapes
Deep nesting and missing sections happen inside a single endpoint. Shape drift happens between endpoints: the same business object, serialised differently by a modern endpoint and a legacy partner feed. Here are two responses describing the same order by the same customer. Scan both and count the differences you can spot: field names, data types, nesting, wrappers.
{
  "id": "ORD-91352",
  "created_at": "2025-06-18T14:22:31Z",
  "total": "129.50",
  "currency": "EUR",
  "status": "shipped",
  "customer": {
    "id": 7712,
    "email": "alice@example.com"
  },
  "items": [
    { "sku": "P-001", "qty": 2, "price": 39.75 },
    { "sku": "P-009", "qty": 1, "price": 50.00 }
  ],
  "discount": null
}
The business concepts are identical in the legacy feed below, but the field names, types, and structure have all drifted.
{
  "order_id": 91352,
  "ts": 1750256551,
  "amount": 129.5,
  "currency_code": "EUR",
  "state": "Shipped",
  "customer_id": "7712",
  "line_items": [
    { "product": { "sku": "P-001" }, "quantity": "2", "unit_price": "39.75" },
    { "product": { "sku": "P-009" }, "quantity": 1, "unit_price": 50 }
  ],
  "promo": { "code": "SUMMER", "value": "10%" }
}
Same order, completely different shape. Here's what drifted:
- Field naming: id vs order_id, total vs amount, status vs state
- Type differences: total as a string vs amount as a number; id as a string vs an integer
- Date format: ISO 8601 string ("2025-06-18T14:22:31Z") vs Unix timestamp (1750256551)
- Customer data: embedded object with id and email vs a bare customer_id reference string
- Items array: flat fields (sku, qty, price) vs a nested product object with different key names
- Discount: explicit discount: null vs a promo object with a percentage string
This is the central challenge the chapter addresses. Later on this page you'll see both variants in detail and design a target shape that both must produce. On the normalizer page you'll build the function that makes it happen, using the tools and patterns the intermediate pages teach.
What you'll learn
- Recognise when API variation justifies a normalization layer, and when direct .get() access is still the right call
- Map an unfamiliar API's shape with a diagnostic helper before writing any extraction code
- Write access utilities that work across direct arrays, wrapped arrays, and single-object responses
- Navigate deeply nested data with a safe accessor that refuses to crash on missing keys
- Classify fields as required, recommended, or optional -- and pick a fail-fast or fail-soft policy for each
- Apply the eight transformation patterns that collapse wildly different API shapes into one canonical structure
What you'll build
- explore_api_structure.py -- a diagnostic function that prints the shape of any API response
- normalize_collection.py -- a container normalizer that returns items from direct arrays, wrapped arrays, and singletons
- extract_items_and_meta.py -- pagination-preserving extraction that surfaces cursors and next-page links alongside the items
- first_item.py -- a convenience wrapper for single-resource detail endpoints that returns just the first item or None
- safe_get.py -- a dot-path accessor (with a try_fields() helper) for deep navigation and alternative field names
- field_policies.py -- require() / default() guards plus a MissingRequired exception for failed requires
- order_normalizer.py -- the complete normalizer that turns both vendor variants into the same canonical shape, all eight patterns in one file
When you need these techniques
Not every API integration needs this toolkit. A single stable endpoint that returns predictable JSON is perfectly fine with Chapter 6's direct .get() calls. Build the normalization layer when the situation warrants it:
- Integrating multiple API variants: different versions, different providers, or different endpoints with incompatible structures
- API structure changes frequently: the provider updates field names, nesting, or types regularly
- Isolating business logic from API changes: you want your application code stable even when external APIs evolve
- Building libraries or SDKs: you're wrapping external APIs for others to use
For a one-time extraction script against a stable API, direct dictionary access is appropriate. Don't build infrastructure you don't need. The techniques in this chapter are tools, not mandates -- professional developers reach for normalization when the situation warrants it, not reflexively. That said, understanding the patterns prepares you for the reality that even "simple" integrations tend to evolve into complex ones, and knowing the approach means you'll recognise the moment to apply it.
Examples assume Python 3.10+ and the requests library from earlier chapters. You should be comfortable with .json(), dictionary navigation with bracket notation and .get(), and looping through arrays -- all from Chapter 6. This chapter extends those basics into professional-grade handling; Chapter 12 then adds schema validation on top.
Canonicalization: one shape to rule them all
The professional answer to shape drift is canonicalization -- the process of transforming varying API formats into a single, standardised internal representation. Instead of teaching every piece of application code to tolerate every variant, you build one transformation layer that accepts any shape and outputs the same canonical structure every time.
That transformation layer is called a normalizer. It sits between the raw API response and your application logic. No matter which variant arrives, the normalizer produces identical output. Your application never sees the chaos.
Building a normalizer isn't about memorising dozens of special cases. It's about understanding eight core transformation patterns that handle the structural variations you'll encounter. Learn them once, apply them to any API. The rest of this page sets up the target shape and the eight patterns; the intermediate pages teach the tools that make each pattern practical; the normalizer page assembles the complete function that turns both order variants into identical canonical output.
The target shape
Before you can build a normalizer, you need to know what it's producing. A canonical shape is a deliberate design decision -- you pick one name per concept, one type per field, one nesting pattern per structure, and every inbound variant transforms to match.
For the vendor orders, here's the shape we'll target -- a synthetic example combining the identity fields of Variant A (the modern response above) with the discount from Variant B (the legacy feed), so every canonical field is populated in one place. One canonical name per field (id, not order_id). Numbers as numbers, not strings. Timestamps as ISO 8601. The customer embedded as an object, even when the source feed only had an ID:
{
  "id": "ORD-91352",
  "created_at": "2025-06-18T14:22:31Z",
  "total": 129.50,
  "currency": "EUR",
  "customer": { "id": 7712, "email": "alice@example.com" },
  "items": [
    {"sku": "P-001", "qty": 2, "price": 39.75},
    {"sku": "P-009", "qty": 1, "price": 50.00}
  ],
  "discount": {"type": "percent", "value": 10},
  "status": "shipped"
}
Four rules of thumb for designing a canonical shape:
- One name per concept: pick a single canonical field name (id over order_id, items over line_items)
- Standardise types: decide on one type per field (numbers as numbers, not strings; timestamps as ISO 8601)
- Consistent nesting: choose one structure (customer as embedded object, items as array, discount as structured object)
- Most expressive wins: when variants conflict, prefer the representation that's clearest for downstream code
The eight transformation patterns
Reaching that canonical shape from two different variants takes systematic transformation. Professional normalizers combine eight patterns that work together to turn messy input into clean, predictable output:
| Pattern | What it does | Example |
|---|---|---|
| 1. Field mapping | Rename fields to canonical names | order_id → id |
| 2. Type coercion | Convert strings to expected types | "129.50" → 129.5 |
| 3. Time unification | Normalize timestamp formats | 1750256551 → ISO 8601 |
| 4. Enum harmonization | Standardize enumerated values | "Shipped" → "shipped" |
| 5. Join strategy | Handle embedded vs referenced data | customer_id → customer object |
| 6. Pagination adapter | Unify pagination approaches | meta.cursor / nextPage → next_token |
| 7. Optional field handling | Provide safe defaults for missing data | missing discount → null |
| 8. Array processing | Normalize nested collections | line_items → canonical items |
Not every API needs all eight. A simple integration might only require field mapping and type coercion. Understanding the complete toolkit prepares you for any integration challenge, and recognising which patterns apply to a specific API is how you assess new integrations quickly.
Most transformations fall into three categories: rename (order_id → id), convert (Unix → ISO), or restructure (flatten or nest). Once you recognise these categories, any API becomes tractable.
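To make the "convert" category concrete, here is a minimal sketch of patterns 3 and 4 applied to the legacy feed's ts and state fields. The helper names and the set of allowed statuses are assumptions; the source only shows the value "shipped", so the other statuses are placeholders.

```python
from datetime import datetime, timezone

# Assumed enum: only "shipped" appears in the source; the rest are placeholders.
KNOWN_STATUSES = {"pending", "shipped", "delivered", "cancelled"}


def to_iso_utc(value):
    """Return an ISO 8601 UTC string from a Unix timestamp or an ISO string."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assume strings are already ISO 8601 and pass them through unchanged.
    return value


def to_status(value):
    """Lowercase a status and reject values outside the assumed enum."""
    status = str(value).strip().lower()
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unknown status: {value!r}")
    return status


print(to_iso_utc(1750256551))  # 2025-06-18T14:22:31Z
print(to_status("Shipped"))    # shipped
```

Passing `tz=timezone.utc` matters here: without it, `fromtimestamp` uses the machine's local timezone and the output would vary by server.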
Complete field mapping for the vendor orders
Here's every field transformation needed to turn both variants into the canonical shape. The left column shows what arrives from each variant, the middle shows the canonical field name, and the right names the transformation:
| Incoming field | Canonical | Transformation |
|---|---|---|
| id / order_id | id | Stringify numeric IDs |
| created_at (ISO) / ts (unix) | created_at | Convert unix → ISO 8601 |
| total / amount | total | Coerce to number |
| currency / currency_code | currency | Uppercase 3-letter |
| customer / customer_id | customer | Embed object or keep {id} |
| items / line_items | items | Normalize to {sku, qty, price} |
| discount / promo | discount | Normalize percent/value |
| status / state | status | Lowercase + enum validate |
| meta.cursor / nextPage | next_token | Cursor or link adapter |
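As one illustration of a single row from the table above (items / line_items), the sketch below maps line items from either variant to the canonical {sku, qty, price} shape. `normalize_item` is a hypothetical helper, not the chapter's final code, and it assumes every item carries a quantity and a price; a missing value would raise rather than receive a default.

```python
def normalize_item(raw):
    """Hypothetical helper: map either variant's line item to {sku, qty, price}."""
    product = raw.get("product") or {}
    # Prefer the flat field, then fall back to the legacy name or nesting.
    sku = raw.get("sku", product.get("sku"))
    qty = raw.get("qty", raw.get("quantity"))
    price = raw.get("price", raw.get("unit_price"))
    # Coerce so "2" and 2, or "39.75" and 39.75, come out identical.
    return {"sku": sku, "qty": int(qty), "price": float(price)}


modern_items = [
    {"sku": "P-001", "qty": 2, "price": 39.75},
    {"sku": "P-009", "qty": 1, "price": 50.00},
]
legacy_items = [
    {"product": {"sku": "P-001"}, "quantity": "2", "unit_price": "39.75"},
    {"product": {"sku": "P-009"}, "quantity": 1, "unit_price": 50},
]

canonical_modern = [normalize_item(item) for item in modern_items]
canonical_legacy = [normalize_item(item) for item in legacy_items]
print(canonical_modern == canonical_legacy)  # True
```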
You'll build the normalizer that implements every row of this table on the normalizer page, after the intermediate pages teach the diagnostic and extraction tools that make it practical. The preview below shows what a normalizer fragment looks like in code -- just two of the eight patterns, to anchor the shape before we build it properly:
def normalize_order_preview(order):
    """
    Preview: the core concept with just two of the eight patterns.
    The complete normalizer on the normalizer page handles all eight.
    """
    # 1. FIELD MAPPING: try both possible field names
    order_id = order.get("id") or order.get("order_id")
    total_amount = order.get("total") or order.get("amount")

    # 2. TYPE COERCION: ensure consistent types
    order_id = str(order_id)  # always string
    total_amount = float(total_amount or 0)  # always number

    return {
        "id": order_id,
        "total": total_amount,
    }
This fragment demonstrates the two most fundamental patterns: field mapping (id or order_id → id) and type coercion (string or number → float). The preview uses or short-circuiting for brevity; the production version on the normalizer page reaches for the try_fields() helper instead, which falls through only on missing or empty-string values rather than on every falsy value (or would skip a legitimate integer 0, whereas try_fields keeps it). You'll build out the remaining six patterns step by step on the normalizer page, not as a single dense function but as a composition of small helpers that each handle one transformation.
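The difference between or and the described try_fields() behaviour is easiest to see with a zero value. The version below is a guess at the helper based only on the description above, not the chapter's actual implementation; in particular, treating None the same as a missing key is an assumption.

```python
def try_fields(record, *names, default=None):
    """Sketch: return the first field that is present, not None, and not ""."""
    for name in names:
        value = record.get(name)
        # Assumption: None counts as missing; 0 and False are real values.
        if value is None or value == "":
            continue
        return value
    return default


record = {"total": 0, "amount": 129.5}

print(record.get("total") or record.get("amount"))  # 129.5 (the 0 is skipped)
print(try_fields(record, "total", "amount"))        # 0 (the 0 is kept)
```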
The immediate next step is the diagnostic tool that reveals an unknown API's shape before you write a single extraction line. If you can't see the structure, you can't normalize it.