6. Building the complete normalizer

Every utility the previous pages built points at one goal: the vendor orders normalizer promised on the overview page, two completely different API variants reduced to one canonical output. This page builds it in four passes, two patterns at a time, with both variants fed through each pass so the canonical structure comes into focus as the file grows.

Building the normalizer in passes

You will build order_normalizer.py in four passes: four runnable versions of the file, each extending the last. Pass 1 handles two patterns and emits a three-field canonical dict. Each subsequent pass adds two more patterns until all eight are in place. After every pass you can save the file, run a short verify script, and watch the two variants drift closer to identical output.

Presenting the finished normalizer all at once would hide the reasoning. Adding one or two patterns per pass makes each decision visible: what problem does this pattern solve, why this implementation, and what happens to the two variants when you compare their output side by side.

Pass 1: rename fields and coerce types

The first pass solves the two most fundamental shape problems: vendors use different field names for the same concept (id vs order_id, total vs amount, currency vs currency_code), and they store the same value with different types ("129.50" string vs 129.5 number). Two patterns -- field mapping and type coercion -- handle a surprising amount of variation on their own; everything later in the file builds on this foundation. Save this at the project root as order_normalizer.py:

order_normalizer.py (pass 1)

from typing import Any, Dict
from safe_get import try_fields


def normalize_order(raw: Dict[str, Any]) -> Dict[str, Any]:
    """
    Pass 1: field mapping + type coercion only.
    Handles id/order_id, total/amount, currency naming.
    """
    # PATTERN 1: FIELD MAPPING
    # Try both possible field names, normalize to 'id'
    order_id_raw = try_fields(raw, ["id", "order_id"])
    order_id = str(order_id_raw)

    # Ensure ID has proper prefix
    if order_id and not order_id.startswith("ORD-"):
        order_id = f"ORD-{order_id}"

    # PATTERN 2: TYPE COERCION
    # Convert string or number to float
    total_raw = try_fields(raw, ["total", "amount"])
    total = float(total_raw) if total_raw is not None else 0.0

    # Map currency field names and normalize to uppercase
    currency = try_fields(raw, ["currency", "currency_code"], "USD")
    currency = currency.upper() if currency else "USD"

    return {
        "id": order_id,
        "total": total,
        "currency": currency,
    }

Save this as verify_pass1.py and run it. Both variants are fed through the same function; the output dicts should be identical for these three fields:

verify_pass1.py

from order_normalizer import normalize_order

variant_a = {"id": "ORD-91352", "total": "129.50", "currency": "EUR"}
variant_b = {"order_id": 91352, "amount": 129.5, "currency_code": "EUR"}

print("Variant A:", normalize_order(variant_a))
print("Variant B:", normalize_order(variant_b))
print("Same keys?", normalize_order(variant_a).keys() == normalize_order(variant_b).keys())
print("Same id?  ", normalize_order(variant_a)["id"] == normalize_order(variant_b)["id"])

Terminal

Variant A: {'id': 'ORD-91352', 'total': 129.5, 'currency': 'EUR'}
Variant B: {'id': 'ORD-91352', 'total': 129.5, 'currency': 'EUR'}
Same keys? True
Same id?   True

If you see both dicts identical and both verification lines print True, pass 1 is working: try_fields() handled the name variations and explicit float() made the total a number in both cases. If the dicts differ on id -- e.g. Variant B prints 'id': '91352' without the ORD- prefix -- the prefix branch isn't firing, usually because the line that overwrites order_id was indented inside the wrong block. If you see an ImportError on try_fields, the helper from the flexible-access page wasn't saved as safe_get.py at the project root.
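
The try_fields helper is imported from safe_get.py and is not reproduced on this page. As a reminder of the behaviour pass 1 relies on, here is a minimal stand-in. The name try_fields_sketch and the rule "first key whose value is not None wins" are assumptions for illustration; the real helper from the flexible-access page may differ in details.

```python
from typing import Any, Dict, Iterable


def try_fields_sketch(d: Dict[str, Any], names: Iterable[str], default: Any = None) -> Any:
    """Hypothetical stand-in for try_fields: return the first non-None value found."""
    for name in names:
        value = d.get(name)
        if value is not None:
            return value
    return default


variant_a = {"id": "ORD-91352", "total": "129.50"}
variant_b = {"order_id": 91352, "amount": 129.5}

# Field mapping: either name resolves to the same concept.
id_a = try_fields_sketch(variant_a, ["id", "order_id"])
id_b = try_fields_sketch(variant_b, ["id", "order_id"])

# Type coercion: float() accepts both the string and the number.
total_a = float(try_fields_sketch(variant_a, ["total", "amount"]))
total_b = float(try_fields_sketch(variant_b, ["total", "amount"]))

print(id_a, id_b)          # ORD-91352 91352
print(total_a == total_b)  # True
```

The two ids still differ in type and prefix at this point, which is why normalize_order follows the lookup with str() and the ORD- prefix check.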

Pass 2: unify timestamps and harmonize status enums

Pass 2 adds two patterns that protect downstream code from input variation. Variant A sends created_at as an ISO 8601 string ("2025-06-18T14:22:31Z"); Variant B sends ts as a Unix integer (1750256551). Status appears as "shipped" under status in one feed and "Shipped" under state in the other, and occasionally as an unknown value that should fall back to a safe default rather than reach the database. The additions, in shorthand: one new import, two new sections (patterns 3 and 4) added after the type-coercion block, and two new fields on the returned dict.

order_normalizer.py (pass 2 additions)

# Add to imports at the top of order_normalizer.py:
from datetime import datetime, timezone

# Inside normalize_order, after the type-coercion block, add:

    # PATTERN 3: TIME UNIFICATION
    # Convert unix timestamp to ISO 8601, or keep ISO string
    created_raw = try_fields(raw, ["created_at", "ts"])
    if isinstance(created_raw, (int, float)):
        created_at = datetime.fromtimestamp(created_raw, tz=timezone.utc).isoformat().replace("+00:00", "Z")
    else:
        created_at = created_raw

    # PATTERN 4: ENUM HARMONIZATION
    # Normalize status to lowercase and validate against allowed set
    status_raw = try_fields(raw, ["status", "state"], "pending")
    status = status_raw.lower() if status_raw else "pending"
    allowed_statuses = {"pending", "processing", "shipped", "delivered", "cancelled"}
    if status not in allowed_statuses:
        status = "pending"

# Add the new fields to the returned dict:
    return {
        "id": order_id,
        "created_at": created_at,
        "total": total,
        "currency": currency,
        "status": status,
    }

Full order_normalizer.py after pass 2 (about 40 lines, with pass 1 unchanged):

order_normalizer.py (pass 2)

from datetime import datetime, timezone
from typing import Any, Dict
from safe_get import try_fields


def normalize_order(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Pass 2: adds time unification and enum harmonization."""
    # PATTERN 1: FIELD MAPPING
    order_id_raw = try_fields(raw, ["id", "order_id"])
    order_id = str(order_id_raw)
    if order_id and not order_id.startswith("ORD-"):
        order_id = f"ORD-{order_id}"

    # PATTERN 2: TYPE COERCION
    total_raw = try_fields(raw, ["total", "amount"])
    total = float(total_raw) if total_raw is not None else 0.0

    currency = try_fields(raw, ["currency", "currency_code"], "USD")
    currency = currency.upper() if currency else "USD"

    # PATTERN 3: TIME UNIFICATION
    created_raw = try_fields(raw, ["created_at", "ts"])
    if isinstance(created_raw, (int, float)):
        created_at = datetime.fromtimestamp(created_raw, tz=timezone.utc).isoformat().replace("+00:00", "Z")
    else:
        created_at = created_raw

    # PATTERN 4: ENUM HARMONIZATION
    status_raw = try_fields(raw, ["status", "state"], "pending")
    status = status_raw.lower() if status_raw else "pending"
    allowed_statuses = {"pending", "processing", "shipped", "delivered", "cancelled"}
    if status not in allowed_statuses:
        status = "pending"

    return {
        "id": order_id,
        "created_at": created_at,
        "total": total,
        "currency": currency,
        "status": status,
    }

Save this as verify_pass2.py. The verify script now uses fuller payloads, so the new fields have something to work on:

verify_pass2.py

from order_normalizer import normalize_order

variant_a = {
    "id": "ORD-91352",
    "created_at": "2025-06-18T14:22:31Z",
    "total": "129.50",
    "currency": "EUR",
    "status": "shipped",
}
variant_b = {
    "order_id": 91352,
    "ts": 1750256551,
    "amount": 129.5,
    "currency_code": "EUR",
    "state": "Shipped",
}

result_a = normalize_order(variant_a)
result_b = normalize_order(variant_b)

print(f"A timestamp: {result_a['created_at']}")
print(f"B timestamp: {result_b['created_at']}")
print(f"A status:    {result_a['status']}")
print(f"B status:    {result_b['status']}")
print(f"Same shape:  {result_a == result_b}")

Terminal

A timestamp: 2025-06-18T14:22:31Z
B timestamp: 2025-06-18T14:22:31Z
A status:    shipped
B status:    shipped
Same shape:  True

If the timestamps match and both statuses lowercase to shipped, pass 2 is working: Unix integers convert to ISO 8601 with a Z suffix, and status values are lowercased and validated against the allowed set, so anything unknown falls back to pending. If the timestamps differ -- e.g. Variant B prints something like 2025-06-18T14:22:31+00:00 instead of the Z form -- the .replace("+00:00", "Z") tail isn't firing, almost always because the line was lost during paste. If Same shape prints False but the four printed fields look right, the mismatch is in a field the script doesn't print (id, total, or currency); print result_a and result_b in full and compare them against the pass 1 output.
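
To look at the time-unification step in isolation, the branch can be pulled into a small function. This is a sketch only: the name to_iso_z is hypothetical and is not part of order_normalizer.py.

```python
from datetime import datetime, timezone
from typing import Union


def to_iso_z(value: Union[int, float, str]) -> str:
    """Return an ISO 8601 UTC string with a Z suffix; pass strings through unchanged."""
    if isinstance(value, (int, float)):
        # tz=timezone.utc matters: without it, fromtimestamp() uses local time.
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
        return dt.isoformat().replace("+00:00", "Z")
    return value


print(to_iso_z(1750256551))              # 2025-06-18T14:22:31Z
print(to_iso_z("2025-06-18T14:22:31Z"))  # 2025-06-18T14:22:31Z
```

Strings pass through untouched, so a malformed timestamp string would reach the canonical output as-is. Checking its format is outside what this pass does.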

Pass 3: embed customer and normalize item arrays

Pass 3 handles two patterns that operate on nested structures rather than flat fields. Variant A embeds customer data as a sub-object ({"id": 7712, "email": "alice@example.com"}); Variant B only provides a customer_id string. Items appear as items in one feed and line_items in the other, with different per-item field names and one level of extra nesting on the SKU. After this pass both variants emit near-identical structures; the last remaining differences are pagination wrapping and discounts, which land in pass 4.

order_normalizer.py (pass 3 additions)

# Add to imports at the top of order_normalizer.py:
from safe_get import safe_get, try_fields  # safe_get is new

# Inside normalize_order, after the enum-harmonization block, add:

    # PATTERN 5: JOIN STRATEGY (customer embedding)
    # Handle embedded object vs reference ID
    customer_obj = raw.get("customer")
    if isinstance(customer_obj, dict):
        customer = {
            "id": customer_obj.get("id"),
            "email": customer_obj.get("email", "Unknown"),
        }
    else:
        customer_id = raw.get("customer_id")
        customer = {
            "id": int(customer_id) if customer_id else None,
            "email": "Unknown",
        }

    # PATTERN 8: ARRAY PROCESSING (item normalization)
    # Handle items vs line_items with different nesting
    raw_items = try_fields(raw, ["items", "line_items"], [])
    items = []
    for item in raw_items:
        # SKU might be nested in product.sku or flat
        sku = safe_get(item, "product.sku") or item.get("sku")
        qty_raw = try_fields(item, ["qty", "quantity"], 0)
        qty = int(qty_raw) if qty_raw else 0
        price_raw = try_fields(item, ["price", "unit_price"], 0)
        price = float(price_raw) if price_raw else 0.0
        items.append({"sku": sku, "qty": qty, "price": price})

# Add the new fields to the returned dict:
    return {
        "id": order_id,
        "created_at": created_at,
        "total": total,
        "currency": currency,
        "customer": customer,
        "items": items,
        "status": status,
    }

Full order_normalizer.py after pass 3 (about 70 lines, with passes 1-2 unchanged):

order_normalizer.py (pass 3)

from datetime import datetime, timezone
from typing import Any, Dict
from safe_get import safe_get, try_fields


def normalize_order(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Pass 3: adds customer embedding and item normalization."""
    # PATTERN 1: FIELD MAPPING
    order_id_raw = try_fields(raw, ["id", "order_id"])
    order_id = str(order_id_raw)
    if order_id and not order_id.startswith("ORD-"):
        order_id = f"ORD-{order_id}"

    # PATTERN 2: TYPE COERCION
    total_raw = try_fields(raw, ["total", "amount"])
    total = float(total_raw) if total_raw is not None else 0.0

    currency = try_fields(raw, ["currency", "currency_code"], "USD")
    currency = currency.upper() if currency else "USD"

    # PATTERN 3: TIME UNIFICATION
    created_raw = try_fields(raw, ["created_at", "ts"])
    if isinstance(created_raw, (int, float)):
        created_at = datetime.fromtimestamp(created_raw, tz=timezone.utc).isoformat().replace("+00:00", "Z")
    else:
        created_at = created_raw

    # PATTERN 4: ENUM HARMONIZATION
    status_raw = try_fields(raw, ["status", "state"], "pending")
    status = status_raw.lower() if status_raw else "pending"
    allowed_statuses = {"pending", "processing", "shipped", "delivered", "cancelled"}
    if status not in allowed_statuses:
        status = "pending"

    # PATTERN 5: JOIN STRATEGY (customer embedding)
    customer_obj = raw.get("customer")
    if isinstance(customer_obj, dict):
        customer = {
            "id": customer_obj.get("id"),
            "email": customer_obj.get("email", "Unknown"),
        }
    else:
        customer_id = raw.get("customer_id")
        customer = {
            "id": int(customer_id) if customer_id else None,
            "email": "Unknown",
        }

    # PATTERN 8: ARRAY PROCESSING (item normalization)
    raw_items = try_fields(raw, ["items", "line_items"], [])
    items = []
    for item in raw_items:
        sku = safe_get(item, "product.sku") or item.get("sku")
        qty_raw = try_fields(item, ["qty", "quantity"], 0)
        qty = int(qty_raw) if qty_raw else 0
        price_raw = try_fields(item, ["price", "unit_price"], 0)
        price = float(price_raw) if price_raw else 0.0
        items.append({"sku": sku, "qty": qty, "price": price})

    return {
        "id": order_id,
        "created_at": created_at,
        "total": total,
        "currency": currency,
        "customer": customer,
        "items": items,
        "status": status,
    }

A design note about the customer handling: when Variant B provides only a customer_id, the normalizer creates a shell object -- {"id": 7712, "email": "Unknown"}. That may feel like fabricating data, but it's deliberate. Without the shell, every part of your application that touches customer data would need to branch on whether customer is a dict or a string. The normalizer absorbs that complexity in one place so downstream code can always write order["customer"]["email"] safely. If you need the real customer data for Variant B, fetch it by ID and populate the shell -- the placeholder tells you exactly which field needs filling.
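
One way to populate the shell later is sketched below. The CUSTOMER_DIRECTORY dict and the fill_customer_shell function are hypothetical stand-ins for whatever customer lookup your project actually has; they are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for a customer service or database lookup.
CUSTOMER_DIRECTORY: Dict[int, Dict[str, Any]] = {
    7712: {"id": 7712, "email": "alice@example.com"},
}


def fill_customer_shell(
    order: Dict[str, Any], directory: Dict[int, Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of the order with the placeholder email replaced when a record exists."""
    customer = dict(order["customer"])
    record: Optional[Dict[str, Any]] = directory.get(customer["id"])
    if customer["email"] == "Unknown" and record is not None:
        customer["email"] = record["email"]
    return {**order, "customer": customer}


order_b = {"id": "ORD-91352", "customer": {"id": 7712, "email": "Unknown"}}
filled = fill_customer_shell(order_b, CUSTOMER_DIRECTORY)

print(filled["customer"]["email"])   # alice@example.com
print(order_b["customer"]["email"])  # Unknown (the original is not mutated)
```

Because the shell always has the same two keys, the enrichment step never needs to check whether customer is a dict or a bare ID.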

Save this as verify_pass3.py. The verify exercises both the customer-object and customer-id branches plus the nested-SKU and flat-SKU item shapes:

verify_pass3.py

from order_normalizer import normalize_order

variant_a = {
    "id": "ORD-91352",
    "created_at": "2025-06-18T14:22:31Z",
    "total": "129.50",
    "currency": "EUR",
    "customer": {"id": 7712, "email": "alice@example.com"},
    "items": [
        {"sku": "P-001", "qty": 2, "price": 39.75},
        {"sku": "P-009", "qty": 1, "price": 50.00},
    ],
    "status": "shipped",
}
variant_b = {
    "order_id": 91352,
    "ts": 1750256551,
    "amount": 129.5,
    "currency_code": "EUR",
    "customer_id": "7712",
    "line_items": [
        {"product": {"sku": "P-001"}, "quantity": "2", "unit_price": "39.75"},
        {"product": {"sku": "P-009"}, "quantity": 1, "unit_price": 50},
    ],
    "state": "Shipped",
}

result_a = normalize_order(variant_a)
result_b = normalize_order(variant_b)

print(f"A customer: {result_a['customer']}")
print(f"B customer: {result_b['customer']}")
print(f"A items[0]: {result_a['items'][0]}")
print(f"B items[0]: {result_b['items'][0]}")
print(f"Same item structure: {result_a['items'][0].keys() == result_b['items'][0].keys()}")

Terminal

A customer: {'id': 7712, 'email': 'alice@example.com'}
B customer: {'id': 7712, 'email': 'Unknown'}
A items[0]: {'sku': 'P-001', 'qty': 2, 'price': 39.75}
B items[0]: {'sku': 'P-001', 'qty': 2, 'price': 39.75}
Same item structure: True

If both customers carry an id of 7712 and both items render with identical sku/qty/price keys, pass 3 is working: the join-strategy branch picks up either shape, and the SKU lookup falls through from product.sku to a flat sku as needed. The two customers differ on email, and that's expected -- Variant B's feed genuinely doesn't carry the email, so the shell records Unknown. If Variant B's customer prints {'id': None, 'email': 'Unknown'}, then customer_id reached the int(customer_id) if customer_id else None expression as None or an empty string (a non-empty string like "7712" is always truthy), usually because it is being read off the wrong key or the dict literal has it commented out.
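
The SKU fall-through depends on safe_get returning a falsy default when a path is missing. The helper itself lives in safe_get.py and is not shown on this page. The stand-in below, safe_get_sketch, is a hypothetical approximation that only walks nested dicts; the real helper may handle more cases.

```python
from typing import Any


def safe_get_sketch(obj: Any, path: str, default: Any = None) -> Any:
    """Hypothetical stand-in for safe_get: walk a dotted path through nested dicts."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


nested_item = {"product": {"sku": "P-001"}, "quantity": "2"}
flat_item = {"sku": "P-001", "qty": 2}

for item in (nested_item, flat_item):
    # Try the nested location first, then fall through to the flat key.
    sku = safe_get_sketch(item, "product.sku") or item.get("sku")
    print(sku)  # P-001 both times
```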

Pass 4: adapt pagination and require essentials

Pass 4 completes the eight patterns and adds the production polish: pagination adaptation lifted out into its own helper, optional-field handling for discounts and promos, and a MissingRequired guard that refuses to invent an order ID when both id and order_id are missing. This is the canonical version every downstream module imports from -- it stays inline rather than collapsed because, like the assembled orchestrator in chapter 13, it's the file the rest of the project depends on. The new pieces: one new import (MissingRequired from field_policies), one new module-level helper (extract_orders_with_pagination, which implements pattern 6), two additions inside normalize_order (the required-field guard at the top and discount handling, pattern 7), the allowed-status set hoisted to module level as ALLOWED_STATUSES, and a STATUS_SYNONYMS map so American spellings like canceled map to the canonical cancelled before validation rather than after.
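
To see why the synonym lookup has to run before validation, compare the two orderings side by side. The function names synonyms_first and validation_first are hypothetical and exist only for this comparison.

```python
ALLOWED = {"pending", "processing", "shipped", "delivered", "cancelled"}
SYNONYMS = {"canceled": "cancelled"}


def synonyms_first(raw_status: str) -> str:
    status = raw_status.lower()
    status = SYNONYMS.get(status, status)  # map, then validate
    return status if status in ALLOWED else "pending"


def validation_first(raw_status: str) -> str:
    status = raw_status.lower()
    if status not in ALLOWED:              # "canceled" is rejected here...
        status = "pending"
    return SYNONYMS.get(status, status)    # ...so the map never sees it


print(synonyms_first("Canceled"))    # cancelled
print(validation_first("Canceled"))  # pending
```

With validation first, a cancelled order from an American-spelling feed would be recorded as pending.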

order_normalizer.py (final, after pass 4)

from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple
from field_policies import MissingRequired
from safe_get import safe_get, try_fields

ALLOWED_STATUSES = {"pending", "processing", "shipped", "delivered", "cancelled"}
STATUS_SYNONYMS = {"canceled": "cancelled"}  # American spelling -> British canonical spelling


def normalize_order(raw: Dict[str, Any]) -> Dict[str, Any]:
    """
    Complete normalizer implementing all eight transformation patterns.
    Transforms both order variants into identical canonical shape.
    """
    # PATTERN 1: FIELD MAPPING: rename to canonical names
    order_id_raw = try_fields(raw, ["id", "order_id"])
    if order_id_raw in (None, ""):
        raise MissingRequired("Missing required field: id")

    order_id = str(order_id_raw)
    if order_id and not order_id.startswith("ORD-"):
        order_id = f"ORD-{order_id}"

    # PATTERN 2: TYPE COERCION: ensure consistent types
    total_raw = try_fields(raw, ["total", "amount"])
    total = float(total_raw) if total_raw is not None else 0.0

    currency = try_fields(raw, ["currency", "currency_code"], "USD")
    currency = currency.upper() if currency else "USD"

    # PATTERN 3: TIME UNIFICATION: normalize to ISO 8601
    created_raw = try_fields(raw, ["created_at", "ts"])
    if isinstance(created_raw, (int, float)):
        created_at = datetime.fromtimestamp(created_raw, tz=timezone.utc).isoformat().replace("+00:00", "Z")
    else:
        created_at = created_raw

    # PATTERN 4: ENUM HARMONIZATION: standardize status values
    status_raw = try_fields(raw, ["status", "state"], "pending")
    status = status_raw.lower() if status_raw else "pending"
    status = STATUS_SYNONYMS.get(status, status)  # Map known synonyms before validation
    if status not in ALLOWED_STATUSES:
        status = "pending"

    # PATTERN 5: JOIN STRATEGY: embed customer data
    customer_obj = raw.get("customer")
    if isinstance(customer_obj, dict):
        customer = {
            "id": customer_obj.get("id"),
            "email": customer_obj.get("email", "Unknown"),
        }
    else:
        customer_id = raw.get("customer_id")
        customer = {
            "id": int(customer_id) if customer_id else None,
            "email": "Unknown",
        }

    # PATTERN 8: ARRAY PROCESSING: normalize item collections
    raw_items = try_fields(raw, ["items", "line_items"], [])
    items = []
    for item in raw_items:
        sku = safe_get(item, "product.sku") or item.get("sku")
        qty_raw = try_fields(item, ["qty", "quantity"], 0)
        qty = int(qty_raw) if qty_raw else 0
        price_raw = try_fields(item, ["price", "unit_price"], 0)
        price = float(price_raw) if price_raw else 0.0
        items.append({"sku": sku, "qty": qty, "price": price})

    # PATTERN 7: OPTIONAL FIELD HANDLING: discount/promo normalization
    # CONDITIONALLY REQUIRED: if promo is present, promo.value is required
    discount = None
    if raw.get("discount") is not None:
        discount = raw["discount"]
    elif raw.get("promo"):
        promo = raw["promo"]
        if "value" not in promo:
            raise MissingRequired("promo.value is required when promo is present")
        value_str = promo["value"]
        if isinstance(value_str, str) and value_str.endswith("%"):
            discount = {"type": "percent", "value": int(value_str.rstrip("%"))}
        else:
            discount = {"type": "fixed", "value": float(value_str)}

    return {
        "id": order_id,
        "created_at": created_at,
        "total": total,
        "currency": currency,
        "customer": customer,
        "items": items,
        "discount": discount,
        "status": status,
    }


def extract_orders_with_pagination(
    api_response: Any,
) -> Tuple[List[Dict[str, Any]], Any]:
    """
    Extract orders from either variant and return (orders, next_token).
    Implements PATTERN 6: PAGINATION ADAPTER.
    api_response may be a wrapper dict or a bare list of orders.
    """
    # A bare list has no wrapper and therefore no pagination metadata.
    # Check it first: calling .get() on a list would raise AttributeError.
    if isinstance(api_response, list):
        return [normalize_order(o) for o in api_response], None

    orders_raw = (
        safe_get(api_response, "data.orders")
        or api_response.get("orders")
        or []
    )

    orders = [normalize_order(o) for o in orders_raw]

    # Unify cursor (Variant A) vs nextPage (Variant B)
    next_token = (
        safe_get(api_response, "meta.cursor")
        or api_response.get("nextPage")
    )

    return orders, next_token

A note about the container detection in extract_orders_with_pagination: it does its own safe_get(api_response, "data.orders") / api_response.get("orders") walk rather than reusing extract_items_and_meta from the flexible-access page. The reason is that the orders container can sit one level deeper than the top-level wrapper detection extract_items_and_meta targets -- Variant A has data.orders, Variant B has orders at the root. Inline detection here is simpler than threading a nested path through container_hints. For top-level wrappers (the search-results case from the previous pages), extract_items_and_meta remains the right tool.
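
The returned next_token is what drives a fetch loop. The sketch below shows the shape of such a loop under stated assumptions: FAKE_PAGES and fetch_page are hypothetical stand-ins for real HTTP calls, and extract_raw is a simplified stand-in for extract_orders_with_pagination that skips normalization so the example stays self-contained.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Hypothetical canned responses standing in for HTTP calls, keyed by page token.
FAKE_PAGES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"orders": [{"order_id": 1}, {"order_id": 2}], "nextPage": "/orders?page=2"},
    "/orders?page=2": {"orders": [{"order_id": 3}]},
}


def fetch_page(token: Optional[str]) -> Dict[str, Any]:
    """Hypothetical fetcher; a real one would issue an HTTP request."""
    return FAKE_PAGES[token]


def extract_raw(api_response: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], Any]:
    """Simplified stand-in for extract_orders_with_pagination (no normalization)."""
    data = api_response.get("data")
    meta = api_response.get("meta")
    orders = (
        (data.get("orders") if isinstance(data, dict) else None)
        or api_response.get("orders")
        or []
    )
    token = (
        (meta.get("cursor") if isinstance(meta, dict) else None)
        or api_response.get("nextPage")
    )
    return orders, token


def iter_all_orders() -> Iterator[Dict[str, Any]]:
    token: Optional[str] = None
    while True:
        orders, token = extract_raw(fetch_page(token))
        yield from orders
        if not token:  # no cursor and no nextPage: last page reached
            break


all_ids: List[int] = [o["order_id"] for o in iter_all_orders()]
print(all_ids)  # [1, 2, 3]
```

The loop never inspects whether the token is a cursor or a URL; that distinction stays inside the extractor, and the fetcher simply passes the token back to the source it came from.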

Pass 4 has two verify scripts. The first is the full-success path: feed both variants through extract_orders_with_pagination end-to-end and confirm the canonical structures match. The second is a deliberate break: hand the normalizer an order missing both id and order_id and watch MissingRequired raise rather than silently emitting a record with a made-up ID. Without the guard, that garbage record slips through and breaks something downstream that's harder to diagnose.

verify_pass4.py

from order_normalizer import extract_orders_with_pagination

variant_a = {
    "data": {
        "orders": [{
            "id": "ORD-91352",
            "created_at": "2025-06-18T14:22:31Z",
            "total": "129.50",
            "currency": "EUR",
            "customer": {"id": 7712, "email": "alice@example.com"},
            "items": [
                {"sku": "P-001", "qty": 2, "price": 39.75},
                {"sku": "P-009", "qty": 1, "price": 50.00},
            ],
            "discount": None,
            "status": "shipped",
        }]
    },
    "meta": {"cursor": "eyJwYWdlIjoyfQ=="},
}

variant_b = {
    "orders": [{
        "order_id": 91352,
        "ts": 1750256551,
        "amount": 129.5,
        "currency_code": "EUR",
        "customer_id": "7712",
        "line_items": [
            {"product": {"sku": "P-001"}, "quantity": "2", "unit_price": "39.75"},
            {"product": {"sku": "P-009"}, "quantity": 1, "unit_price": 50},
        ],
        "promo": {"code": "SUMMER", "value": "10%"},
        "state": "Shipped",
    }],
    "nextPage": "/orders?page=2",
}

print("--- Variant A (modern endpoint) ---")
orders_a, next_a = extract_orders_with_pagination(variant_a)
print(f"Order {orders_a[0]['id']}: {orders_a[0]['total']:.2f} {orders_a[0]['currency']}")
print(f"  status={orders_a[0]['status']}  items={len(orders_a[0]['items'])}  discount={orders_a[0]['discount']}")
print(f"  next: {next_a}")

print("\n--- Variant B (legacy feed) ---")
orders_b, next_b = extract_orders_with_pagination(variant_b)
print(f"Order {orders_b[0]['id']}: {orders_b[0]['total']:.2f} {orders_b[0]['currency']}")
print(f"  status={orders_b[0]['status']}  items={len(orders_b[0]['items'])}  discount={orders_b[0]['discount']}")
print(f"  next: {next_b}")

print("\n--- Canonical structure check ---")
print(f"Same keys:        {orders_a[0].keys() == orders_b[0].keys()}")
print(f"Same id format:   {orders_a[0]['id'] == orders_b[0]['id']}")
print(f"Same item shape:  {orders_a[0]['items'][0].keys() == orders_b[0]['items'][0].keys()}")

Terminal

--- Variant A (modern endpoint) ---
Order ORD-91352: 129.50 EUR
  status=shipped  items=2  discount=None
  next: eyJwYWdlIjoyfQ==

--- Variant B (legacy feed) ---
Order ORD-91352: 129.50 EUR
  status=shipped  items=2  discount={'type': 'percent', 'value': 10}
  next: /orders?page=2

--- Canonical structure check ---
Same keys:        True
Same id format:   True
Same item shape:  True

If both orders print with the same canonical id, both reach the same item shape, and the three structure checks all return True, pass 4 is working: pagination adapts to meta.cursor or nextPage, optional discount/promo lands as None or a structured object, and both variants emit the same canonical structure. The values that still differ are correct and by design: Variant B's customer email comes back as Unknown while Variant A's is alice@example.com (Variant B's feed doesn't carry the email), the discount differs because only Variant B carries a promo, and each variant keeps its own pagination token. If the discount prints {'type': 'fixed', 'value': 10.0} for Variant B instead of {'type': 'percent', 'value': 10}, the promo value reached pattern 7 without its trailing %, almost always because the variant_b literal reads "10" rather than "10%". If you see a ValueError from float() instead, the % is present but the endswith("%") test was changed or removed, so "10%" fell through to the fixed branch.
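
The promo branch is easier to probe on its own. The function below mirrors the pattern 7 logic from the final file; the name parse_promo_value is hypothetical and it is not part of order_normalizer.py.

```python
from typing import Any, Dict


def parse_promo_value(value: Any) -> Dict[str, Any]:
    """Mirror of the pattern 7 branch: trailing % means percent, anything else is fixed."""
    if isinstance(value, str) and value.endswith("%"):
        return {"type": "percent", "value": int(value.rstrip("%"))}
    return {"type": "fixed", "value": float(value)}


print(parse_promo_value("10%"))   # {'type': 'percent', 'value': 10}
print(parse_promo_value("5.00"))  # {'type': 'fixed', 'value': 5.0}
print(parse_promo_value(10))      # {'type': 'fixed', 'value': 10.0}

# int() rejects fractional text, so a value like "12.5%" raises ValueError.
try:
    parse_promo_value("12.5%")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

If a vendor can send fractional percentages, the int() call is the line to revisit. Whether that matters depends on the feeds you integrate.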

Now force the failure mode. Save this as verify_pass4_break.py and run it. The script hands the normalizer an order with no id or order_id at all and confirms it raises MissingRequired rather than silently emitting a record with a made-up ID:

verify_pass4_break.py

from field_policies import MissingRequired
from order_normalizer import normalize_order

# Variant B with both id and order_id removed
broken = {
    "ts": 1750256551,
    "amount": 129.5,
    "currency_code": "EUR",
    "customer_id": "7712",
    "line_items": [],
    "state": "Shipped",
}

try:
    result = normalize_order(broken)
    print(f"FAIL: normalizer accepted broken input, returned id={result['id']!r}")
except MissingRequired as e:
    print(f"OK: raised MissingRequired -- {e}")

Terminal

OK: raised MissingRequired -- Missing required field: id

The break-it run is what makes the required-field guard earn its keep. Earlier passes would have silently emitted {'id': 'ORD-None', ...} (str(None) is the string "None", which then picks up the prefix) -- a record that looks structurally fine and survives every downstream key lookup, but carries an ID that points at no real order. The polish pass refuses to invent that record. If you see the FAIL line instead, the raise MissingRequired branch isn't firing, almost always because the check was written against order_id after the str() conversion, where a missing value has already become "None", rather than against order_id_raw.
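
The unguarded behaviour is easy to reproduce in isolation. The sketch below copies the id-building lines from passes 1-3 into a hypothetical function, build_id_unguarded, and assumes try_fields hands back None when neither key is present.

```python
def build_id_unguarded(order_id_raw):
    """Pass 1-3 behaviour: stringify first, then prefix."""
    order_id = str(order_id_raw)
    if order_id and not order_id.startswith("ORD-"):
        order_id = f"ORD-{order_id}"
    return order_id


print(repr(build_id_unguarded(91352)))  # 'ORD-91352'
print(repr(build_id_unguarded(None)))   # 'ORD-None'
print(repr(build_id_unguarded("")))     # ''
```

Both bad outcomes, the fabricated 'ORD-None' and the empty string, are what the in (None, "") test on the raw value rules out.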

What you've built

You started with a raw-API problem -- the same business object arriving in two different shapes -- and built a complete diagnostic and normalization toolkit over the course of the chapter: exploration tools for mapping unknown shapes, flexible access utilities for surviving container variation, safe navigation for deep nesting, and defensive guards for optional fields. On this page you added the final piece in four passes: the normalizer that collapses every source-side variation into one canonical output, with the required-field guard catching the cases the earlier patterns can't recover from.

This isn't a toy example. The pattern you built incrementally is how professional teams handle multi-source integrations -- payment processors, e-commerce platforms, logistics systems all apply variations of this approach to turn API chaos into predictable internal models.

The eight transformation patterns are all present in the final file:

Field mapping: renamed order_id to id, amount to total
Type coercion: converted string prices to floats, ensured consistent types
Time unification: normalized Unix timestamps to ISO 8601
Enum harmonization: lowercased and validated status values
Join strategy: embedded customer data whether it arrived as object or ID
Pagination adapter: extracted next_token from cursor or URL patterns
Optional field handling: normalized discount/promo with safe defaults
Array processing: extracted items regardless of wrapper name or nesting

Simple integrations use two or three patterns; complex ones reach for all eight. The complete toolkit is what lets you estimate and scope integration work with confidence instead of guessing.

Your api_helpers.py is complete

If you've been saving each utility as you built it, your helpers file now contains everything you need for professional API integration:

api_helpers.py

# api_helpers.py: the complete toolkit from Chapter 10

# Structure exploration
def explore_api_structure(url, max_depth=2): ...
def truncate_for_display(obj, max_depth=2, current_depth=0): ...

# Container normalization & safe access
def normalize_collection(data, container_hints): ...
def extract_items_and_meta(data, container_hints): ...
def first_item(data, container_hints): ...
def safe_get(obj, path, default=None): ...
def try_fields(d, names, default=None): ...

# Defensive guards
class MissingRequired(Exception): ...
def require(d, name): ...
def default(d, name, fallback): ...

# Domain-specific normalizers
def normalize_order(raw): ...
def extract_orders_with_pagination(api_response): ...

Carry this file into every project. Add new domain-specific normalizers as you integrate new APIs. This is how reusable infrastructure grows: not by memorising patterns but by keeping a toolkit and reaching for it whenever the next API arrives.

Everything structural has been handled. The final page of the chapter is the review: what to carry forward, what the quiz pressure-tests, and what Chapter 12's validation work picks up from here.
