4. Complex JSON navigation
Flat access is solved. This page tackles what happens when the field you need lives four or five levels deep. It may sit behind a chain of dictionaries that may or may not be populated, inside arrays that may or may not exist. The goal is one safe path expression that keeps working whether you're pulling owner.login from GitHub or customer.profile.contact.address.shipping.city from an orders API.
Pattern 1: nested objects two or three levels down
The simplest nested case is objects inside objects. GitHub's repository endpoint returns owner information nested one level deep. Rather than chaining bracket notation (which crashes if any level is missing), reach for safe_get() from the previous page. Save this as nested_access.py at the project root:
```python
import requests

from safe_get import safe_get

resp = requests.get(
    "https://api.github.com/repos/octocat/Hello-World",
    timeout=10,
)
resp.raise_for_status()
repo = resp.json()

name = repo.get("name", "Unknown")
owner_login = safe_get(repo, "owner.login", "Unknown")
owner_url = safe_get(repo, "owner.html_url", "")
language = repo.get("language") or "Not specified"
stars = repo.get("stargazers_count", 0)

print(f"Repository: {name}")
print(f"Owner: {owner_login}")
print(f"Profile: {owner_url}")
print(f"Language: {language}")
print(f"Stars: {stars:,}")
```
Run it. GitHub's star counts change over time, so your exact number may differ:
```bash
python nested_access.py
```

```text
Repository: Hello-World
Owner: octocat
Profile: https://github.com/octocat
Language: Not specified
Stars: 3,126
```
The dot-notation ("owner.login") walks the path one key at a time. If owner is missing or not a dictionary, safe_get() returns the default instead of crashing. The pattern works for any depth of dict nesting -- but the moment the path has to step into a list (the first period in opening_hours.periods, the first review in reviews), the dict-only helper gives up and you fall back to manual array-index guards. That's the next upgrade.
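The contrast is easier to see in code. This sketch uses made-up data. `dict_only_safe_get` is a hypothetical stand-in written to match the behavior described above: it walks dict keys and returns the default on any non-dict step. It is not the previous page's actual code.

```python
from typing import Any


def dict_only_safe_get(obj: Any, path: str, default=None):
    """Hypothetical stand-in for the dict-only helper: dict keys only."""
    cur = obj
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur


# Made-up sample shaped like the Places response on this page
place = {
    "geometry": {"location": {"lat": 37.42, "lng": -122.08}},
    "opening_hours": {"periods": [{"open": {"day": 1, "time": "0900"}}]},
}

# Pure dict nesting works at any depth
lat = dict_only_safe_get(place, "geometry.location.lat")

# A list in the path stops the helper: "periods" resolves to a list,
# which is not a dict, so the lookup for "open" returns the default
open_time = dict_only_safe_get(place, "opening_hours.periods.open.time", "n/a")

# The manual fallback: fetch the list, check it, index it, then resume
periods = dict_only_safe_get(place, "opening_hours.periods", [])
first_open = None
if isinstance(periods, list) and periods:
    first_open = dict_only_safe_get(periods[0], "open.time")

print(lat, open_time, first_open)
```

The last four lines of guards are what the array-index upgrade replaces.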
Extending safe_get to understand array indices
The Google Places response we're about to navigate has paths like opening_hours.periods[0].open.time -- a mix of dict keys and array indices in the same path. Extending safe_get() to parse [0]-style segments as first-class path steps means one function can drill through mixed dict-and-array nesting without any surrounding guards. Replace the contents of safe_get.py with this upgraded version:
```python
import re
from typing import Any, Dict, List

# One dot-segment: a key, then zero or more [N] index suffixes
_SEGMENT = re.compile(r"^([^\[]+)((?:\[\d+\])*)$")


def safe_get(obj: Any, path: str, default=None):
    """
    Dot-path lookup with array-index support.

    Each dot-segment is a dict key, optionally followed by one or more
    [N] index suffixes that step into a list. Returns `default` if any
    step can't resolve (missing key, None intermediate, index out of range).

        safe_get(data, "owner.login")
        safe_get(data, "opening_hours.periods[0].open.time")
        safe_get(data, "items[0].product.sku", "")
    """
    cur = obj
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if not match:
            return default
        key, index_suffix = match.group(1), match.group(2)

        # Dict key lookup
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]

        # Any trailing [N] indices
        for raw_index in re.findall(r"\[(\d+)\]", index_suffix):
            i = int(raw_index)
            if not isinstance(cur, list) or i >= len(cur):
                return default
            cur = cur[i]
    return cur


def try_fields(d: Dict[str, Any], names: List[str], default=None):
    """
    Return the first present/non-empty field from a list of candidates.
    Useful when different APIs use different names for the same concept.
    """
    for name in names:
        val = d.get(name)
        if val not in (None, ""):
            return val
    return default
```
The parser splits each dot-segment into a key and an optional run of [N] suffixes. Key lookup still requires a dict, and index lookup still requires a list. Either failure short-circuits to the default. Plain dot paths like "owner.login" behave exactly as they did with the previous version, so nested_access.py above keeps working unchanged. try_fields() stays in the same file, so anything that imports it keeps working too.
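To see what the parser does with a single dot-segment, you can run the same `_SEGMENT` pattern on its own. This sketch copies the regex from safe_get.py. `split_segment` is a hypothetical helper added only for illustration; it returns the key and the parsed indices, or None when the segment doesn't match.

```python
import re
from typing import List, Optional, Tuple

# Same pattern as in safe_get.py: a key, then zero or more [N] suffixes
_SEGMENT = re.compile(r"^([^\[]+)((?:\[\d+\])*)$")


def split_segment(segment: str) -> Optional[Tuple[str, List[int]]]:
    """Hypothetical helper: return (key, indices), or None if malformed."""
    match = _SEGMENT.match(segment)
    if not match:
        return None
    key, index_suffix = match.group(1), match.group(2)
    indices = [int(n) for n in re.findall(r"\[(\d+)\]", index_suffix)]
    return key, indices


for seg in ["owner", "periods[0]", "matrix[1][2]", "[0]", "items[-1]"]:
    print(f"{seg!r:16} -> {split_segment(seg)}")
```

The last two cases are worth knowing. A segment that starts with an index (a top-level list) has no key to match, and `\d+` doesn't accept a minus sign. Both fail the pattern, so safe_get() returns the default for such paths rather than treating them as indices.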
Pattern 2: deep nesting with optional sections
Real APIs often nest data four or five levels deep with optional branches. Google Places is a canonical example: location coordinates live inside result.geometry.location, opening hours inside result.opening_hours.periods, and reviews in an array on the root result. Each section might be missing entirely -- a business that hasn't listed hours, a place with no reviews yet.
Here's a simplified response shape:
```json
{
  "result": {
    "name": "Google Building 40",
    "geometry": {
      "location": {"lat": 37.4224764, "lng": -122.0842499},
      "viewport": {
        "northeast": {"lat": 37.4238, "lng": -122.0829},
        "southwest": {"lat": 37.4211, "lng": -122.0856}
      }
    },
    "opening_hours": {
      "periods": [
        {"open": {"day": 1, "time": "0900"}, "close": {"day": 1, "time": "1800"}}
      ],
      "weekday_text": ["Monday: 9:00 AM - 6:00 PM"]
    },
    "reviews": [
      {"author_name": "John Smith", "rating": 5, "text": "Great place!"}
    ]
  }
}
```
Navigating this safely means guarding every optional layer -- coordinate extraction, the periods array, the reviews array. Save as place_extractor.py:
```python
from safe_get import safe_get


def describe_place(data):
    """Extract display fields from a Google Places response."""
    place = data.get("result", {})
    name = place.get("name", "Unknown")

    # Deep nested coordinates (3 levels)
    lat = safe_get(place, "geometry.location.lat")
    lng = safe_get(place, "geometry.location.lng")

    # Opening hours and reviews: array-index path segments do the guarding
    open_time = safe_get(place, "opening_hours.periods[0].open.time")
    close_time = safe_get(place, "opening_hours.periods[0].close.time")
    review_author = safe_get(place, "reviews[0].author_name", "Anonymous")
    review_text = safe_get(place, "reviews[0].text", "No review text")

    print(f"Place: {name}")
    print(f"Location: ({lat}, {lng})")
    if open_time and close_time:
        print(f"Hours: Opens {open_time}, closes {close_time}")
    if review_author != "Anonymous" or review_text != "No review text":
        print(f"Review by {review_author}: {review_text}")
```
Given the JSON above, calling describe_place(data) prints:
```text
Place: Google Building 40
Location: (37.4224764, -122.0842499)
Hours: Opens 0900, closes 1800
Review by John Smith: Great place!
```
Six safe_get calls, no surrounding guards. The coordinates walk three dict levels. The opening-hours and review paths each cross a dict-to-list boundary via [0]. A place with no hours or no reviews gets the defaults instead of an exception. The hours and review lines at the bottom are skipped when their data is missing, and a place with no geometry prints (None, None) for its location rather than crashing. Compare this against the old shape (fetch the array, check it, index it, then fetch the nested field) and you can see where the array-index extension earns its keep: one path string expresses what four lines of guards used to.
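To check the claim about missing sections, here's a self-contained sketch that runs the same six paths against a made-up place with no `opening_hours` and an empty `reviews` array. It inlines a copy of the upgraded safe_get() so it runs on its own. The sample data is invented for illustration.

```python
import re
from typing import Any

_SEGMENT = re.compile(r"^([^\[]+)((?:\[\d+\])*)$")


def safe_get(obj: Any, path: str, default=None):
    """Copy of the upgraded safe_get() from safe_get.py."""
    cur = obj
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if not match:
            return default
        key, index_suffix = match.group(1), match.group(2)
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
        for raw_index in re.findall(r"\[(\d+)\]", index_suffix):
            i = int(raw_index)
            if not isinstance(cur, list) or i >= len(cur):
                return default
            cur = cur[i]
    return cur


# Made-up sparse response: no hours listed, no reviews yet
sparse = {
    "result": {
        "name": "New Cafe",
        "geometry": {"location": {"lat": 40.0, "lng": -74.0}},
        "reviews": [],
    }
}

place = sparse.get("result", {})
fields = {
    "lat": safe_get(place, "geometry.location.lat"),
    "lng": safe_get(place, "geometry.location.lng"),
    "open_time": safe_get(place, "opening_hours.periods[0].open.time"),
    "close_time": safe_get(place, "opening_hours.periods[0].close.time"),
    "review_author": safe_get(place, "reviews[0].author_name", "Anonymous"),
    "review_text": safe_get(place, "reviews[0].text", "No review text"),
}
for name, value in fields.items():
    print(f"{name}: {value}")
```

The two missing sections fail at different steps. The absent `opening_hours` key fails the dict lookup, and the empty `reviews` list fails the index bounds check. Both still land on the default.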
The discipline from the dict-only version still holds: extract shallow fields first, prefer a single safe path over a chain of manual checks, and let missing optional sections return the default rather than raising. The extension simply widens what "a single safe path" can reach.
Pattern 3: arrays of complex objects
The everyday workhorse pattern combines iteration with deep navigation -- looping through an array where each item is a nested object with its own optional fields. The key is to separate concerns: normalize the container once at the top, iterate cleanly, and delegate per-item navigation to the same safe helpers. Save as search_results.py:
```python
import requests

from extract_items_and_meta import extract_items_and_meta
from safe_get import safe_get

resp = requests.get(
    "https://api.github.com/search/repositories"
    "?q=python+language:python&sort=stars&order=desc&per_page=2",
    timeout=10,
)
resp.raise_for_status()
search = resp.json()

# Normalize the container and pagination metadata in one call
items, meta = extract_items_and_meta(search)
print(f"Total repositories found: {meta.get('total', 0):,}\n")

for i, repo in enumerate(items, start=1):
    name = repo.get("name", "Unknown")
    owner = safe_get(repo, "owner.login", "Unknown")
    stars = repo.get("stargazers_count", 0)
    # Optional nested license object
    license_name = safe_get(repo, "license.name", "Not specified")
    description = repo.get("description") or ""
    desc_preview = (description[:80] + "...") if len(description) > 80 else description
    url = repo.get("html_url", "")

    print(f"{i}. {name} by {owner}")
    print(f"   {stars:,} stars")
    if desc_preview:
        print(f"   {desc_preview}")
    print(f"   License: {license_name}")
    print(f"   {url}\n")
```
Run it. GitHub's search ranking, star counts, and total counts change over time, so treat the exact repositories and numbers below as representative:
```bash
python search_results.py
```

```text
Total repositories found: 1,247,563

1. public-apis by public-apis
   294,142 stars
   A collective list of free APIs for use in software and web development
   License: MIT License
   https://github.com/public-apis/public-apis

2. awesome-python by vinta
   185,450 stars
   A curated list of awesome Python frameworks, libraries, software and resources
   License: Other
   https://github.com/vinta/awesome-python
```
Iteration and navigation stay separate. The for loop handles "do this for each item"; the safe_get() and .get() calls handle "extract this field safely from one item." Each iteration is independent: some repos have licenses, some don't; some have descriptions, some don't; the code doesn't care. The same pattern scales from five items to five thousand.
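Because each iteration is independent, you can exercise the loop body offline. This sketch feeds three made-up repository dicts through the same per-item logic: one with a license, one with a null license and a null description, and one with no owner and an overlong description. `summarize_repo` and `dig` are hypothetical names. `dig` is a compact dict-only lookup, which is enough here because neither `owner.login` nor `license.name` crosses a list.

```python
from typing import Any, Dict, List


def dig(obj: Any, path: str, default=None):
    """Minimal dict-only lookup; enough for paths with no list steps."""
    cur = obj
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur


def summarize_repo(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical per-item extractor mirroring the loop body above."""
    description = repo.get("description") or ""
    if len(description) > 80:
        description = description[:80] + "..."
    return {
        "name": repo.get("name", "Unknown"),
        "owner": dig(repo, "owner.login", "Unknown"),
        "stars": repo.get("stargazers_count", 0),
        # A null license makes the next step a non-dict, so the default wins
        "license": dig(repo, "license.name", "Not specified"),
        "description": description,
    }


items: List[Dict[str, Any]] = [
    {"name": "alpha", "owner": {"login": "ann"}, "stargazers_count": 1200,
     "license": {"name": "MIT License"}, "description": "Short text"},
    {"name": "beta", "owner": {"login": "bob"}, "stargazers_count": 5,
     "license": None, "description": None},
    {"name": "gamma", "stargazers_count": 42, "description": "x" * 100},
]

summaries = [summarize_repo(repo) for repo in items]
for s in summaries:
    print(s)
```

None of the three items needs special-casing in the loop. Each one's gaps are absorbed by the defaults inside the per-item function.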
Putting it together: a complete nested extractor
One more function pulls the patterns together. It normalizes the container, navigates nested objects, and provides sensible defaults throughout, so a single-repository response and a search response go through the same code. Save as repository_details.py:
```python
import requests

from extract_items_and_meta import extract_items_and_meta
from safe_get import safe_get


def extract_repository_details(api_response):
    """
    Extract repository info from any GitHub endpoint response.
    Handles single repos, search results, nested owner data, optional fields.
    Returns (details, meta); details is None if the response has no items.
    """
    items, meta = extract_items_and_meta(api_response)
    if not items:
        return None, meta

    repo = items[0]
    details = {
        "name": repo.get("name", "Unknown"),
        "full_name": repo.get("full_name", "Unknown"),
        "description": repo.get("description") or "No description",
        # Nested owner (2 levels)
        "owner_login": safe_get(repo, "owner.login", "Unknown"),
        "owner_url": safe_get(repo, "owner.html_url", ""),
        "owner_type": safe_get(repo, "owner.type", "Unknown"),
        # Metrics (note: GitHub's watchers_count mirrors stargazers_count;
        # the actual watcher count is subscribers_count on single-repo responses)
        "stars": repo.get("stargazers_count", 0),
        "forks": repo.get("forks_count", 0),
        "watchers": repo.get("watchers_count", 0),
        "open_issues": repo.get("open_issues_count", 0),
        # Optional nested license
        "license": safe_get(repo, "license.name", "Not specified"),
        # Optional language
        "language": repo.get("language") or "Not specified",
        # Timestamps
        "created": repo.get("created_at", "Unknown"),
        "updated": repo.get("updated_at", "Unknown"),
        # URLs
        "repo_url": repo.get("html_url", ""),
        "api_url": repo.get("url", ""),
        # Flags
        "private": bool(repo.get("private", False)),
        "archived": bool(repo.get("archived", False)),
    }
    return details, meta


def fetch_json(url):
    """GET a GitHub API URL, raising on HTTP errors instead of parsing an error body."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print("=== Single Repository ===")
    single = fetch_json("https://api.github.com/repos/octocat/Hello-World")
    repo_info, _ = extract_repository_details(single)
    if repo_info is None:
        print("No repository in response\n")
    else:
        print(f"{repo_info['name']} by {repo_info['owner_login']}")
        print(f"{repo_info['stars']:,} stars | Language: {repo_info['language']}")
        print(f"License: {repo_info['license']}\n")

    print("=== Search Results ===")
    search = fetch_json("https://api.github.com/search/repositories?q=python&per_page=1")
    repo_info, meta = extract_repository_details(search)
    if repo_info is None:
        print("No matching repositories")
    else:
        print(f"{repo_info['name']} by {repo_info['owner_login']}")
        print(f"{repo_info['stars']:,} stars | Language: {repo_info['language']}")
    print(f"Total matching repos: {meta.get('total', 0):,}")
```
The same live-data caveat applies to this output: star counts, search totals, and the first search result can change while the extraction shape stays the same.
```text
=== Single Repository ===
Hello-World by octocat
3,126 stars | Language: Not specified
License: Not specified

=== Search Results ===
public-apis by public-apis
294,142 stars | Language: Python
Total matching repos: 8,937,004
```
One function covers the navigation challenges from this page: container normalization, nested objects (owner.login), optional objects (license.name), missing descriptive fields, and metadata preservation. The same function works for single-repository endpoints and search results because it layers on the container helpers from the previous page.
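The previous page's extract_items_and_meta() isn't reproduced here. To show why one extractor can serve both endpoints, here's a hypothetical stand-in, `extract_items_and_meta_sketch`. It assumes that a dict with an `items` list is a search envelope (GitHub's search endpoint returns `items` alongside `total_count`), that a bare list is already a list of items, and that any other dict is a single object to wrap. The real helper may recognize more shapes and return richer metadata.

```python
from typing import Any, Dict, List, Tuple


def extract_items_and_meta_sketch(response: Any) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical stand-in for the previous page's container helper."""
    if isinstance(response, dict) and isinstance(response.get("items"), list):
        # Search-style envelope: items plus a total count
        items = response["items"]
        return items, {"total": response.get("total_count", len(items))}
    if isinstance(response, list):
        # Bare array of objects
        return response, {"total": len(response)}
    if isinstance(response, dict):
        # Single-object endpoint: wrap it so callers always get a list
        return [response], {"total": 1}
    # Anything else (None, a string, ...) yields no items
    return [], {"total": 0}


single = {"name": "Hello-World", "owner": {"login": "octocat"}}
search = {"total_count": 2, "incomplete_results": False,
          "items": [{"name": "a"}, {"name": "b"}]}

for resp in (single, search, [], None):
    found, meta = extract_items_and_meta_sketch(resp)
    print(len(found), meta)
```

Because every shape comes back as a list plus a metadata dict, extract_repository_details() can always take `items[0]` after its emptiness check without knowing which endpoint produced the response.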
extract_repository_details() calls safe_get() and .get() many times per record, which is fine for typical API responses (hundreds to thousands of items). For high-volume processing (millions of records per batch), consider extracting only the fields you actually need rather than building a comprehensive dictionary. At that scale, the constant overhead of a few microseconds per field starts to matter.
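Here is one way to act on that advice, sketched under assumptions. It picks only the fields a batch job needs, and it splits each path string once before the loop instead of re-parsing it for every record. `compile_path` and `project` are hypothetical names. This version is dict-only and doesn't handle [N] segments, and the sample records are made up. Profile with your own data before assuming it's faster.

```python
from typing import Any, Callable, Dict, Iterable, List

_MISSING = object()  # sentinel distinguishing "key absent" from a stored None


def compile_path(path: str) -> Callable[..., Any]:
    """Split a dict-only dot path once and return a reusable getter."""
    keys = tuple(path.split("."))

    def getter(obj: Any, default: Any = None) -> Any:
        cur = obj
        for key in keys:
            if not isinstance(cur, dict):
                return default
            cur = cur.get(key, _MISSING)
            if cur is _MISSING:
                return default
        return cur

    return getter


# Only the fields this batch job needs, parsed once up front
FIELDS = {
    "name": compile_path("name"),
    "owner": compile_path("owner.login"),
    "stars": compile_path("stargazers_count"),
}


def project(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep just the FIELDS columns from each record."""
    return [{field: get(r) for field, get in FIELDS.items()} for r in records]


records = [
    {"name": "alpha", "owner": {"login": "ann"}, "stargazers_count": 10, "size": 999},
    {"name": "beta", "stargazers_count": 3},
]
rows = project(records)
print(rows)
```

Fields you never list (like `size` above) are never touched, which is where most of the savings come from.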
Containers, metadata, and deep navigation are solved. What's left is the policy question: when the data you expected isn't there, do you crash, warn, or substitute a default? The next page classifies fields by criticality and gives each class a matching handler.