3. Flexible access patterns
Discovery maps the shape; this page writes the access code that doesn't care which shape arrives. One access layer handles direct arrays, wrapped collections, and single objects, preserves pagination metadata, tolerates alternative field names, and powers the vendor-orders normalizer you'll build at the end of the chapter.
Step 1: normalize any collection shape
The first utility handles the most common variation -- where the actual data lives. Some APIs return arrays directly. Others wrap arrays under keys like items, results, or data. Single-item endpoints often return an unwrapped object. Your code should not need to know which pattern each API uses.
The function below inspects the response type and structure, then returns a list regardless of the input shape. Direct arrays pass through unchanged. Wrapped collections get unwrapped. Single objects become one-item lists. After this, the rest of your code can always expect a list. Save it as normalize_collection.py:
from typing import Any, List, Optional

COMMON_COLLECTION_KEYS = ["items", "results", "data", "content", "entries", "records"]


def normalize_collection(
    api_response: Any,
    container_hints: Optional[List[str]] = None,
) -> List[Any]:
    """
    Return a list of items regardless of response shape:
    - list -> itself
    - dict+wrapper -> wrapper list
    - dict (single) -> [dict]
    - other -> []
    """
    # Direct array: pass through
    if isinstance(api_response, list):
        return api_response

    # Not a dict: can't extract anything
    if not isinstance(api_response, dict):
        return []

    # Check for common wrapper keys
    keys = (container_hints or []) + COMMON_COLLECTION_KEYS
    for key in keys:
        val = api_response.get(key)
        if isinstance(val, list):
            return val

    # No wrapper found: treat dict as single item
    return [api_response]
The container_hints parameter lets you handle domain-specific wrappers without modifying the function. If an API uses products or repositories instead of the common patterns, pass them as hints. You'll keep this function in your toolkit and reach for it again and again -- each new API you integrate is a chance to add another wrapper convention to your hints list.
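As a quick illustration of container_hints, the sketch below runs a condensed copy of normalize_collection() against four made-up payloads. The products wrapper and the sample records are assumptions invented for this example; they are not taken from any real API.

```python
from typing import Any, List, Optional

COMMON_COLLECTION_KEYS = ["items", "results", "data", "content", "entries", "records"]


def normalize_collection(
    api_response: Any,
    container_hints: Optional[List[str]] = None,
) -> List[Any]:
    """Condensed copy of the function above so this snippet runs on its own."""
    if isinstance(api_response, list):
        return api_response
    if not isinstance(api_response, dict):
        return []
    for key in (container_hints or []) + COMMON_COLLECTION_KEYS:
        if isinstance(api_response.get(key), list):
            return api_response[key]
    return [api_response]


# Hypothetical payloads: none of these come from a real API.
direct = [{"id": 1}, {"id": 2}]
wrapped = {"results": [{"id": 3}], "total": 1}
single = {"id": 4, "name": "widget"}
custom = {"products": [{"id": 5}, {"id": 6}], "page": 1}

print(len(normalize_collection(direct)))                # 2
print(len(normalize_collection(wrapped)))               # 1
print(normalize_collection(single) == [single])         # True
# Without a hint, the unknown "products" wrapper is mistaken for a single item.
print(len(normalize_collection(custom)))                # 1
# With the hint, the wrapped list is found.
print(len(normalize_collection(custom, ["products"])))  # 2
```

Hints are checked before the common keys, so a domain-specific wrapper wins if a response happens to contain both.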
Step 2: extract items and preserve metadata
The normalizer above solves one problem but creates another: it discards everything except the items. Real APIs include valuable metadata -- pagination cursors, total counts, page information -- that downstream code often needs. APIs signal "more data available" in several ways:
- Page numbers: ?page=2 (like book pages)
- Cursors: opaque tokens like "cursor": "eyJwYWdlIjoyfQ=="
- Next URLs: direct links like "/search?q=python&page=2"
- Offset/limit: ?offset=20&limit=20 (skip 20, get next 20)
The enhanced helper below does two jobs. It normalizes the collection structure (like the previous function) and captures everything else as metadata. It also recognises the common pagination patterns and collapses them into a single next_token field, so downstream code doesn't need to know which style an API uses. Save it as extract_items_and_meta.py:
from typing import Any, Dict, List, Tuple, Optional

from normalize_collection import COMMON_COLLECTION_KEYS


def _first_not_none(*values: Any) -> Any:
    """Return the first value that is not None, so a legitimate 0 survives."""
    for value in values:
        if value is not None:
            return value
    return None


def extract_items_and_meta(
    api_response: Any,
    container_hints: Optional[List[str]] = None,
) -> Tuple[List[Any], Dict[str, Any]]:
    """
    Return (items, metadata) and normalize pagination signals to:
    meta.next_token (cursor or next URL)
    meta.total (total results if present)
    meta.page_info (page/per_page if present)
    """
    meta: Dict[str, Any] = {}

    # Direct list: no metadata
    if isinstance(api_response, list):
        return api_response, meta

    # Not a dict: can't extract anything
    if not isinstance(api_response, dict):
        return [], meta

    # Find the collection container
    keys = (container_hints or []) + COMMON_COLLECTION_KEYS
    container_key = None
    for key in keys:
        if key in api_response and isinstance(api_response[key], list):
            container_key = key
            break

    # Extract items and separate metadata
    if container_key:
        items = api_response[container_key]
        # Everything else is metadata
        meta = {k: v for k, v in api_response.items() if k != container_key}
    else:
        # Single object response
        items = [api_response]
        meta = {}

    # Normalize pagination signals into common format
    meta_obj = meta.get("meta") if isinstance(meta.get("meta"), dict) else {}

    # Cursor-style pagination
    next_token = (
        meta_obj.get("cursor")
        or meta.get("cursor")
        or None
    )

    # URL-style pagination
    if not next_token:
        links = meta.get("links") if isinstance(meta.get("links"), dict) else {}
        next_token = meta.get("nextPage") or links.get("next") or None

    # Count and page information. Chaining with "or" would turn a real 0
    # (for example, a search with zero results) into None, so test for None.
    total = _first_not_none(meta.get("total"), meta.get("total_count"), meta_obj.get("total"))
    page = _first_not_none(meta.get("page"), meta_obj.get("page"))
    per_page = _first_not_none(meta.get("per_page"), meta_obj.get("per_page"))
    has_page_info = page is not None or per_page is not None
    page_info = {"page": page, "per_page": per_page} if has_page_info else None

    meta_norm = {"next_token": next_token, "total": total, "page_info": page_info}
    return items, {**meta, **meta_norm}
Whether an API paginates with cursors or next-URL links, callers of extract_items_and_meta() get a single next_token field to check for more data. Page-number APIs surface through page_info and total instead, and offset/limit parameters are not detected at all, so for those styles the caller still computes the next request. The original provider-specific keys stay in meta too, so you still have everything if you need to inspect it. One pagination convention this helper does not cover: HTTP Link headers (used by GitHub's search endpoint and others). Those signals live outside the JSON body, so the helper can't see them; if you hit a header-paginated API, you'll need to parse response.headers["Link"] separately and feed the cursor or URL back in.
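For header-paginated APIs, a minimal sketch of that separate parsing step might look like the following. It assumes the header uses the common `<url>; rel="next"` layout with rel as the first parameter; next_url_from_link_header is a hypothetical helper name and the example.com URLs are placeholders. If you are using requests, the library also exposes an already-parsed view of this header as response.links, which is usually the simpler route.

```python
import re
from typing import Optional

# Matches one entry of a Link header: <URL>; rel="name"
_LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_url_from_link_header(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header value, or None."""
    if not link_header:
        return None
    for url, rel in _LINK_PATTERN.findall(link_header):
        if rel == "next":
            return url
    return None


# Placeholder header value, invented for this example.
header = (
    '<https://api.example.com/search?q=python&page=2>; rel="next", '
    '<https://api.example.com/search?q=python&page=34>; rel="last"'
)

print(next_url_from_link_header(header))
print(next_url_from_link_header(None))
```

With a live call you would pass response.headers.get("Link") and, if a URL comes back, store it as meta["next_token"] so the rest of your code keeps checking one field.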
Step 3: a convenience helper for single items
Many API calls fetch exactly one resource -- one user, one repository, one order. Rather than normalising to a list and immediately accessing [0], this helper does both steps safely. Save it as first_item.py next to the others:
from typing import Any, Dict, List, Optional

from normalize_collection import normalize_collection


def first_item(
    api_response: Any,
    container_hints: Optional[List[str]] = None,
) -> Optional[Dict[str, Any]]:
    """Get the first item (or None) across response variants."""
    items = normalize_collection(api_response, container_hints)
    return items[0] if items else None
This is useful for detail endpoints where you know there's exactly one result, and for processing search results one record at a time.
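The case worth checking in calling code is the None return. The sketch below uses condensed copies of both functions and two made-up payloads (an empty search result and a detail object) to show how a caller might guard against it; the payloads are assumptions for illustration only.

```python
from typing import Any, List, Optional

COMMON_COLLECTION_KEYS = ["items", "results", "data", "content", "entries", "records"]


def normalize_collection(
    api_response: Any,
    container_hints: Optional[List[str]] = None,
) -> List[Any]:
    """Condensed copy of the Step 1 function so this snippet runs on its own."""
    if isinstance(api_response, list):
        return api_response
    if not isinstance(api_response, dict):
        return []
    for key in (container_hints or []) + COMMON_COLLECTION_KEYS:
        if isinstance(api_response.get(key), list):
            return api_response[key]
    return [api_response]


def first_item(api_response: Any, container_hints: Optional[List[str]] = None) -> Any:
    """Condensed copy of first_item(): first element or None."""
    items = normalize_collection(api_response, container_hints)
    return items[0] if items else None


# Hypothetical payloads.
empty_search = {"items": [], "total_count": 0}
detail = {"id": 7, "name": "Hello-World"}

match = first_item(empty_search)
if match is None:
    # An empty wrapper yields None, not an IndexError.
    print("No results")

print(first_item(detail)["name"])
```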
Seeing it work against the two GitHub endpoints
The single-repository endpoint returns an unwrapped object with many fields; the search endpoint wraps results in an items array with metadata. The same access code should handle both. Save this driver as test_extract.py:
import requests
from extract_items_and_meta import extract_items_and_meta
from first_item import first_item
# Fetch both GitHub response types
single_repo = requests.get(
    "https://api.github.com/repos/octocat/Hello-World",
    timeout=10,
).json()
search_results = requests.get(
    "https://api.github.com/search/repositories?q=python&per_page=2",
    timeout=10,
).json()
print("=== Testing Universal Access Patterns ===\n")
# Single repository endpoint
items1, meta1 = extract_items_and_meta(single_repo)
print("Single repo endpoint:")
print(f" Items returned: {len(items1)}")
print(f" Pagination token: {meta1.get('next_token')}")
print(f" Total count: {meta1.get('total')}")
print(f" First item name: {items1[0].get('name')}\n")
# Search endpoint
items2, meta2 = extract_items_and_meta(search_results)
print("Search endpoint:")
print(f" Items returned: {len(items2)}")
print(f" Pagination token: {meta2.get('next_token')}")
print(f" Total count: {meta2.get('total'):,}")
print(f" First item name: {items2[0].get('name')}\n")
# Convenience helper
first = first_item(single_repo)
print("Using first_item() helper:")
print(f" Repository: {first.get('name')} by {first.get('owner', {}).get('login')}")
Run it from the project root:
python test_extract.py
Representative output. GitHub's search totals and ranking change over time, so your total count and first result may differ:
=== Testing Universal Access Patterns ===
Single repo endpoint:
Items returned: 1
Pagination token: None
Total count: None
First item name: Hello-World
Search endpoint:
Items returned: 2
Pagination token: None
Total count: 8,937,004
First item name: public-apis
Using first_item() helper:
Repository: Hello-World by octocat
The same extraction code worked against both responses. The single-repository endpoint was normalized to a one-item list; the search endpoint's items array was extracted, and its total_count came through under meta['total']. Downstream code sees a consistent interface regardless of API structure.
Step 4: safe field access utilities
With containers normalized, two field-level challenges remain: navigating nested paths safely, and dealing with APIs that use different field names for the same concept. Two small helpers cover both. Save them as safe_get.py:
from typing import Any, Dict, List


def safe_get(obj: Any, path: str, default=None):
    """
    Dot-path lookup: 'owner.login' -> obj['owner']['login'] if present.
    Returns default if any part of the path doesn't exist.
    """
    cur = obj
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return default
        cur = cur[part]
    return cur


def try_fields(d: Dict[str, Any], names: List[str], default=None):
    """
    Return the first present/non-empty field from a list of candidates.
    Useful when different APIs use different names for the same concept.
    """
    for name in names:
        val = d.get(name)
        if val not in (None, ""):
            return val
    return default
safe_get() handles nested navigation across dict keys without crashes. try_fields() handles field-name variations -- trying id then order_id, or total then amount. This version walks dict keys only; the navigation page extends it to understand periods[0]-style array indices as first-class path segments, so a single dot-path can drill through mixed dict-and-array nesting.
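To see both helpers side by side, here is a small sketch that reads the same three concepts from two differently shaped orders. The vendor_a and vendor_b payloads and their field names are invented for illustration; the helper bodies are copies of the ones above so the snippet runs on its own.

```python
from typing import Any, Dict, List


def safe_get(obj: Any, path: str, default=None):
    """Copy of safe_get() above: dot-path lookup over dict keys."""
    cur = obj
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return default
        cur = cur[part]
    return cur


def try_fields(d: Dict[str, Any], names: List[str], default=None):
    """Copy of try_fields() above: first candidate that is not None or ""."""
    for name in names:
        val = d.get(name)
        if val not in (None, ""):
            return val
    return default


# Two made-up vendor payloads that name the same concepts differently.
vendor_a = {"id": "A-100", "total": 42.5, "customer": {"email": "a@example.com"}}
vendor_b = {"order_id": "B-200", "amount": 0, "customer": None}

for order in (vendor_a, vendor_b):
    order_id = try_fields(order, ["id", "order_id"])
    total = try_fields(order, ["total", "amount"], default=0.0)
    email = safe_get(order, "customer.email", "unknown")
    print(order_id, total, email)
```

Note that vendor_b's amount of 0 is returned as a real value, because try_fields() only treats None and the empty string as missing, and that a null customer falls back to the default instead of raising.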
Build a project utilities file
Don't copy-paste these functions into every script you write. Create a dedicated api_helpers.py in your project directory and collect them there as you build them. By the end of the chapter you'll have safe_get(), try_fields(), normalize_collection(), extract_items_and_meta(), and more -- all in one place. Whenever you need them:
from api_helpers import safe_get, extract_items_and_meta
# Use them anywhere in your main script
owner = safe_get(repo, "owner.login", "Unknown")
This is how professional developers actually work. Nobody memorises these patterns; they build a personal toolkit and reuse it. The api_helpers.py you leave this chapter with is infrastructure you carry into every future project.
From here on, the per-file "save as" labels on each code block (normalize_collection.py, safe_get.py, and so on) treat each function's file as a convenience for running the example in isolation -- the canonical home for all of them is the api_helpers.py you just started. When the later pages say "replace safe_get.py with this extended version" or "save field_policies.py," feel free to paste the new content into the appropriate section of api_helpers.py instead. The standalone filenames are about teaching one idea per file; the toolkit is one file.
Putting it together: a cross-endpoint repository extractor
One more example that combines every utility on this page: a unified extractor that works against both GitHub endpoints. Save as extract_repo_info.py:
import requests

from first_item import first_item
from extract_items_and_meta import extract_items_and_meta
from safe_get import safe_get


def extract_repo_info(api_response):
    """
    Return (repo_dict, meta) with consistent shape.
    Works with single-object responses or wrapped collections.
    """
    # First repository from any response shape
    repo = first_item(api_response)
    if not isinstance(repo, dict):
        return None, {}

    # Fields via safe navigation
    info = {
        "name": repo.get("name", "Unknown"),
        "owner": safe_get(repo, "owner.login", "Unknown"),
        "stars": repo.get("stargazers_count", 0),
        "description": repo.get("description") or "No description",
        "language": repo.get("language") or "Not specified",
        "url": repo.get("html_url", ""),
        "private": bool(repo.get("private", False)),
    }

    # Preserve pagination metadata
    _, meta = extract_items_and_meta(api_response)
    return info, meta


if __name__ == "__main__":
    print("=== Cross-API Repository Extraction ===\n")

    # Single repository endpoint
    single = requests.get(
        "https://api.github.com/repos/octocat/Hello-World",
        timeout=10,
    ).json()
    repo1, meta1 = extract_repo_info(single)
    print("Single endpoint:")
    print(f" {repo1['name']} by {repo1['owner']}")
    print(f" Stars: {repo1['stars']:,}")
    print(f" Language: {repo1['language']}")
    print(f" Pagination: {meta1.get('next_token')}\n")

    # Search endpoint
    search = requests.get(
        "https://api.github.com/search/repositories?q=python&per_page=1",
        timeout=10,
    ).json()
    repo2, meta2 = extract_repo_info(search)
    print("Search endpoint:")
    print(f" {repo2['name']} by {repo2['owner']}")
    print(f" Stars: {repo2['stars']:,}")
    print(f" Language: {repo2['language']}")
    print(f" Total results: {meta2.get('total'):,}")
Run it. The repository names, star counts, languages, and totals below are only representative, because they come from GitHub's live API and change over time:
python extract_repo_info.py
=== Cross-API Repository Extraction ===
Single endpoint:
Hello-World by octocat
Stars: 3,126
Language: Not specified
Pagination: None
Search endpoint:
public-apis by public-apis
Stars: 294,142
Language: Python
Total results: 8,937,004
The same function handled both endpoints. It safely navigated nested fields like owner.login. It preserved pagination metadata. It filled sensible defaults for missing fields. Any caller of extract_repo_info() now receives a predictable dictionary regardless of which endpoint was hit.
Those are the wins you get from separating container normalisation, metadata extraction, and field access into independent utilities: downstream code stops caring which wrapper the API used, pagination info flows through instead of getting discarded, nested field navigation stops being a crash risk, and a new API with an unfamiliar wrapper works by adding one entry to container_hints.
Containers are normalized, metadata survives, top-level fields are reachable. The next challenge is what happens when the data you want lives five or six levels deep, behind a chain of dictionaries that may or may not be fully populated. That's where the safe-get helper earns its keep -- and where we extend it to handle array indices as first-class path segments.