2. The three-layer validation pattern
Validation is easier to reason about when you stop thinking of it as one problem and start thinking of it as three. Structure, content, and business rules are separate questions, each with its own failure modes and its own best tool. The rest of the chapter is the implementation; this page is the framework that tells you what to put where.
The expected response
The Weather Dashboard from Chapter 8 calls the Open-Meteo API and expects responses shaped like this:
{
    "current": {
        "temperature_2m": 22.5,
        "relative_humidity_2m": 65,
        "wind_speed_10m": 12.3,
        "weather_code": 0,
        "apparent_temperature": 21.8
    },
    "current_units": {
        "temperature_2m": "°C",
        "relative_humidity_2m": "%"
    }
}
Your code assumes that shape and uses the values directly. When the API returns exactly this, everything works. Production APIs do not reliably return exactly this.
What bad data actually looks like
External APIs do not guarantee data quality. A sensor malfunctions and returns a placeholder. A backend format change turns a number into a string. A field goes missing. The response arrives with a 200 status code and technically valid JSON, but the values inside are garbage:
{
    "current": {
        "temperature_2m": -999,          // Sensor error placeholder
        "relative_humidity_2m": "N/A",   // String instead of number
        "wind_speed_10m": null,          // Missing data
        "weather_code": 71,              // Snow code
        "apparent_temperature": 150      // Impossible value
    }
}
Without validation, data like this leads to one of three outcomes: a crash on a type error ("N/A" > 30 raises TypeError), a crash on a missing field (data["current"] raises KeyError when the section is absent), or silent corruption, where -999 displays as a temperature and no one notices until a user complains. Here is the display function as written without validation, with each failure point marked in a comment:
def display_current_weather(weather_data, location_name):
    """Display current weather. Crashes on bad data."""
    current = weather_data["current"]   # KeyError if "current" missing
    temp = current["temperature_2m"]    # KeyError if field missing
    print(f"Temperature: {temp}°C")
    if temp > 30:                       # TypeError if temp is "N/A"
        print("It's hot today!")
    # Users see "Temperature: -999°C" if the sensor is broken.
    # No error. No crash. Just nonsense on the screen.
The fix is not to add more try/except around display_current_weather. Exception handling only catches failures that raise, and -999 raises nothing: it looks like a perfectly reasonable integer. You need rules the data must satisfy before this function ever runs.
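A minimal sketch of that limitation, assuming a hypothetical safe_display helper that wraps the same unvalidated logic in try/except. It is an illustration, not part of the Weather Dashboard code:

```python
def safe_display(weather_data):
    """Hypothetical wrapper: the unvalidated display logic inside try/except."""
    try:
        temp = weather_data["current"]["temperature_2m"]
        hot = temp > 30  # raises TypeError when temp is a string
        return f"Temperature: {temp}°C" + (" (hot)" if hot else "")
    except (KeyError, TypeError) as exc:
        return f"error: {type(exc).__name__}"


# A missing section and a wrong type both raise, so both are caught.
print(safe_display({}))
print(safe_display({"current": {"temperature_2m": "N/A"}}))
# The sensor placeholder raises nothing, so it is displayed as if it were real.
print(safe_display({"current": {"temperature_2m": -999}}))
```

The first two calls turn into handled errors. The third returns an ordinary-looking temperature string, which is the case only explicit rules can catch.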
Why single-field checks don't scale
The obvious first attempt is a validator per field. Here is what that looks like for three of the fifteen fields a full weather response carries:
def validate_temperature(value):
    if value is None:
        return False, "Temperature missing"
    try:
        temp = float(value)
        if temp < -100 or temp > 60:
            return False, f"Temperature out of range: {temp}"
    except (ValueError, TypeError):
        return False, f"Temperature not numeric: {value}"
    return True, None

def validate_humidity(value):
    if value is None:
        return False, "Humidity missing"
    try:
        humidity = float(value)
        if humidity < 0 or humidity > 100:
            return False, f"Humidity out of range: {humidity}"
    except (ValueError, TypeError):
        return False, f"Humidity not numeric: {value}"
    return True, None

def validate_wind_speed(value):
    if value is None:
        return False, "Wind speed missing"
    try:
        wind = float(value)
        if wind < 0 or wind > 200:
            return False, f"Wind speed out of range: {wind}"
    except (ValueError, TypeError):
        return False, f"Wind speed not numeric: {value}"
    return True, None

# And the same shape for the remaining twelve fields.
Every one of those functions has the same three moves: check for None, coerce to a number with float(), verify a range. Fifteen fields means fifteen functions that differ only in the name they check and the range they enforce. Worse, every API change (a renamed field, a new unit, a tightened range) means editing multiple validators and hoping you caught them all. The pattern is telling you something: there are really only a few categories of check here, and within each category, only a few rules. The three-layer pattern (the rest of this page) groups them by category. Declarative tooling (section 4) eliminates the repetition within each category.
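To make the repetition concrete, here is one way the three validators above could collapse into a single function driven by a rule table. This is a sketch only: RANGE_RULES, validate_range, and validate_fields are hypothetical names, and the ranges are the ones used in the validators above.

```python
# Hypothetical rule table: field name -> (label, minimum, maximum).
RANGE_RULES = {
    "temperature_2m": ("Temperature", -100, 60),
    "relative_humidity_2m": ("Humidity", 0, 100),
    "wind_speed_10m": ("Wind speed", 0, 200),
}


def validate_range(label, value, minimum, maximum):
    """The same three moves as before: None check, float() coercion, range check."""
    if value is None:
        return False, f"{label} missing"
    try:
        number = float(value)
    except (ValueError, TypeError):
        return False, f"{label} not numeric: {value}"
    # Written as a chained comparison so that NaN is rejected as well.
    if not (minimum <= number <= maximum):
        return False, f"{label} out of range: {number}"
    return True, None


def validate_fields(current):
    """Apply every rule in the table and collect the error messages."""
    errors = []
    for field, (label, minimum, maximum) in RANGE_RULES.items():
        ok, message = validate_range(label, current.get(field), minimum, maximum)
        if not ok:
            errors.append(message)
    return errors


bad = {"temperature_2m": -999, "relative_humidity_2m": "N/A", "wind_speed_10m": None}
print(validate_fields(bad))
```

Adding a field becomes one new row in the table rather than one new function, which is the idea the declarative tooling later in the chapter takes further.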
Three layers, three questions
Each layer answers a different question and runs in a specific order, with each one depending on the previous having succeeded.
Layer 1: structural validation
Structural validation asks: does the data have the shape we expect? The required fields are present, nested objects are where you expect them, and the types match what your code assumes. It runs first because every later check needs the fields to exist before it can read them.
if not isinstance(data, dict):
    return False, "Weather data must be a dictionary"
if "current" not in data:
    return False, "Missing required 'current' section"
if not isinstance(data["current"], dict):
    return False, "'current' must be a dictionary"
What it catches: missing sections, wrong types, breaking API shape changes. Generic enough that JSON Schema can automate most of it, as you'll see in section 4.
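Wrapped in a function and extended to the fields inside "current", the structural layer might look like the sketch below. The validate_structure name and the choice of which fields are required are assumptions made for illustration; the field names come from the expected response shown earlier.

```python
# Assumption: all five fields from the expected response are treated as required.
REQUIRED_CURRENT_FIELDS = (
    "temperature_2m",
    "relative_humidity_2m",
    "wind_speed_10m",
    "weather_code",
    "apparent_temperature",
)


def validate_structure(data):
    """Layer 1 sketch: check shape and types only, never the values themselves."""
    if not isinstance(data, dict):
        return False, "Weather data must be a dictionary"
    if "current" not in data:
        return False, "Missing required 'current' section"
    current = data["current"]
    if not isinstance(current, dict):
        return False, "'current' must be a dictionary"
    for field in REQUIRED_CURRENT_FIELDS:
        if field not in current:
            return False, f"Missing required field '{field}'"
        value = current[field]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False, f"'{field}' must be a number"
    return True, None
```

Note that -999 passes this layer: it is present and numeric. Whether it is plausible is a question for the next layer.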
Layer 2: content validation
Content validation asks: are the values meaningful and correctly formed? Strings are not empty, timestamps can be parsed, URLs look like URLs, and numbers fall inside reasonable limits. It runs after structural validation because it assumes the fields are already safe to read.
if temp < -100 or temp > 60:
    return False, f"Unrealistic temperature: {temp}°C"
if humidity < 0 or humidity > 100:
    return False, f"Invalid humidity: {humidity}%"
if wind_speed < 0:
    return False, "Wind speed cannot be negative"
What it catches: placeholder values (-999), sensor errors, format corruption, nonsensical single-field values. Field-specific, but patterns repeat enough that JSON Schema covers most of it too.
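As a complete function, the content layer could be sketched as follows. The validate_content name is hypothetical, and the function deliberately assumes that structural validation has already confirmed the three fields exist and are numeric, so it reads them without guarding.

```python
def validate_content(current):
    """Layer 2 sketch: range checks on values that Layer 1 already vetted."""
    temp = current["temperature_2m"]
    humidity = current["relative_humidity_2m"]
    wind_speed = current["wind_speed_10m"]

    if not -100 <= temp <= 60:
        return False, f"Unrealistic temperature: {temp}°C"
    if not 0 <= humidity <= 100:
        return False, f"Invalid humidity: {humidity}%"
    if wind_speed < 0:
        return False, "Wind speed cannot be negative"
    return True, None


print(validate_content(
    {"temperature_2m": -999, "relative_humidity_2m": 65, "wind_speed_10m": 12.3}
))
```

The sensor placeholder that passed the structural layer is rejected here, because this is the first layer that looks at what the number actually is.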
Layer 3: business-rule validation
Business-rule validation asks: does this data make sense for this application? Dates are not in the future, duplicate records are rejected, required relationships hold, and values obey the rules your product actually cares about. It runs last because it depends on both structure and content being trustworthy.
Cross-field rules
Many Layer 3 rules are cross-field: they depend on the relationship between two or more fields rather than on any single value. The fields are individually fine; the combination is what fails. Some examples from APIs you'll meet in practice:
- A booking API requires start_date to be earlier than end_date.
- A signup endpoint requires either email or phone, but not both.
- An e-commerce checkout requires shipping_address only when delivery_method is "ship", not "pickup".
- An invoice's total must equal subtotal + tax.
- A user-creation request with account_type: "business" must include company_name.
Each rule looks at two or more fields together. Each requires conditional logic, comparison, or computation. JSON Schema can express some conditional presence rules, but domain-specific comparisons and calculations quickly become awkward to read and harder to test than Python. That is why this chapter keeps cross-field business rules in hand-written Python -- and why the hybrid pattern in section 5 keeps a Python function alongside the schema rather than trying to express everything declaratively.
# WMO codes 71, 73, and 75 (slight, moderate, heavy snowfall) imply
# temperatures near or below freezing, so snow well above 5°C is suspect.
if weather_code in [71, 73, 75] and temp > 5:
    return False, f"Snow at {temp}°C is unlikely"
# Apparent temperature should track actual temperature within reason.
if abs(apparent_temp - temp) > 20:
    return False, f"'Feels like' {apparent_temp}°C too far from {temp}°C"
What it catches: logical inconsistencies, domain violations, impossible combinations. This is where JSON Schema stops helping and hand-written code earns its keep.
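Three of the cross-field examples listed earlier (booking dates, signup contact, invoice total) can be sketched in plain Python. The function names, the dictionary payloads, and the rounding tolerance are all assumptions for illustration; none of them come from a real API.

```python
from datetime import date


def check_booking(booking):
    """start_date must be strictly earlier than end_date."""
    if booking["start_date"] >= booking["end_date"]:
        return False, "start_date must be earlier than end_date"
    return True, None


def check_signup(signup):
    """Exactly one of email or phone must be provided."""
    has_email = bool(signup.get("email"))
    has_phone = bool(signup.get("phone"))
    if has_email == has_phone:  # both present or both absent
        return False, "Provide exactly one of email or phone"
    return True, None


def check_invoice(invoice, tolerance=0.005):
    """total must equal subtotal + tax, within a small rounding tolerance."""
    expected = invoice["subtotal"] + invoice["tax"]
    if abs(invoice["total"] - expected) > tolerance:
        return False, "total does not equal subtotal + tax"
    return True, None


print(check_booking({"start_date": date(2025, 6, 5), "end_date": date(2025, 6, 1)}))
print(check_signup({"email": "a@example.com", "phone": "555-0100"}))
print(check_invoice({"subtotal": 100.0, "tax": 8.25, "total": 110.0}))
```

Each check is a few lines of ordinary comparison logic that can be unit-tested on its own, which is the readability argument for keeping these rules in Python.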
Run the layers out of order and you either crash (a business rule reads a key that is not there) or do work that gets thrown away (content checks on a response that structural validation would have rejected outright).
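One way to enforce that ordering is to run the layers from an ordered sequence and stop at the first failure. The sketch below uses deliberately small, hypothetical layer functions that check only two fields, so the ordering logic stays visible. It is not the implementation the later sections build.

```python
def check_structure(data):
    if not isinstance(data, dict) or not isinstance(data.get("current"), dict):
        return False, "Missing or malformed 'current' section"
    for field in ("temperature_2m", "weather_code"):
        value = data["current"].get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False, f"'{field}' must be a number"
    return True, None


def check_content(data):
    temp = data["current"]["temperature_2m"]  # safe: structure already passed
    if not -100 <= temp <= 60:
        return False, f"Unrealistic temperature: {temp}°C"
    return True, None


def check_business_rules(data):
    current = data["current"]
    if current["weather_code"] in (71, 73, 75) and current["temperature_2m"] > 5:
        return False, f"Snow at {current['temperature_2m']}°C is unlikely"
    return True, None


# Order matters: each layer assumes every earlier layer has passed.
LAYERS = (
    ("structure", check_structure),
    ("content", check_content),
    ("business", check_business_rules),
)


def validate_weather(data):
    """Run the layers in order and return the first failure, tagged by layer."""
    for name, check in LAYERS:
        ok, message = check(data)
        if not ok:
            return False, f"{name}: {message}"
    return True, None


print(validate_weather({"current": {"temperature_2m": 12.0, "weather_code": 71}}))
```

Because the loop returns on the first failure, a later layer never sees data that an earlier layer rejected, and the error message says which layer caught the problem.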
The pattern maps directly to tooling choice. Structural checks are generic and declarative: JSON Schema shines here. Content checks are field-specific but follow a small number of templates -- JSON Schema still covers most cases. Business rules are unique to your domain and often cross-field, so they stay in hand-written Python. Sections 3 to 5 build each layer with the right tool.
With the framework set, section 3 implements all three layers as hand-written Python against the Weather Dashboard. You'll see exactly what each layer looks like in code before section 4 automates the first two.