5. The hybrid approach
Sections 3 and 4 offered two extremes: write every check by hand, or declare every check in JSON Schema. Production systems rarely live at either extreme. They use a schema to automate the mechanical work (structure, types, ranges, required fields) and hand-written Python for cross-field rules that are clearer as code. This page builds the hybrid pipeline, covers when to reach for it, and closes with two details that turn a functional validator into a useful one: good error messages and a realistic view of performance.
Combining schema and manual validation
The hybrid pipeline loads section 4's weather_schema.json and runs it first, then runs the business-rule validator. Each one handles what it is best at, and the order matters for the same reason it mattered in the manual pipeline -- business rules assume fields exist and have the right type, and the schema is what guarantees that.
The script expects weather_schema.json in the same directory; copy it over from section 4 if you're running this in a fresh folder.
```python
# hybrid_validate.py
import json

from jsonschema import validate, ValidationError

# Load section 4's schema once, when the module is imported.
with open("weather_schema.json") as f:
    weather_schema = json.load(f)


def validate_weather_business_rules(current_data):
    """Hand-written validator for the Layer 3 rules that are clearer in Python."""
    temp = current_data.get("temperature_2m")
    weather_code = current_data.get("weather_code")
    apparent_temp = current_data.get("apparent_temperature")

    # Snow codes (71, 73, 75) paired with a temperature well above freezing.
    if temp is not None and weather_code is not None:
        if weather_code in [71, 73, 75] and temp > 5:
            return False, f"Snow at {temp}°C is unlikely"

    # "Feels like" should stay within 20°C of the measured temperature.
    if temp is not None and apparent_temp is not None:
        if abs(apparent_temp - temp) > 20:
            return False, (
                f"'Feels like' {apparent_temp}°C too different "
                f"from actual {temp}°C"
            )

    return True, None


def validate_weather_hybrid(data):
    """
    Hybrid pipeline: schema for structure and content, manual for business rules.
    Returns (is_valid, error_message).
    """
    # Structure and content: the schema guarantees fields exist and are typed.
    try:
        validate(instance=data, schema=weather_schema)
    except ValidationError as e:
        return False, f"Schema validation failed: {e.message}"

    # Business rules: safe to index "current" because the schema passed.
    current = data["current"]
    valid, error = validate_weather_business_rules(current)
    if not valid:
        return False, f"Business rule validation failed: {error}"

    return True, None
```
Run it against a response the schema accepts but the business-rule check rejects: a warm temperature paired with a snow weather code. The schema says everything is in range and of the right type; the hand-written rule catches the combination:
```python
# test_hybrid.py
from hybrid_validate import validate_weather_hybrid

good_data = {
    "current": {
        "temperature_2m": 22.5,
        "relative_humidity_2m": 65,
        "wind_speed_10m": 12.3,
        "weather_code": 0,
        "apparent_temperature": 21.8,
    }
}

# Schema-legal but business-rule-illegal: snow code at 15°C.
snow_at_15 = {
    "current": {"temperature_2m": 15.0, "weather_code": 71}
}

print(validate_weather_hybrid(good_data))
print(validate_weather_hybrid(snow_at_15))
```
```console
$ python test_hybrid.py
(True, None)
(False, 'Business rule validation failed: Snow at 15.0°C is unlikely')
```
Clear division of labour
The split is the whole point. The schema handles what it does well: type checking, range validation, required fields, nested structure. The Python function handles what code does better: cross-field rules, domain constraints, complex conditions. The result is more capable than schema-only validation and less repetitive than a fully manual validator. Updating a range constraint is a schema edit; updating a business rule is a Python edit; neither forces a rewrite of the other.
Notice that the business-rule function is simpler than its section-3 counterpart: no float() calls, no try/except blocks. The schema upstream type-checks temperature_2m, weather_code, and apparent_temperature, so Layer 3 can focus on the cross-field logic without any defensive type conversion.
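To make the ordering argument concrete, here is a minimal sketch that runs the snow rule with no schema in front of it. `snow_rule` and `run_unguarded` are hypothetical names used only for this illustration; the rule body is copied from validate_weather_business_rules above, and only the standard library is needed.

```python
def snow_rule(current_data):
    """The snow check from validate_weather_business_rules, without type guards."""
    temp = current_data.get("temperature_2m")
    weather_code = current_data.get("weather_code")
    if temp is not None and weather_code is not None:
        # Assumes temp is already numeric; in the hybrid pipeline that is the schema's job.
        if weather_code in [71, 73, 75] and temp > 5:
            return False, f"Snow at {temp}°C is unlikely"
    return True, None


def run_unguarded(current_data):
    """Run the rule with no schema upstream, reporting a crash instead of raising."""
    try:
        return snow_rule(current_data)
    except TypeError as exc:
        return False, f"crashed: {exc}"


print(run_unguarded({"temperature_2m": 15.0, "weather_code": 71}))
print(run_unguarded({"temperature_2m": "15", "weather_code": 71}))
print(run_unguarded({"temperature_2m": "15", "weather_code": 0}))
```

The first call behaves as intended. The second, with temperature_2m arriving as the string "15", raises a TypeError at `temp > 5`. The third is worse: with a non-snow weather code the comparison never runs, so the string passes silently. Both failure modes disappear once a schema has already rejected non-numeric temperatures.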
When to reach for which approach
Each of the three approaches is the right choice somewhere. Which one fits depends on how complex the data is, how often it changes, who maintains the validation code, and how hot the code path is.
| Scenario | Best approach | Rationale |
|---|---|---|
| Single API, simple structure | Manual validation | Schema overhead is not worth it at small scale |
| Multiple APIs with similar structure | JSON Schema | Schemas reuse across APIs and reduce duplication |
| Complex business rules | Hybrid | Schema for structure and content, manual for domain logic |
| Rapidly changing API | JSON Schema | Update schema files without rewriting validator logic |
| Mixed-skill team | JSON Schema | Declarative rules read more clearly than validator code |
| Performance-critical path | Manual validation | Avoid schema traversal overhead in hot paths |
| Building an SDK or library | Hybrid | Users can extend schemas for their own constraints |
Most applications converge on the hybrid pattern eventually. Start with manual validation while there is a single endpoint with a handful of required fields. Add a schema when repetition starts to hurt, or when a second endpoint would duplicate the same type and range checks. Keep hand-written business-rule functions for the cross-field logic that a schema never expresses cleanly. Each step in that progression is justified by actual pain, not speculation.
Writing good error messages
Validation error messages serve two audiences: a developer debugging during integration, and an operations engineer looking at logs at 2am. Good messages do three things: name the rule that failed, show the value that caused the failure, and identify which layer caught it.
Be specific about the rule that failed
Generic messages like "invalid temperature" make the reader do the work of figuring out what "invalid" means. Name the rule and the bound.
```python
# Avoid
return False, "Invalid temperature"

# Prefer
return False, f"Temperature {temp}°C outside valid range (-100 to 60)"
```
Include the offending value
Error messages that don't show the actual input force the reader to go look at the request logs. Keep the value in the message.
```python
# Avoid
return False, "Unrealistic humidity"

# Prefer
return False, f"Humidity {humidity}% exceeds maximum of 100%"
```
Identify which layer caught it
The layer prefix used in section 3's manual pipeline earns its place: it tells the reader where to look. A structural failure points to an upstream shape change, a content failure to a bad value, and a business-rule failure to a domain question.
```python
return False, "Structure validation failed: missing 'current' section"
return False, "Content validation failed: temperature must be numeric"
return False, "Business rule validation failed: snow at 15°C is unlikely"
```
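One way to keep all three properties from drifting apart across many checks is to build every message through a single helper. The sketch below is an assumption, not code from section 3: `format_validation_error`, `check_range`, and the `LAYERS` tuple are hypothetical names, and the ranges are the ones used in the snippets above.

```python
# Hypothetical layer names, matching the prefixes in the snippets above.
LAYERS = ("Structure", "Content", "Business rule")


def format_validation_error(layer, field, value, rule):
    """Build one message naming the layer, the field, the offending value, and the rule."""
    if layer not in LAYERS:
        raise ValueError(f"Unknown validation layer: {layer!r}")
    # !r keeps types visible: the string "15" prints as '15', the number as 15.
    return f"{layer} validation failed: {field} = {value!r}, {rule}"


def check_range(field, value, low, high, unit=""):
    """Content-layer range check returning (is_valid, error_message)."""
    if low <= value <= high:
        return True, None
    return False, format_validation_error(
        "Content", field, value, f"outside valid range ({low} to {high}{unit})"
    )


print(check_range("temperature_2m", 75.0, -100, 60, "°C"))
print(check_range("relative_humidity_2m", 104, 0, 100, "%"))
print(format_validation_error("Content", "temperature_2m", "15", "must be numeric"))
```

Using `repr` for the value is a deliberate choice: a temperature that arrived as the string "15" shows up as '15' in the log, which is exactly the detail an engineer reading logs at 2am needs.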
Performance, honestly
Validation adds overhead. For most API integration code, that overhead is a rounding error compared to the network round-trip. The question is not "does it cost something?" but "does the cost matter?"
Where validation cost actually shows up
Schema validation is slower than hand-written Python. jsonschema's validate() function checks the schema itself against its metaschema, builds a validator, then traverses the instance and runs generic keyword checks, all of which costs more than targeted field-specific code. In an API client that makes a request every few seconds, that difference is invisible. In a request handler serving thousands of requests per second, it adds up.
Three things keep schema validation practical even in busier code paths. Build the validator once with Draft7Validator(schema) and reuse that instance, rather than calling validate() with the raw dict, which re-checks the schema on every call. Validate at boundaries and trust the data downstream, so each response is validated once, not at every layer. And reach for manual validators only in the specific hot paths where the cost shows up in a profiler -- not everywhere, just where measurement says it matters.
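A minimal sketch of the build-once pattern, assuming the jsonschema package is installed. The inline SCHEMA is a trimmed stand-in invented for this example, not section 4's weather_schema.json; in the real pipeline you would load that file instead. `Draft7Validator.check_schema`, `iter_errors`, and `ValidationError.absolute_path` are part of jsonschema's public API; `WEATHER_VALIDATOR` and `schema_errors` are hypothetical names.

```python
from jsonschema import Draft7Validator

# Stand-in schema for illustration only (not section 4's weather_schema.json).
SCHEMA = {
    "type": "object",
    "required": ["current"],
    "properties": {
        "current": {
            "type": "object",
            "required": ["temperature_2m", "weather_code"],
            "properties": {
                "temperature_2m": {"type": "number", "minimum": -100, "maximum": 60},
                "weather_code": {"type": "integer"},
            },
        }
    },
}

# Check the schema and build the validator once, at import time.
Draft7Validator.check_schema(SCHEMA)
WEATHER_VALIDATOR = Draft7Validator(SCHEMA)


def schema_errors(data):
    """Return every schema error as a message naming the field path."""
    messages = []
    for error in WEATHER_VALIDATOR.iter_errors(data):
        # absolute_path holds the keys leading to the failing value, e.g. current -> temperature_2m.
        path = "/".join(str(part) for part in error.absolute_path) or "<root>"
        messages.append(f"Schema validation failed at {path}: {error.message}")
    return messages


print(schema_errors({"current": {"temperature_2m": 22.5, "weather_code": 0}}))
print(schema_errors({"current": {"temperature_2m": "15"}}))
```

Because iter_errors yields every failure rather than stopping at the first, one bad response can report a wrong type and a missing field together. jsonschema's messages typically already include the offending value (for example "'15' is not of type 'number'"), so adding the layer prefix and field path covers the error-message guidelines above.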
The baseline to keep in mind: a typical API request takes 100-500ms round-trip, and schema validation on a normal response adds roughly 1-5ms. Even in the worst pairing, 5ms on a 100ms request, that is a 5% overhead; at the other end, 1ms on a 500ms request, it is 0.2%. For normal API integration code, the overhead is insignificant. The "performance-critical path" row in the decision matrix above is reserved for genuinely hot code paths (tight loops inside high-throughput request handlers, batch jobs processing millions of records), not for ordinary service code.
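To check whether the cost matters for your own payloads rather than relying on a rule of thumb, a rough timeit harness is enough. This sketch assumes jsonschema is installed and uses a made-up stand-in schema and payload; `ROUND_TRIP_MS` is set to the low end of the 100-500ms range above. Absolute numbers vary by machine, jsonschema version, and schema size, so no expected output is shown.

```python
import timeit

from jsonschema import Draft7Validator, validate

# Stand-in schema and payload, invented for this benchmark.
# "$schema" makes validate() pick Draft 7 too, so both paths use the same draft.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["current"],
    "properties": {
        "current": {
            "type": "object",
            "required": ["temperature_2m"],
            "properties": {
                "temperature_2m": {"type": "number", "minimum": -100, "maximum": 60}
            },
        }
    },
}
PAYLOAD = {"current": {"temperature_2m": 22.5}}
ROUND_TRIP_MS = 100  # low end of the 100-500ms round-trip range

COMPILED = Draft7Validator(SCHEMA)  # built once, reused by schema_reused()


def schema_per_call():
    # validate() re-checks the schema and builds a validator on every call.
    validate(instance=PAYLOAD, schema=SCHEMA)


def schema_reused():
    COMPILED.validate(PAYLOAD)


def hand_written():
    temp = PAYLOAD["current"]["temperature_2m"]
    return isinstance(temp, (int, float)) and -100 <= temp <= 60


def ms_per_call(fn, number=1000):
    """Average wall-clock milliseconds per call over `number` runs."""
    return timeit.timeit(fn, number=number) / number * 1000


for label, fn in [
    ("validate() per call", schema_per_call),
    ("reused Draft7Validator", schema_reused),
    ("hand-written check", hand_written),
]:
    ms = ms_per_call(fn)
    print(f"{label:<24}{ms:9.4f} ms  ({ms / ROUND_TRIP_MS:.3%} of {ROUND_TRIP_MS} ms)")
```

If the reused-validator line is already a small fraction of a percent of the round trip, the hot-path exception in the decision matrix does not apply; a profiler on the real service is the final word.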
Three sections in, validation has been a question of how: by hand, by schema, by combination. Section 6 takes the zoom level up one click and asks where it should live, and what each placement costs.