5. Data processing layer: validation and transformation
The processing layer is the contract between messy and clean. Raw API responses come in (a dict or list from response.json(), possibly with missing fields, possibly with sensor errors); validated dataclasses come out (typed, normalised, range-checked). Everything above this layer can stop doing defensive programming because this layer already did it.
What you'll build
In this section you build three files at the project root:
models.py: the dataclasses
geocoding_processor.py: array-shaped responses
weather_processor.py: nested responses
These processors combine two pieces you have already built. safe_get() (covered in Chapter 10) handles missing fields without exceptions, and the three-layer validation pipeline from Chapter 12 (structure → content → business rules) decides whether the data is trustworthy. Each processor applies both in a single pass over the API response.
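As a rough illustration of what "both in a single pass" means, here is a minimal sketch. The safe_get() and ValidationError defined here are simplified stand-ins written for this example, not the Chapter 10 and Chapter 12 versions, and process_reading() is a hypothetical name; the real processors later in this section do considerably more.

```python
from typing import Any, Dict


class ValidationError(Exception):
    """Stand-in for the Chapter 12 exception (assumed to be a plain Exception subclass)."""


def safe_get(data: Any, path: str, default: Any = None) -> Any:
    """Stand-in for the Chapter 10 helper: walk a dotted path, return default on any miss."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def process_reading(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical one-pass processor: structure, then content, then business rules."""
    # Layer 1 (structure): the required section must be present and be a dict.
    if not isinstance(safe_get(raw, "main"), dict):
        raise ValidationError("missing 'main' section")

    # Layer 2 (content): the value must be numeric and inside a realistic range.
    try:
        temp = float(safe_get(raw, "main.temp"))
    except (TypeError, ValueError):
        raise ValidationError("temperature is not a number")
    if not -100 <= temp <= 60:
        raise ValidationError(f"temperature {temp} out of range")

    # Layer 3 (business rules): soft issues become warnings, not failures.
    warnings = []
    if safe_get(raw, "wind.speed") is None:
        warnings.append("wind speed missing")

    return {"temperature": temp, "warnings": warnings}


print(process_reading({"main": {"temp": "15.5"}}))
```

The missing wind section never raises: safe_get() returns None and the business-rule layer decides what that absence means.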
Normalised data structures
The processors return three trusted shapes: Location, CurrentWeather, and WeatherData. Here's what one of those objects looks like once the processors have done their work; the dataclass definitions follow.
WeatherData is the outer shape; Location and CurrentWeather sit inside it.
WeatherData(
    location=Location(
        name="London",
        country="GB",
        state="England",
        latitude=51.5074,
        longitude=-0.1278,
        confidence_score=0.95,
        raw_data={...},
    ),
    current=CurrentWeather(
        temperature=15.2,
        feels_like=14.1,
        humidity=78,
        pressure=1011.3,
        description="Light Rain",
        icon="10d",
        visibility=10.0,
        wind_speed=12.0,
        wind_direction=230,
    ),
    timestamp=datetime(2026, 5, 1, 14, 30),
    timezone_offset_seconds=3600,
    data_quality_score=1.00,
    validation_warnings=[],
    raw_data={...},
)
Every field is typed, every value is in range, and any nested structure has already been validated. Code that holds a WeatherData never needs to ask "is this a number?" or "did this field arrive?"; those questions were settled at the boundary.
Here are the dataclass definitions. Save the file at the project root as models.py:

"""
Normalized data structures for weather dashboard.

These models represent validated, trustworthy data that upper layers consume.
They hide API-specific response formats behind clean Python interfaces.
"""

from dataclasses import dataclass
from datetime import datetime
from typing import Optional, List, Dict, Any


@dataclass
class Location:
    """
    Validated location from geocoding API.

    From Chapter 10: Normalized structure regardless of API response variations.
    From Chapter 12: All fields validated before object creation.
    """
    name: str
    country: str
    state: Optional[str]
    latitude: float
    longitude: float

    # Metadata for debugging and quality assessment
    confidence_score: float  # 0.0 to 1.0
    raw_data: Dict[Any, Any]  # Original API response

    def display_name(self) -> str:
        """Human-readable location name."""
        parts = [self.name]
        if self.state:
            parts.append(self.state)
        parts.append(self.country)
        return ", ".join(parts)


@dataclass
class CurrentWeather:
    """
    Validated current weather conditions.

    From Chapter 12: All fields validated for type and realistic ranges.
    """
    temperature: float  # Celsius
    feels_like: float  # Celsius
    humidity: int  # Percentage (0-100)
    pressure: float  # hPa
    description: str
    icon: str

    # Optional fields
    visibility: Optional[float] = None  # kilometers
    wind_speed: Optional[float] = None  # km/h
    wind_direction: Optional[int] = None  # degrees

    def temperature_fahrenheit(self) -> float:
        """Convert temperature to Fahrenheit."""
        return (self.temperature * 9/5) + 32

    def feels_like_fahrenheit(self) -> float:
        """Convert feels-like temperature to Fahrenheit."""
        return (self.feels_like * 9/5) + 32


@dataclass
class WeatherData:
    """
    Complete validated weather information.

    Combines location and weather data with quality metadata.
    """
    location: Location
    current: CurrentWeather
    timestamp: datetime
    timezone_offset_seconds: int

    # Quality metadata
    data_quality_score: float  # 0.0 to 1.0
    validation_warnings: List[str]  # Non-fatal issues found during validation

    # Original response for debugging
    raw_data: Dict[Any, Any]
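To see how upper layers might consume these models, here is a short usage sketch. It assumes trimmed copies of Location and CurrentWeather (the metadata fields and most weather fields are left out so the example runs on its own); the Paris values are illustrative sample data, not output from the API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    # Trimmed copy of the model above: confidence_score and raw_data omitted.
    name: str
    country: str
    state: Optional[str]
    latitude: float
    longitude: float

    def display_name(self) -> str:
        parts = [self.name]
        if self.state:
            parts.append(self.state)
        parts.append(self.country)
        return ", ".join(parts)


@dataclass
class CurrentWeather:
    # Trimmed copy: only the field needed for the conversion helper.
    temperature: float  # Celsius

    def temperature_fahrenheit(self) -> float:
        return (self.temperature * 9/5) + 32


london = Location("London", "GB", "England", 51.5074, -0.1278)
paris = Location("Paris", "FR", None, 48.8566, 2.3522)

print(london.display_name())  # London, England, GB
print(paris.display_name())   # Paris, FR (a missing state is simply skipped)
print(CurrentWeather(temperature=20.0).temperature_fahrenheit())  # 68.0
```

Because state is Optional, display_name() has to handle None; that is the only defensive check the caller-facing code needs, since everything else was settled before the object was built.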
Geocoding response processor
The geocoding processor takes the API client's array-shaped geocoding payload and turns it into a Location dataclass, or a clear error if it won't validate. Save this at the project root as geocoding_processor.py:

"""
Geocoding response processor.

Integrates:
- Chapter 10: Safe navigation and collection normalization
- Chapter 12: Three-layer validation (structure → content → business rules)
"""

from typing import List, Dict, Any

# Import Chapter 10 utilities
from json_helpers import normalize_collection, safe_get

# Import Chapter 12 utilities
from validators import ValidationError, validate_range, safe_float

# Import data models
from models import Location


class GeocodingProcessor:
    """
    Process geocoding API responses into validated Location objects.

    Applies systematic validation:
    - Layer 1 (Structure): Response shape and required sections
    - Layer 2 (Content): Field types and realistic ranges
    - Layer 3 (Business Rules): Cross-field logic and confidence scoring
    """

    def __init__(self):
        """Initialize processor with validation rules."""
        # Required fields from API response (Chapter 12)
        self.required_fields = ['name', 'country', 'lat', 'lon']

        # Coordinate ranges for validation (Chapter 12)
        self.lat_range = (-90, 90)
        self.lon_range = (-180, 180)

    def process_response(self, api_response: List[Dict[Any, Any]],
                         city_name: str) -> List[Location]:
        """
        Process geocoding API response into validated Location objects.

        Integrates Chapter 10 + Chapter 12 patterns:
        1. Normalize collection structure (Chapter 10)
        2. Validate each location (Chapter 12, three layers)
        3. Sort by confidence (business logic)

        Args:
            api_response: Raw API response from geocoding service
            city_name: Original user input for error context

        Returns:
            List of validated Location objects, sorted by confidence

        Raises:
            ValidationError: If no valid locations can be extracted
        """
        # Chapter 10: Normalize collection (API returns direct array)
        locations_data = normalize_collection(api_response)

        if not locations_data:
            raise ValidationError(f"No locations found for '{city_name}'")

        # Process and validate each location
        validated_locations = []
        for i, location_data in enumerate(locations_data):
            try:
                # Apply three-layer validation
                validated_location = self._validate_location(location_data, i)
                validated_locations.append(validated_location)
            except ValidationError as e:
                # Log validation failure but continue processing others
                print(f"Warning: Skipping invalid location {i}: {e}")
                continue

        if not validated_locations:
            raise ValidationError(
                f"No valid locations found for '{city_name}'. "
                "All results failed validation."
            )

        # Business rule: Sort by confidence (highest first)
        validated_locations.sort(key=lambda loc: loc.confidence_score, reverse=True)
        return validated_locations

    def _validate_location(self, location_data: Dict[Any, Any],
                           index: int) -> Location:
        """
        Validate single location with three-layer approach.

        From Chapter 12: Structure → Content → Business Rules
        """
        # LAYER 1: STRUCTURE VALIDATION
        # Check required fields exist (Chapter 12, Section 2)
        missing_fields = []
        for field in self.required_fields:
            if field not in location_data:
                missing_fields.append(field)

        if missing_fields:
            raise ValidationError(f"Missing required fields: {missing_fields}")

        # LAYER 2: CONTENT VALIDATION
        # Extract and validate coordinates (Chapter 12, Section 2)
        try:
            latitude = safe_float(location_data['lat'], 'latitude')
            longitude = safe_float(location_data['lon'], 'longitude')
        except ValidationError as e:
            raise ValidationError(f"Invalid coordinates: {e}")

        # Validate coordinate ranges (Chapter 12, Section 2)
        valid, error = validate_range(latitude, 'latitude',
                                      min_val=self.lat_range[0],
                                      max_val=self.lat_range[1])
        if not valid:
            raise ValidationError(error)

        valid, error = validate_range(longitude, 'longitude',
                                      min_val=self.lon_range[0],
                                      max_val=self.lon_range[1])
        if not valid:
            raise ValidationError(error)

        # Extract and validate text fields (Chapter 10: safe_get)
        name = str(safe_get(location_data, 'name', '')).strip()
        if not name:
            raise ValidationError("Empty location name")

        country = str(safe_get(location_data, 'country', '')).strip()
        if not country:
            raise ValidationError("Empty country")

        # Optional state field (Chapter 10: defensive extraction)
        state = safe_get(location_data, 'state')
        if state:
            state = str(state).strip()
            if not state:
                state = None

        # LAYER 3: BUSINESS RULES
        # Calculate confidence score based on data quality and position
        confidence_score = self._calculate_confidence(location_data, index)

        # Create validated Location object
        return Location(
            name=name,
            country=country,
            state=state,
            latitude=latitude,
            longitude=longitude,
            confidence_score=confidence_score,
            raw_data=location_data
        )

    def _calculate_confidence(self, location_data: Dict[Any, Any],
                              index: int) -> float:
        """
        Calculate confidence score for location match.

        Business rules (Chapter 12, Layer 3):
        - Higher-ranked results get higher confidence
        - Presence of state increases confidence
        - Generic names decrease confidence

        The base ceiling is 0.9 so the +0.1 state bonus survives clamping
        for the top-ranked result; otherwise index 0 with state would compute
        as 1.1 and clamp back to 1.0, silently ignoring the bonus.
        """
        base_score = 0.9

        # Reduce confidence for lower-ranked results
        position_penalty = index * 0.1
        base_score -= position_penalty

        # Increase confidence if state is provided (more specific)
        if location_data.get('state'):
            base_score += 0.1

        # Reduce confidence for generic names
        # (str() guards against a non-string name raising AttributeError here)
        name = str(location_data.get('name', '')).lower()
        generic_names = ['city', 'town', 'village', 'municipality']
        if name in generic_names:
            base_score -= 0.2

        # Ensure score stays in valid range [0.0, 1.0]
        return max(0.0, min(1.0, base_score))
The geocoding processor adds one preliminary step: normalize_collection() (from json_helpers.py) turns the array-shaped response into a plain list. It then runs the three-layer pipeline per location: structure (required fields present), content (coordinates parsed with safe_float() and range-checked), business rules (a confidence score so downstream code can sort and prioritise results). Empty lists and all-invalid responses raise ValidationError; per-location failures inside the loop are skipped with a console warning so a mixed payload still yields the ones that validate. The orchestrator built in §6 catches the ValidationError and surfaces it as the not_found category.
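The confidence arithmetic is easier to follow with numbers. The sketch below is a standalone copy of the scoring rules from _calculate_confidence(); the function name confidence(), its flattened parameters, and the sample cases are made up for illustration, and scores are rounded for display only.

```python
from typing import Optional

GENERIC_NAMES = {"city", "town", "village", "municipality"}


def confidence(index: int, name: str, state: Optional[str]) -> float:
    """Standalone copy of the scoring arithmetic (hypothetical helper)."""
    score = 0.9 - index * 0.1        # position penalty: 0.1 per rank below the top
    if state:
        score += 0.1                 # state bonus: a more specific match
    if name.lower() in GENERIC_NAMES:
        score -= 0.2                 # generic-name penalty
    return max(0.0, min(1.0, score))  # clamp to [0.0, 1.0]


cases = [
    (0, "London", "England"),  # top result with state
    (0, "London", None),       # top result without state
    (2, "London", "Ontario"),  # third result with state
    (0, "City", None),         # generic name
    (9, "Town", None),         # far down the list and generic
]
for index, name, state in cases:
    print(index, name, state, round(confidence(index, name, state), 2))
# Rounded scores, in order: 1.0, 0.9, 0.8, 0.7, 0.0
```

Only the first case reaches 1.0, which is what the 0.9 base is for: the state bonus remains visible in the ordering instead of being absorbed by the clamp.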
Weather response processor
Same shape, different payload. The weather endpoint returns a nested object instead of an array of matches, but the pipeline is identical: a structural check, a content check with range checks on temperature, humidity, and pressure, then a WeatherData dataclass at the end. Save this at the project root as weather_processor.py:

"""
Weather response processor.

Integrates:
- Chapter 10: Safe navigation for nested weather data
- Chapter 12: Three-layer validation with realistic ranges
"""

from datetime import datetime
from typing import List, Dict, Any

# Import Chapter 10 utilities
from json_helpers import safe_get

# Import Chapter 12 utilities
from validators import ValidationError, validate_structure, validate_range
from validators import safe_float, safe_int

# Import data models
from models import Location, CurrentWeather, WeatherData


class WeatherProcessor:
    """
    Process weather API responses into validated WeatherData objects.

    Applies Chapter 12's three-layer validation:
    - Layer 1: Response structure
    - Layer 2: Field types and realistic ranges
    - Layer 3: Cross-field business rules
    """

    def __init__(self):
        """Initialize processor with validation rules."""
        # Required top-level sections (Chapter 12, Layer 1)
        self.required_sections = ['main', 'weather', 'dt', 'timezone']

        # Required fields within 'main' section (Chapter 12, Layer 2)
        self.main_required = ['temp', 'feels_like', 'humidity', 'pressure']

        # Realistic ranges for validation (Chapter 12, Layer 2)
        self.temp_range = (-100, 60)  # Celsius
        self.humidity_range = (0, 100)  # Percentage
        self.pressure_range = (800, 1200)  # hPa

    def process_response(self, api_response: Dict[Any, Any],
                         location: Location) -> WeatherData:
        """
        Process weather API response into validated WeatherData object.

        Integrates Chapter 10 + Chapter 12:
        1. Validate structure (Chapter 12, Layer 1)
        2. Extract with safe navigation (Chapter 10)
        3. Validate content (Chapter 12, Layer 2)
        4. Apply business rules (Chapter 12, Layer 3)

        Args:
            api_response: Raw weather API response
            location: Validated location this weather corresponds to

        Returns:
            Validated WeatherData object

        Raises:
            ValidationError: If response fails validation
        """
        warnings = []

        # LAYER 1: STRUCTURE VALIDATION (Chapter 12)
        valid, error = validate_structure(api_response, self.required_sections)
        if not valid:
            raise ValidationError(f"Invalid weather response structure: {error}")

        # Extract main weather data section (Chapter 10: safe access)
        main_data = api_response['main']
        if not isinstance(main_data, dict):
            raise ValidationError("'main' section must be a dictionary")

        weather_list = api_response['weather']
        if not isinstance(weather_list, list):
            raise ValidationError("'weather' section must be a list")

        # Check required fields in main section
        missing_fields = [field for field in self.main_required
                          if field not in main_data]
        if missing_fields:
            raise ValidationError(f"Missing main weather fields: {missing_fields}")

        # LAYER 2: CONTENT VALIDATION (Chapter 12)
        # Extract and validate temperature
        try:
            temperature = safe_float(main_data['temp'], 'temperature')
        except ValidationError as e:
            raise ValidationError(f"Invalid temperature: {e}")

        valid, error = validate_range(temperature, 'temperature',
                                      min_val=self.temp_range[0],
                                      max_val=self.temp_range[1])
        if not valid:
            raise ValidationError(error)

        # Extract and validate feels-like temperature
        try:
            feels_like = safe_float(main_data['feels_like'], 'feels_like')
        except ValidationError as e:
            raise ValidationError(f"Invalid feels_like: {e}")

        # Validate humidity
        try:
            humidity = safe_int(main_data['humidity'], 'humidity')
        except ValidationError as e:
            raise ValidationError(f"Invalid humidity: {e}")

        valid, error = validate_range(humidity, 'humidity',
                                      min_val=self.humidity_range[0],
                                      max_val=self.humidity_range[1])
        if not valid:
            raise ValidationError(error)

        # Validate pressure
        try:
            pressure = safe_float(main_data['pressure'], 'pressure')
        except ValidationError as e:
            raise ValidationError(f"Invalid pressure: {e}")

        valid, error = validate_range(pressure, 'pressure',
                                      min_val=self.pressure_range[0],
                                      max_val=self.pressure_range[1])
        if not valid:
            raise ValidationError(error)

        # Extract weather description (Chapter 10: safe_get with defaults)
        if not weather_list:
            raise ValidationError("No weather conditions data")

        primary_weather = weather_list[0]
        description = str(safe_get(primary_weather, 'description', 'Unknown')).title()
        icon = str(safe_get(primary_weather, 'icon', ''))

        # Extract optional fields (Chapter 10: defensive extraction)
        visibility = safe_get(api_response, 'visibility')
        if visibility is not None:
            try:
                visibility = float(visibility) / 1000  # Convert meters to km
            except (ValueError, TypeError):
                visibility = None
                warnings.append("Could not parse visibility data")

        wind_speed = safe_get(api_response, 'wind.speed')
        if wind_speed is not None:
            try:
                wind_speed = float(wind_speed)
            except (ValueError, TypeError):
                wind_speed = None
                warnings.append("Could not parse wind speed")

        wind_direction = safe_get(api_response, 'wind.deg')
        if wind_direction is not None:
            try:
                wind_direction = int(wind_direction)
            except (ValueError, TypeError):
                wind_direction = None
                warnings.append("Could not parse wind direction")

        # Extract timestamp and timezone offset (Chapter 10: safe extraction)
        # fromtimestamp() can raise OverflowError or OSError for out-of-range values
        try:
            timestamp = datetime.fromtimestamp(safe_int(api_response['dt'], 'timestamp'))
        except (ValueError, TypeError, OverflowError, OSError) as e:
            raise ValidationError(f"Invalid timestamp: {e}")

        try:
            timezone_offset_seconds = safe_int(api_response['timezone'], 'timezone')
        except ValidationError as e:
            raise ValidationError(f"Invalid timezone offset: {e}")

        # LAYER 3: BUSINESS RULES (Chapter 12)
        # Validate feels-like vs actual temperature relationship
        temp_diff = abs(feels_like - temperature)
        if temp_diff > 20:
            warnings.append(
                f"Large difference between temperature ({temperature}°C) "
                f"and feels-like ({feels_like}°C)"
            )

        # Create validated CurrentWeather object
        current_weather = CurrentWeather(
            temperature=temperature,
            feels_like=feels_like,
            humidity=humidity,
            pressure=pressure,
            description=description,
            icon=icon,
            visibility=visibility,
            wind_speed=wind_speed,
            wind_direction=wind_direction
        )

        # Calculate data quality score
        quality_score = self._calculate_quality_score(api_response, warnings)

        # Create complete WeatherData object
        return WeatherData(
            location=location,
            current=current_weather,
            timestamp=timestamp,
            timezone_offset_seconds=timezone_offset_seconds,
            data_quality_score=quality_score,
            validation_warnings=warnings,
            raw_data=api_response
        )

    def _calculate_quality_score(self, api_response: Dict[Any, Any],
                                 warnings: List[str]) -> float:
        """
        Calculate data quality score.

        Business logic (Chapter 12, Layer 3):
        - Reduce score for missing optional fields
        - Reduce score for validation warnings
        - Increase score for recent data
        """
        score = 1.0

        # Reduce score for missing optional fields
        if safe_get(api_response, 'visibility') is None:
            score -= 0.1
        if safe_get(api_response, 'wind.speed') is None:
            score -= 0.05

        # Reduce score for validation warnings
        score -= len(warnings) * 0.1

        # Increase score for recent data (less than 5 minutes old)
        try:
            data_age = datetime.now().timestamp() - api_response['dt']
            if data_age < 300:  # Less than 5 minutes
                score += 0.1
        except (KeyError, TypeError):
            pass

        # Ensure score stays in valid range
        return max(0.0, min(1.0, score))
The cross-field business rule is the new piece on this side. The geocoding processor's Layer 3 was a confidence score; the weather processor's Layer 3 also flags a cross-field oddity (a feels-like value more than 20°C away from the actual temperature) and rolls it, together with any optional fields that were missing or failed to parse, into a data_quality_score. Hard failures (missing sections, out-of-range temperature) still raise ValidationError; soft issues land in validation_warnings and lower the score without rejecting the response. The orchestrator reads both: a raised ValidationError becomes PARTIAL_SUCCESS, while a lowered score surfaces as a warning on an otherwise successful result.
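A few worked numbers show how the soft issues combine. This is a standalone sketch of the arithmetic in _calculate_quality_score(); the function name quality_score() and its flattened parameters are assumptions made so the example runs without the API response, and scores are rounded for display only.

```python
def quality_score(has_visibility: bool, has_wind_speed: bool,
                  warning_count: int, age_seconds: float) -> float:
    """Standalone copy of the quality-score arithmetic (hypothetical helper)."""
    score = 1.0
    if not has_visibility:
        score -= 0.1                  # missing optional field
    if not has_wind_speed:
        score -= 0.05                 # missing optional field
    score -= warning_count * 0.1      # each validation warning
    if age_seconds < 300:
        score += 0.1                  # recency bonus (under 5 minutes old)
    return max(0.0, min(1.0, score))  # clamp to [0.0, 1.0]


cases = [
    (True, True, 0, 10),      # complete and fresh
    (False, True, 0, 10),     # no visibility, but fresh
    (False, True, 0, 3600),   # no visibility, an hour old
    (False, False, 1, 3600),  # both optional fields missing, one warning, stale
]
for case in cases:
    print(case, round(quality_score(*case), 2))
# Rounded scores, in order: 1.0, 1.0, 0.9, 0.75
```

Note the second case: the recency bonus exactly offsets the missing-visibility penalty, so a fresh response without visibility still scores 1.00. Whether that masking is desirable is a design choice; dropping the bonus or lowering the starting score would keep missing fields visible in the result.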
Verifying the processing layer
Four test cases cover the typical shapes the processors will see: valid geocoding, invalid coordinates, valid weather, and unrealistic temperature. Save this at the project root as test_processing_layer.py and run it with python test_processing_layer.py:

"""Test the processing layer with sample API responses."""

from datetime import datetime

from geocoding_processor import GeocodingProcessor
from weather_processor import WeatherProcessor
from validators import ValidationError

# Test 1: Valid geocoding response
print("=== Testing Processing Layer ===\n")
print("Test 1: Valid geocoding response")

geocoding_response = [
    {
        "name": "London",
        "country": "GB",
        "state": "England",
        "lat": 51.5074,
        "lon": -0.1278
    }
]

processor = GeocodingProcessor()
locations = processor.process_response(geocoding_response, "London")

print(f"Locations found: {len(locations)}")
print(f"First location: {locations[0].display_name()}")
print(f"Coordinates: ({locations[0].latitude}, {locations[0].longitude})")
print(f"Confidence: {locations[0].confidence_score:.2f}")
print()

# Test 2: Invalid coordinate range
print("Test 2: Invalid coordinate range")

invalid_response = [
    {
        "name": "Invalid",
        "country": "XX",
        "lat": 999,  # Out of range
        "lon": -0.1278
    }
]

try:
    locations = processor.process_response(invalid_response, "Invalid")
    print("ERROR: Should have raised ValidationError")
except ValidationError as e:
    print(f"[OK] Correctly caught invalid coordinates: {e}")
print()

# Test 3: Valid weather response
print("Test 3: Valid weather response")

weather_response = {
    "main": {
        "temp": 15.5,
        "feels_like": 14.2,
        "humidity": 72,
        "pressure": 1013.2
    },
    "weather": [
        {
            "description": "light rain",
            "icon": "10d"
        }
    ],
    "dt": int(datetime.now().timestamp()),
    "timezone": 3600,
    "wind": {
        "speed": 5.2,
        "deg": 230
    }
}

weather_processor = WeatherProcessor()
location = locations[0]  # Use London from Test 1
weather_data = weather_processor.process_response(weather_response, location)

print(f"Temperature: {weather_data.current.temperature}°C")
print(f"Feels like: {weather_data.current.feels_like}°C")
print(f"Conditions: {weather_data.current.description}")
print(f"Quality score: {weather_data.data_quality_score:.2f}")
print(f"Warnings: {len(weather_data.validation_warnings)}")
print()

# Test 4: Temperature out of range
print("Test 4: Temperature out of realistic range")

invalid_weather = {
    "main": {
        "temp": 150,  # Unrealistic
        "feels_like": 145,
        "humidity": 72,
        "pressure": 1013.2
    },
    "weather": [{"description": "error", "icon": ""}],
    "dt": int(datetime.now().timestamp()),
    "timezone": 0
}

try:
    weather_data = weather_processor.process_response(invalid_weather, location)
    print("ERROR: Should have raised ValidationError")
except ValidationError as e:
    print(f"[OK] Correctly caught unrealistic temperature: {e}")
print()
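
Expected output:

=== Testing Processing Layer ===

Test 1: Valid geocoding response
Locations found: 1
First location: London, England, GB
Coordinates: (51.5074, -0.1278)
Confidence: 1.00

Test 2: Invalid coordinate range
Warning: Skipping invalid location 0: latitude 999.0 above maximum 90
[OK] Correctly caught invalid coordinates: No valid locations found for 'Invalid'. All results failed validation.

Test 3: Valid weather response
Temperature: 15.5°C
Feels like: 14.2°C
Conditions: Light Rain
Quality score: 1.00
Warnings: 0

Test 4: Temperature out of realistic range
[OK] Correctly caught unrealistic temperature: temperature 150.0 above maximum 60

In Test 2 the range error appears on the Warning line because process_response() skips the invalid location inside its loop; the ValidationError that reaches the test is the "No valid locations found" error raised once every result has been skipped.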
All four cases land where they should. Valid responses normalise into typed dataclasses; invalid coordinates and out-of-range temperatures raise ValidationError before any downstream code touches them.
If your output diverges, the most likely failure points are the validators. If Tests 2 or 4 print ERROR: Should have raised ValidationError, the range checks in validators.py aren't firing; go back to §3 and re-run the import-check on validators.py to confirm validate_range and safe_float are loaded. If Tests 1 or 3 raise ValidationError on valid data, the required-fields lists in the processor don't match the test payload: compare required_fields (geocoding) or required_sections and main_required (weather) against the keys in the payload. These are API key names such as lat and lon, not the dataclass field names latitude and longitude.
Section 6 builds the business logic layer on top: the orchestrator calls the API client and the processors in sequence and trusts whatever comes back. Every WeatherData on this side of the boundary has already been validated; the orchestrator's job is sequencing and outcome judgement, not error checking.