5. Data processing layer: validation and transformation

The processing layer is the contract between messy and clean. Raw API responses come in (a list or dict from requests.json(), possibly with missing fields, possibly with sensor errors); validated dataclasses come out (typed, normalised, range-checked). Everything above this layer can stop programming defensively because this layer has already done it.

Figure: Fail fast at the boundary. Raw API responses arrive on one side; validated dataclasses leave on the other. Bad data never crosses, so nothing downstream needs to defend against it.

What you'll build

In this section you build three files at the project root:

models.py — the dataclasses
geocoding_processor.py — array-shaped responses
weather_processor.py — nested responses

These processors combine two pieces you have already built. safe_get() (covered in Chapter 10) handles missing fields without exceptions, and the three-layer validation pipeline from Chapter 12 (structure → content → business rules) decides whether the data is trustworthy. Each processor applies both in a single pass over the API response.
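The processors import these helpers from json_helpers.py rather than defining them. If your Chapter 10 file isn't to hand, here is a minimal hypothetical stand-in, inferred only from how this section calls the two functions: safe_get(data, 'wind.speed', default) walks a dotted path through nested dicts, and normalize_collection(response) returns a plain list. The Chapter 10 versions may behave differently at the edges (for example, in how they unwrap responses that nest the list inside a dict), so treat this as a sketch under those assumptions.

```python
"""Hypothetical stand-ins for json_helpers.safe_get and normalize_collection.

Inferred from how the processors call them; the Chapter 10 originals may differ.
"""
from typing import Any, List


def safe_get(data: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path like 'wind.speed' through nested dicts.

    Returns `default` instead of raising when a key is missing or when an
    intermediate value is not a dict.
    """
    current = data
    for key in path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def normalize_collection(response: Any) -> List[Any]:
    """Return the response as a plain list (assumed behaviour).

    - None becomes an empty list
    - a list is returned as a shallow copy
    - a single dict is wrapped in a one-element list
    - anything else becomes an empty list
    """
    if response is None:
        return []
    if isinstance(response, list):
        return list(response)
    if isinstance(response, dict):
        return [response]
    return []
```

With these two in place, the processors' calls such as safe_get(api_response, 'wind.deg') and normalize_collection(api_response) resolve without the Chapter 10 module.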

Normalised data structures

The processors return three trusted shapes: Location, CurrentWeather, and WeatherData. Here's what one of those objects looks like once the processors have done their work; the dataclass definitions follow.

WeatherData is the outer shape; Location and CurrentWeather sit inside it.

WeatherData(
    location=Location(
        name="London",
        country="GB",
        state="England",
        latitude=51.5074,
        longitude=-0.1278,
        confidence_score=0.95,
        raw_data={...},
    ),
    current=CurrentWeather(
        temperature=15.2,
        feels_like=14.1,
        humidity=78,
        pressure=1011.3,
        description="Light Rain",
        icon="10d",
        visibility=10.0,
        wind_speed=12.0,
        wind_direction=230,
    ),
    timestamp=datetime(2026, 5, 1, 14, 30),
    timezone_offset_seconds=3600,
    data_quality_score=1.00,
    validation_warnings=[],
    raw_data={...},
)

Every field is typed, every range-checked value falls within realistic bounds, and any nested structure has already been validated. Code that holds a WeatherData never needs to ask "is this a number?" or "did this field arrive?"; those questions were settled at the boundary.

Here are the dataclass definitions. Save the file at the project root as models.py:

models.py

"""
Normalized data structures for weather dashboard.

These models represent validated, trustworthy data that upper layers consume.
They hide API-specific response formats behind clean Python interfaces.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, List, Dict, Any

@dataclass
class Location:
    """
    Validated location from geocoding API.
    
    From Chapter 10: Normalized structure regardless of API response variations.
    From Chapter 12: All fields validated before object creation.
    """
    name: str
    country: str
    state: Optional[str]
    latitude: float
    longitude: float
    
    # Metadata for debugging and quality assessment
    confidence_score: float # 0.0 to 1.0
    raw_data: Dict[Any, Any] # Original API response
    
    def display_name(self) -> str:
        """Human-readable location name."""
        parts = [self.name]
        if self.state:
            parts.append(self.state)
        parts.append(self.country)
        return ", ".join(parts)

@dataclass
class CurrentWeather:
    """
    Validated current weather conditions.
    
    From Chapter 12: All fields validated for type and realistic ranges.
    """
    temperature: float # Celsius
    feels_like: float # Celsius
    humidity: int # Percentage (0-100)
    pressure: float # hPa
    description: str
    icon: str
    
    # Optional fields
    visibility: Optional[float] = None # kilometers
    wind_speed: Optional[float] = None # km/h
    wind_direction: Optional[int] = None # degrees
    
    def temperature_fahrenheit(self) -> float:
        """Convert temperature to Fahrenheit."""
        return (self.temperature * 9/5) + 32
    
    def feels_like_fahrenheit(self) -> float:
        """Convert feels-like temperature to Fahrenheit."""
        return (self.feels_like * 9/5) + 32

@dataclass
class WeatherData:
    """
    Complete validated weather information.
    
    Combines location and weather data with quality metadata.
    """
    location: Location
    current: CurrentWeather
    timestamp: datetime
    timezone_offset_seconds: int
    
    # Quality metadata
    data_quality_score: float # 0.0 to 1.0
    validation_warnings: List[str] # Non-fatal issues found during validation
    
    # Original response for debugging
    raw_data: Dict[Any, Any]

Geocoding response processor

The geocoding processor takes the API client's array-shaped geocoding payload and turns it into a list of validated Location dataclasses, sorted by confidence, or raises a clear error if none of the results validate. Save this at the project root as geocoding_processor.py:

Figure: Geocoding data flow. The API Client makes the HTTP request and passes the raw APIResult (a list from requests.json()) to the Geocoding Processor (geocoding_processor.py), which hands a validated List[Location] to the Orchestrator. Raw data crosses only as far as the processor; only validated data crosses to the orchestrator.

geocoding_processor.py

"""
Geocoding response processor.

Integrates:
- Chapter 10: Safe navigation and collection normalization
- Chapter 12: Three-layer validation (structure → content → business rules)
"""
from typing import List, Dict, Any

# Import Chapter 10 utilities
from json_helpers import normalize_collection, safe_get

# Import Chapter 12 utilities
from validators import ValidationError, validate_structure, validate_range, safe_float

# Import data models
from models import Location

class GeocodingProcessor:
    """
    Process geocoding API responses into validated Location objects.
    
    Applies systematic validation:
    - Layer 1 (Structure): Response shape and required sections
    - Layer 2 (Content): Field types and realistic ranges
    - Layer 3 (Business Rules): Cross-field logic and confidence scoring
    """
    
    def __init__(self):
        """Initialize processor with validation rules."""
        # Required fields from API response (Chapter 12)
        self.required_fields = ['name', 'country', 'lat', 'lon']
        
        # Coordinate ranges for validation (Chapter 12)
        self.lat_range = (-90, 90)
        self.lon_range = (-180, 180)
    
    def process_response(self, api_response: List[Dict[Any, Any]], 
                        city_name: str) -> List[Location]:
        """
        Process geocoding API response into validated Location objects.
        
        Integrates Chapter 10 + Chapter 12 patterns:
        1. Normalize collection structure (Chapter 10)
        2. Validate each location (Chapter 12, three layers)
        3. Sort by confidence (business logic)
        
        Args:
            api_response: Raw API response from geocoding service
            city_name: Original user input for error context
            
        Returns:
            List of validated Location objects, sorted by confidence
            
        Raises:
            ValidationError: If no valid locations can be extracted
        """
        # Chapter 10: Normalize collection (API returns direct array)
        locations_data = normalize_collection(api_response)
        
        if not locations_data:
            raise ValidationError(f"No locations found for '{city_name}'")
        
        # Process and validate each location
        validated_locations = []
        
        for i, location_data in enumerate(locations_data):
            try:
                # Apply three-layer validation
                validated_location = self._validate_location(location_data, i)
                validated_locations.append(validated_location)
                
            except ValidationError as e:
                # Log validation failure but continue processing others
                print(f"Warning: Skipping invalid location {i}: {e}")
                continue
        
        if not validated_locations:
            raise ValidationError(
                f"No valid locations found for '{city_name}'. "
                "All results failed validation."
            )
        
        # Business rule: Sort by confidence (highest first)
        validated_locations.sort(key=lambda loc: loc.confidence_score, reverse=True)
        
        return validated_locations
    
    def _validate_location(self, location_data: Dict[Any, Any], 
                          index: int) -> Location:
        """
        Validate single location with three-layer approach.
        
        From Chapter 12: Structure → Content → Business Rules
        """
        # LAYER 1: STRUCTURE VALIDATION
        # Check required fields exist (Chapter 12, Section 2)
        missing_fields = []
        for field in self.required_fields:
            if field not in location_data:
                missing_fields.append(field)
        
        if missing_fields:
            raise ValidationError(f"Missing required fields: {missing_fields}")
        
        # LAYER 2: CONTENT VALIDATION
        # Extract and validate coordinates (Chapter 12, Section 2)
        try:
            latitude = safe_float(location_data['lat'], 'latitude')
            longitude = safe_float(location_data['lon'], 'longitude')
        except ValidationError as e:
            raise ValidationError(f"Invalid coordinates: {e}")
        
        # Validate coordinate ranges (Chapter 12, Section 2)
        valid, error = validate_range(latitude, 'latitude', 
                                     min_val=self.lat_range[0], 
                                     max_val=self.lat_range[1])
        if not valid:
            raise ValidationError(error)
        
        valid, error = validate_range(longitude, 'longitude',
                                     min_val=self.lon_range[0],
                                     max_val=self.lon_range[1])
        if not valid:
            raise ValidationError(error)
        
        # Extract and validate text fields (Chapter 10: safe_get)
        name = str(safe_get(location_data, 'name', '')).strip()
        if not name:
            raise ValidationError("Empty location name")
        
        country = str(safe_get(location_data, 'country', '')).strip()
        if not country:
            raise ValidationError("Empty country")
        
        # Optional state field (Chapter 10: defensive extraction);
        # missing, None, or blank values all normalise to None
        state = safe_get(location_data, 'state')
        state = str(state).strip() if state is not None else ''
        state = state or None
        
        # LAYER 3: BUSINESS RULES
        # Calculate confidence score based on data quality and position
        confidence_score = self._calculate_confidence(location_data, index)
        
        # Create validated Location object
        return Location(
            name=name,
            country=country,
            state=state,
            latitude=latitude,
            longitude=longitude,
            confidence_score=confidence_score,
            raw_data=location_data
        )
    
    def _calculate_confidence(self, location_data: Dict[Any, Any],
                            index: int) -> float:
        """
        Calculate confidence score for location match.

        Business rules (Chapter 12, Layer 3):
        - Higher-ranked results get higher confidence
        - Presence of state increases confidence
        - Generic names decrease confidence

        The base ceiling is 0.9 so the +0.1 state bonus survives clamping
        for the top-ranked result; otherwise index 0 with state would compute
        as 1.1 and clamp back to 1.0, silently ignoring the bonus.
        """
        base_score = 0.9

        # Reduce confidence for lower-ranked results
        position_penalty = index * 0.1
        base_score -= position_penalty

        # Increase confidence if state is provided (more specific)
        if location_data.get('state'):
            base_score += 0.1

        # Reduce confidence for generic names (str() guards non-string values)
        name = str(location_data.get('name', '')).strip().lower()
        generic_names = ['city', 'town', 'village', 'municipality']
        if name in generic_names:
            base_score -= 0.2

        # Ensure score stays in valid range [0.0, 1.0]
        return max(0.0, min(1.0, base_score))

Before the three-layer pipeline, the geocoding processor takes one preliminary step: normalize_collection() (from json_helpers.py) turns the array-shaped response into a plain list. It then runs the pipeline per location: structure (required fields present), content (coordinates parsed with safe_float() and range-checked; name and country non-empty), business rules (a confidence score so downstream code can sort and prioritise results). Empty lists and all-invalid responses raise ValidationError; per-location failures inside the loop are skipped with a console warning so a mixed payload still yields the ones that validate. The orchestrator built in §6 catches the ValidationError and surfaces it as the not_found category.
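To see how the confidence rules interact, here is a standalone copy of the arithmetic in _calculate_confidence, with the same constants but pulled out of the class so it runs on its own. The function name confidence_for and its parameters are illustrative, not part of geocoding_processor.py.

```python
GENERIC_NAMES = {'city', 'town', 'village', 'municipality'}


def confidence_for(name: str, has_state: bool, index: int) -> float:
    """Mirror of GeocodingProcessor._calculate_confidence's scoring rules."""
    score = 0.9                 # base ceiling leaves room for the state bonus
    score -= index * 0.1        # lower-ranked API results are less likely matches
    if has_state:
        score += 0.1            # a state/region makes the match more specific
    if name.strip().lower() in GENERIC_NAMES:
        score -= 0.2            # generic names are probably not what the user meant
    return max(0.0, min(1.0, score))  # clamp to [0.0, 1.0]


for name, has_state, index in [
    ("London", True, 0),
    ("London", False, 0),
    ("London", True, 1),
    ("Town", False, 3),
]:
    print(f"{name:<8} state={has_state!s:<5} index={index} -> "
          f"{confidence_for(name, has_state, index):.2f}")
```

Running it prints 1.00, 0.90, 0.90 and 0.40 for the four rows. With a base of 1.0 instead of 0.9, the first two rows would both clamp to 1.00 and the state bonus would make no difference to the top result.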

Weather response processor

Same shape, different payload. The weather endpoint returns a nested object instead of an array of matches, but the pipeline follows the same layers: a structural check on the top-level sections, content checks that coerce each required field and range-check temperature, humidity, and pressure, then cross-field business rules and a WeatherData dataclass at the end. Save this at the project root as weather_processor.py:

Figure: Weather data flow. The API Client makes the HTTP request and passes the raw APIResult (a dict from requests.json()) to the Weather Processor (weather_processor.py), which hands a validated WeatherData to the Orchestrator. Raw data crosses only as far as the processor; only validated data crosses to the orchestrator.

weather_processor.py

"""
Weather response processor.

Integrates:
- Chapter 10: Safe navigation for nested weather data
- Chapter 12: Three-layer validation with realistic ranges
"""
from datetime import datetime
from typing import List, Dict, Any

# Import Chapter 10 utilities
from json_helpers import safe_get

# Import Chapter 12 utilities
from validators import ValidationError, validate_structure, validate_range
from validators import safe_float, safe_int

# Import data models
from models import Location, CurrentWeather, WeatherData

class WeatherProcessor:
    """
    Process weather API responses into validated WeatherData objects.
    
    Applies Chapter 12's three-layer validation:
    - Layer 1: Response structure
    - Layer 2: Field types and realistic ranges
    - Layer 3: Cross-field business rules
    """
    
    def __init__(self):
        """Initialize processor with validation rules."""
        # Required top-level sections (Chapter 12, Layer 1)
        self.required_sections = ['main', 'weather', 'dt', 'timezone']
        
        # Required fields within 'main' section (Chapter 12, Layer 2)
        self.main_required = ['temp', 'feels_like', 'humidity', 'pressure']
        
        # Realistic ranges for validation (Chapter 12, Layer 2)
        self.temp_range = (-100, 60) # Celsius
        self.humidity_range = (0, 100) # Percentage
        self.pressure_range = (800, 1200) # hPa
    
    def process_response(self, api_response: Dict[Any, Any], 
                        location: Location) -> WeatherData:
        """
        Process weather API response into validated WeatherData object.
        
        Integrates Chapter 10 + Chapter 12:
        1. Validate structure (Chapter 12, Layer 1)
        2. Extract with safe navigation (Chapter 10)
        3. Validate content (Chapter 12, Layer 2)
        4. Apply business rules (Chapter 12, Layer 3)
        
        Args:
            api_response: Raw weather API response
            location: Validated location this weather corresponds to
            
        Returns:
            Validated WeatherData object
            
        Raises:
            ValidationError: If response fails validation
        """
        warnings = []
        
        # LAYER 1: STRUCTURE VALIDATION (Chapter 12)
        valid, error = validate_structure(api_response, self.required_sections)
        if not valid:
            raise ValidationError(f"Invalid weather response structure: {error}")
        
        # Extract main weather data section (direct indexing is safe: Layer 1 confirmed the key exists)
        main_data = api_response['main']
        if not isinstance(main_data, dict):
            raise ValidationError("'main' section must be a dictionary")

        weather_list = api_response['weather']
        if not isinstance(weather_list, list):
            raise ValidationError("'weather' section must be a list")
        
        # Check required fields in main section
        missing_fields = [field for field in self.main_required 
                         if field not in main_data]
        if missing_fields:
            raise ValidationError(f"Missing main weather fields: {missing_fields}")
        
        # LAYER 2: CONTENT VALIDATION (Chapter 12)
        # Extract and validate temperature
        try:
            temperature = safe_float(main_data['temp'], 'temperature')
        except ValidationError as e:
            raise ValidationError(f"Invalid temperature: {e}")
        
        valid, error = validate_range(temperature, 'temperature',
                                     min_val=self.temp_range[0],
                                     max_val=self.temp_range[1])
        if not valid:
            raise ValidationError(error)
        
        # Extract and validate feels-like temperature
        try:
            feels_like = safe_float(main_data['feels_like'], 'feels_like')
        except ValidationError as e:
            raise ValidationError(f"Invalid feels_like: {e}")
        
        # Validate humidity
        try:
            humidity = safe_int(main_data['humidity'], 'humidity')
        except ValidationError as e:
            raise ValidationError(f"Invalid humidity: {e}")
        
        valid, error = validate_range(humidity, 'humidity',
                                     min_val=self.humidity_range[0],
                                     max_val=self.humidity_range[1])
        if not valid:
            raise ValidationError(error)
        
        # Validate pressure
        try:
            pressure = safe_float(main_data['pressure'], 'pressure')
        except ValidationError as e:
            raise ValidationError(f"Invalid pressure: {e}")
        
        valid, error = validate_range(pressure, 'pressure',
                                     min_val=self.pressure_range[0],
                                     max_val=self.pressure_range[1])
        if not valid:
            raise ValidationError(error)
        
        # Extract weather description (Chapter 10: safe_get with defaults)
        if not weather_list:
            raise ValidationError("No weather conditions data")
        
        primary_weather = weather_list[0]
        description = str(safe_get(primary_weather, 'description', 'Unknown')).title()
        icon = str(safe_get(primary_weather, 'icon', ''))
        
        # Extract optional fields (Chapter 10: defensive extraction)
        visibility = safe_get(api_response, 'visibility')
        if visibility is not None:
            try:
                visibility = float(visibility) / 1000 # Convert meters to km
            except (ValueError, TypeError):
                visibility = None
                warnings.append("Could not parse visibility data")
        
        wind_speed = safe_get(api_response, 'wind.speed')
        if wind_speed is not None:
            try:
                wind_speed = float(wind_speed)
            except (ValueError, TypeError):
                wind_speed = None
                warnings.append("Could not parse wind speed")
        
        wind_direction = safe_get(api_response, 'wind.deg')
        if wind_direction is not None:
            try:
                wind_direction = int(wind_direction)
            except (ValueError, TypeError):
                wind_direction = None
                warnings.append("Could not parse wind direction")
        
        # Extract timestamp and timezone offset (Chapter 10: safe extraction)
        try:
            timestamp = datetime.fromtimestamp(safe_int(api_response['dt'], 'timestamp'))
        except (ValueError, TypeError, OverflowError, OSError) as e:
            raise ValidationError(f"Invalid timestamp: {e}")
        
        try:
            timezone_offset_seconds = safe_int(api_response['timezone'], 'timezone')
        except ValidationError as e:
            raise ValidationError(f"Invalid timezone offset: {e}")
        
        # LAYER 3: BUSINESS RULES (Chapter 12)
        # Validate feels-like vs actual temperature relationship
        temp_diff = abs(feels_like - temperature)
        if temp_diff > 20:
            warnings.append(
                f"Large difference between temperature ({temperature}°C) "
                f"and feels-like ({feels_like}°C)"
            )
        
        # Create validated CurrentWeather object
        current_weather = CurrentWeather(
            temperature=temperature,
            feels_like=feels_like,
            humidity=humidity,
            pressure=pressure,
            description=description,
            icon=icon,
            visibility=visibility,
            wind_speed=wind_speed,
            wind_direction=wind_direction
        )
        
        # Calculate data quality score
        quality_score = self._calculate_quality_score(api_response, warnings)
        
        # Create complete WeatherData object
        return WeatherData(
            location=location,
            current=current_weather,
            timestamp=timestamp,
            timezone_offset_seconds=timezone_offset_seconds,
            data_quality_score=quality_score,
            validation_warnings=warnings,
            raw_data=api_response
        )
    
    def _calculate_quality_score(self, api_response: Dict[Any, Any],
                                warnings: List[str]) -> float:
        """
        Calculate data quality score.
        
        Business logic (Chapter 12, Layer 3):
        - Reduce score for missing optional fields
        - Reduce score for validation warnings
        - Increase score for recent data
        """
        score = 1.0
        
        # Reduce score for missing optional fields
        if safe_get(api_response, 'visibility') is None:
            score -= 0.1
        
        if safe_get(api_response, 'wind.speed') is None:
            score -= 0.05
        
        # Reduce score for validation warnings
        score -= len(warnings) * 0.1
        
        # Increase score for recent data (less than 5 minutes old)
        try:
            data_age = datetime.now().timestamp() - api_response['dt']
            if data_age < 300: # Less than 5 minutes
                score += 0.1
        except (KeyError, TypeError):
            pass
        
        # Ensure score stays in valid range
        return max(0.0, min(1.0, score))
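The Layer 2 checks in the processor all follow one pattern: coerce the raw value, then (for temperature, humidity, and pressure) compare it against a realistic range. As a hypothetical alternative layout rather than the code in weather_processor.py, the same rules can be written as a table and applied in one loop. ValidationError here is a local stand-in for the one in validators.py, and MAIN_RULES and check_main_fields are illustrative names.

```python
from typing import Any, Callable, Dict, Tuple


class ValidationError(Exception):
    """Local stand-in for validators.ValidationError."""


# (converter, min, max) per field in the 'main' section; ranges copied from
# WeatherProcessor.__init__. feels_like is type-checked only, as in the original.
MAIN_RULES: Dict[str, Tuple[Callable[[Any], Any], float, float]] = {
    'temp': (float, -100, 60),
    'feels_like': (float, float('-inf'), float('inf')),
    'humidity': (int, 0, 100),
    'pressure': (float, 800, 1200),
}


def check_main_fields(main: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce and range-check each field; raise on the first failure."""
    if not isinstance(main, dict):
        raise ValidationError("'main' section must be a dictionary")
    cleaned: Dict[str, Any] = {}
    for key, (convert, low, high) in MAIN_RULES.items():
        if key not in main:
            raise ValidationError(f"Missing main weather field: {key}")
        try:
            value = convert(main[key])
        except (TypeError, ValueError) as exc:
            raise ValidationError(f"Invalid {key}: {exc}") from exc
        # NaN fails this comparison too, so it is rejected as out of range
        if not low <= value <= high:
            raise ValidationError(f"{key} {value} outside range [{low}, {high}]")
        cleaned[key] = value
    return cleaned
```

The table makes adding a field a one-line change; the trade-off is that error messages become generic, whereas the hand-written checks can phrase each failure individually.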

The cross-field business rule is the new piece here. The geocoding processor's Layer 3 was a confidence score; the weather processor's Layer 3 flags cross-field oddities (currently a feels-like value more than 20°C away from the actual temperature) and rolls the result into a data_quality_score, alongside penalties for missing optional fields and a small bonus for data less than five minutes old. Hard failures (missing sections, out-of-range temperature) still raise ValidationError; soft issues (an unparseable optional field, a large feels-like gap) land in validation_warnings and lower the score without rejecting the response. The orchestrator reads both: a raised ValidationError becomes PARTIAL_SUCCESS, while a lowered score surfaces as a warning on an otherwise successful result.
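The quality score is easier to reason about with concrete numbers. The standalone function below mirrors the arithmetic in _calculate_quality_score, but takes the facts it depends on as plain arguments instead of reading the raw response; quality_score and its parameter names are illustrative, not part of weather_processor.py.

```python
def quality_score(has_visibility: bool, has_wind_speed: bool,
                  warning_count: int, data_age_seconds: float) -> float:
    """Mirror of WeatherProcessor._calculate_quality_score's arithmetic."""
    score = 1.0
    if not has_visibility:
        score -= 0.1              # missing visibility
    if not has_wind_speed:
        score -= 0.05             # missing wind speed
    score -= warning_count * 0.1  # each non-fatal warning costs 0.1
    if data_age_seconds < 300:
        score += 0.1              # observation less than 5 minutes old
    return max(0.0, min(1.0, score))  # clamp to [0.0, 1.0]
```

With the Test 3 payload from the verification script (no visibility, wind speed present, no warnings, fresh data) this works out to 1.0 - 0.1 + 0.1 = 1.0. The freshness bonus is invisible on an otherwise complete response, because the clamp caps the score at 1.0; it only offsets deductions.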

Verifying the processing layer

Four test cases cover the typical shapes the processors will see: valid geocoding, invalid coordinates, valid weather, and unrealistic temperature. Save this at the project root as test_processing_layer.py and run it with python test_processing_layer.py:

test_processing_layer.py

"""Test the processing layer with sample API responses."""
from datetime import datetime

from geocoding_processor import GeocodingProcessor
from weather_processor import WeatherProcessor
from validators import ValidationError

# Test 1: Valid geocoding response
print("=== Testing Processing Layer ===\n")
print("Test 1: Valid geocoding response")

geocoding_response = [
    {
        "name": "London",
        "country": "GB",
        "state": "England",
        "lat": 51.5074,
        "lon": -0.1278
    }
]

processor = GeocodingProcessor()
locations = processor.process_response(geocoding_response, "London")

print(f"Locations found: {len(locations)}")
print(f"First location: {locations[0].display_name()}")
print(f"Coordinates: ({locations[0].latitude}, {locations[0].longitude})")
print(f"Confidence: {locations[0].confidence_score:.2f}")
print()

# Test 2: Invalid coordinate range
print("Test 2: Invalid coordinate range")

invalid_response = [
    {
        "name": "Invalid",
        "country": "XX",
        "lat": 999, # Out of range
        "lon": -0.1278
    }
]

try:
    locations = processor.process_response(invalid_response, "Invalid")
    print("ERROR: Should have raised ValidationError")
except ValidationError as e:
    print(f"[OK] Correctly caught invalid coordinates: {e}")
print()

# Test 3: Valid weather response
print("Test 3: Valid weather response")

weather_response = {
    "main": {
        "temp": 15.5,
        "feels_like": 14.2,
        "humidity": 72,
        "pressure": 1013.2
    },
    "weather": [
        {
            "description": "light rain",
            "icon": "10d"
        }
    ],
    "dt": int(datetime.now().timestamp()),
    "timezone": 3600,
    "wind": {
        "speed": 5.2,
        "deg": 230
    }
}

weather_processor = WeatherProcessor()
location = locations[0] # Use London from Test 1

weather_data = weather_processor.process_response(weather_response, location)

print(f"Temperature: {weather_data.current.temperature}°C")
print(f"Feels like: {weather_data.current.feels_like}°C")
print(f"Conditions: {weather_data.current.description}")
print(f"Quality score: {weather_data.data_quality_score:.2f}")
print(f"Warnings: {len(weather_data.validation_warnings)}")
print()

# Test 4: Temperature out of range
print("Test 4: Temperature out of realistic range")

invalid_weather = {
    "main": {
        "temp": 150, # Unrealistic
        "feels_like": 145,
        "humidity": 72,
        "pressure": 1013.2
    },
    "weather": [{"description": "error", "icon": ""}],
    "dt": int(datetime.now().timestamp()),
    "timezone": 0
}

try:
    weather_data = weather_processor.process_response(invalid_weather, location)
    print("ERROR: Should have raised ValidationError")
except ValidationError as e:
    print(f"[OK] Correctly caught unrealistic temperature: {e}")
print()

Terminal

=== Testing Processing Layer ===

Test 1: Valid geocoding response
Locations found: 1
First location: London, England, GB
Coordinates: (51.5074, -0.1278)
Confidence: 1.00

Test 2: Invalid coordinate range
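Warning: Skipping invalid location 0: latitude 999.0 above maximum 90
[OK] Correctly caught invalid coordinates: No valid locations found for 'Invalid'. All results failed validation.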

Test 3: Valid weather response
Temperature: 15.5°C
Feels like: 14.2°C
Conditions: Light Rain
Quality score: 1.00
Warnings: 0

Test 4: Temperature out of realistic range
[OK] Correctly caught unrealistic temperature: temperature 150.0 above maximum 60

All four cases land where they should. Valid responses normalise into typed dataclasses; invalid coordinates and out-of-range temperatures raise ValidationError before any downstream code touches them.

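If your output diverges, the most likely failure points are the validators. If Tests 2 or 4 print ERROR: Should have raised ValidationError, the range checks in validators.py aren't firing; go back to §3 and re-run the import check on validators.py to confirm validate_range and safe_float are loaded. If Tests 1 or 3 raise ValidationError on valid data, the processor's required-key lists (required_fields in GeocodingProcessor; required_sections and main_required in WeatherProcessor) don't match the test payload. Compare them against the payload's keys, not the dataclass field names: the processors check the API's names (lat, lon, temp), which differ from the dataclass fields (latitude, longitude, temperature).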
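To rule out validators.py while debugging, the hypothetical stand-in below implements the four functions the processors import, with signatures inferred from the calls in this section. The validate_range message format is chosen to match the terminal output ("temperature 150.0 above maximum 60"); everything else, including rejecting booleans and NaN and refusing fractional values in safe_int, is an assumption rather than Chapter 12's code.

```python
"""Hypothetical stand-in for validators.py (Chapter 12); signatures inferred."""
from typing import Any, Iterable, Optional, Tuple


class ValidationError(Exception):
    """Raised when data fails validation."""


def validate_structure(data: Any, required: Iterable[str]) -> Tuple[bool, Optional[str]]:
    """Layer 1: data must be a dict containing every required key."""
    if not isinstance(data, dict):
        return False, f"expected dict, got {type(data).__name__}"
    missing = [key for key in required if key not in data]
    if missing:
        return False, f"missing required keys: {missing}"
    return True, None


def validate_range(value: float, name: str, min_val: Optional[float] = None,
                   max_val: Optional[float] = None) -> Tuple[bool, Optional[str]]:
    """Layer 2: check value against optional inclusive bounds."""
    if min_val is not None and value < min_val:
        return False, f"{name} {value} below minimum {min_val}"
    if max_val is not None and value > max_val:
        return False, f"{name} {value} above maximum {max_val}"
    return True, None


def safe_float(value: Any, name: str) -> float:
    """Coerce to float or raise ValidationError (booleans and NaN rejected)."""
    if isinstance(value, bool):
        raise ValidationError(f"{name} must be numeric, got bool")
    try:
        result = float(value)
    except (TypeError, ValueError) as exc:
        raise ValidationError(f"{name} is not a number: {value!r}") from exc
    if result != result:  # NaN is the only float not equal to itself
        raise ValidationError(f"{name} is NaN")
    return result


def safe_int(value: Any, name: str) -> int:
    """Coerce to int via safe_float; reject values with a fractional part."""
    number = safe_float(value, name)
    if not number.is_integer():  # also rejects inf
        raise ValidationError(f"{name} must be a whole number, got {number}")
    return int(number)
```

Temporarily swapping this in for your own validators.py and re-running the tests tells you whether a divergence comes from the validators or from the processors.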

Section 6 builds the business logic layer on top: the orchestrator calls the API client and the processors in sequence and trusts whatever comes back. Every WeatherData on this side of the boundary has already been validated; the orchestrator's job is sequencing and outcome judgement, not error checking.