7. Practical exercises

Four exercises, each one reinforcing a specific technique from earlier in the chapter. They build on each other: Exercise 2 reuses the validator from Exercise 1, Exercise 3 applies the pattern to the News Aggregator, Exercise 4 pushes into cross-field rules that are clearer in Python than in a schema. Work through them in order. Solutions are hidden behind "Show Solution" so you can attempt each one before reading the code.

Exercise 1: build a user-profile validator

Build a manual three-layer validator for user profile data. This reinforces the structure -> content -> business-rules pattern from section 3 on a fresh domain.

user_profile.json

{
  "user": {
    "id": 12345,
    "email": "user@example.com",
    "age": 28,
    "username": "john_doe",
    "preferences": {
      "newsletter": true,
      "theme": "dark"
    }
  }
}

Requirements:

Structure: the root must contain a "user" object with id, email, age, and username
Content: email must be a string containing "@", age must be an integer between 13 and 120 inclusive, username must be 3 to 20 characters
Business rules: if newsletter is true, the email cannot come from a disposable domain (tempmail.com, guerrillamail.com)

Task: Build validate_user_profile(data) following the three-layer pattern.

Show Solution

user_validator.py

def validate_user_profile_structure(data):
    """Layer 1: Structural validation."""
    if not isinstance(data, dict):
        return False, "Data must be a dictionary"
    
    if "user" not in data:
        return False, "Missing 'user' object"
    
    user = data["user"]
    if not isinstance(user, dict):
        return False, "'user' must be a dictionary"
    
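    required = ["id", "email", "age", "username"]
    for field in required:
        if field not in user:
            return False, f"Missing required field: {field}"
    
    # preferences is optional, but layer 3 calls .get() on it
    if "preferences" in user and not isinstance(user["preferences"], dict):
        return False, "'preferences' must be a dictionary"
    
    return True, None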

def validate_user_profile_content(user):
    """Layer 2: Content validation."""
    # Email format
    email = user["email"]
    if not isinstance(email, str):
        return False, "Email must be string"
    if "@" not in email:
        return False, "Email must contain '@'"
    
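    # Age range: must be a real integer (no numeric strings, floats, or bools).
    # bool is a subclass of int, so reject it explicitly.
    age = user["age"]
    if isinstance(age, bool) or not isinstance(age, int):
        return False, f"Age must be integer, got: {age!r}"
    if age < 13 or age > 120:
        return False, f"Age {age} outside valid range (13-120)"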
    
    # Username length
    username = user["username"]
    if not isinstance(username, str):
        return False, "Username must be string"
    if len(username) < 3 or len(username) > 20:
        return False, f"Username length {len(username)} outside valid range (3-20)"
    
    return True, None

def validate_user_profile_business_rules(user):
    """Layer 3: Business rules."""
    disposable_domains = ["tempmail.com", "guerrillamail.com"]
    
    preferences = user.get("preferences", {})
    if preferences.get("newsletter"):
        email = user["email"]
        domain = email.split("@")[-1]
        if domain in disposable_domains:
            return False, f"Newsletter requires non-disposable email (got {domain})"
    
    return True, None

def validate_user_profile(data):
    """Complete validation pipeline."""
    valid, error = validate_user_profile_structure(data)
    if not valid:
        return False, f"Structure validation failed: {error}"

    user = data["user"]

    valid, error = validate_user_profile_content(user)
    if not valid:
        return False, f"Content validation failed: {error}"

    valid, error = validate_user_profile_business_rules(user)
    if not valid:
        return False, f"Business rule validation failed: {error}"

    return True, None

if __name__ == "__main__":
    test_data = {
        "user": {
            "id": 12345,
            "email": "user@example.com",
            "age": 28,
            "username": "john_doe",
            "preferences": {"newsletter": True, "theme": "dark"}
        }
    }

    valid, error = validate_user_profile(test_data)
    print(f"Valid: {valid}, Error: {error}")
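The business-rule layer above compares the domain with an exact string lookup, so an address like User@TempMail.com or x@mail.tempmail.com slips past it. A minimal sketch of a stricter check follows. It assumes subdomains of a listed domain should also count as disposable, which goes beyond the stated requirement, and is_disposable_email is a hypothetical helper name, not part of the exercise:

```python
# Hypothetical helper: stricter disposable-domain check (sketch).
DISPOSABLE_DOMAINS = frozenset({"tempmail.com", "guerrillamail.com"})


def is_disposable_email(email, disposable=DISPOSABLE_DOMAINS):
    """Return True if the email's domain, or any parent domain, is listed."""
    # Domain names are case-insensitive, so normalise before comparing.
    domain = email.rsplit("@", 1)[-1].strip().lower()
    labels = domain.split(".")
    # Try the full domain, then each parent: mail.tempmail.com, tempmail.com, com.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in disposable:
            return True
    return False


print(is_disposable_email("user@example.com"))          # False
print(is_disposable_email("User@TempMail.com"))         # True
print(is_disposable_email("x@mail.guerrillamail.com"))  # True
print(is_disposable_email("x@nottempmail.com"))         # False
```

Matching whole dot-separated labels, rather than testing endswith("tempmail.com"), keeps a lookalike such as nottempmail.com from being flagged.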

Exercise 2: convert to JSON Schema

Take the validator from Exercise 1 and convert the structural and content layers to JSON Schema. Keep the business-rule function as hand-written Python. This is the hybrid pattern from section 5 applied to a second domain -- the division of labour is the same, only the schema content changes.

Task: write a schema that covers the structure and content rules, then combine it with validate_user_profile_business_rules from Exercise 1.

The solution imports the business-rule validator from Exercise 1's user_validator.py; keep both files in the same directory.

Show Solution

user_hybrid.py

from jsonschema import validate, ValidationError
from user_validator import validate_user_profile_business_rules

user_schema = {
    "type": "object",
    "required": ["user"],
    "properties": {
        "user": {
            "type": "object",
            "required": ["id", "email", "age", "username"],
            "properties": {
                "id": {"type": "integer"},
                "email": {
                    "type": "string",
                    "pattern": "^.*@.*$"
                },
                "age": {
                    "type": "integer",
                    "minimum": 13,
                    "maximum": 120
                },
                "username": {
                    "type": "string",
                    "minLength": 3,
                    "maxLength": 20
                },
                "preferences": {
                    "type": "object",
                    "properties": {
                        "newsletter": {"type": "boolean"},
                        "theme": {
                            "type": "string",
                            "enum": ["light", "dark"]
                        }
                    }
                }
            }
        }
    }
}

def validate_user_hybrid(data):
    """Hybrid validation: schema + manual business rules."""
    # Schema handles structure and content
    try:
        validate(instance=data, schema=user_schema)
    except ValidationError as e:
        return False, f"Schema validation failed: {e.message}"
    
    # Manual business rules
    user = data["user"]
    valid, error = validate_user_profile_business_rules(user)
    if not valid:
        return False, f"Business rule validation failed: {error}"

    return True, None

if __name__ == "__main__":
    test_data = {
        "user": {
            "id": 12345,
            "email": "user@example.com",
            "age": 28,
            "username": "john_doe",
            "preferences": {"newsletter": True, "theme": "dark"}
        }
    }

    valid, error = validate_user_hybrid(test_data)
    print(f"Valid: {valid}, Error: {error}")

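The "^.*@.*$" pattern accepts any single-line string containing an @, which is what the requirement asks for and no more. Production code needs stricter checks (length caps, dot-atom local parts, a valid domain shape); use a library such as email-validator for that rather than a hand-rolled regex. The schema is also slightly stricter than Exercise 1 in two places: it requires id to be an integer and restricts theme to "light" or "dark", rules the manual validator never checked.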
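Two details of JSON Schema's pattern keyword matter here: patterns are not implicitly anchored (the Python jsonschema package applies them as a regex search, so a match anywhere in the string counts), and "." does not match a newline. The sketch below approximates that behaviour with the standard library's re.search rather than calling jsonschema itself. It shows that a bare "@" pattern already means "contains an @ anywhere", while "^.*@.*$" additionally rejects a string whose only @ follows a line break:

```python
import re

ANCHORED = r"^.*@.*$"  # the pattern used in user_schema
BARE = r"@"            # unanchored: "contains an @ somewhere"


def pattern_matches(pattern, value):
    # JSON Schema "pattern" succeeds if the regex matches anywhere in the
    # string, which corresponds to re.search, not re.fullmatch.
    return re.search(pattern, value) is not None


samples = ["user@example.com", "no-at-sign", "first line\n@second line"]
for value in samples:
    print(repr(value), pattern_matches(ANCHORED, value), pattern_matches(BARE, value))
```

Without re.MULTILINE, "^" only matches at position 0, and ".*" cannot cross the newline to reach the @, which is why the third sample fails the anchored pattern.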

Exercise 3: fix a News Aggregator bug

The News Aggregator from Chapter 11 has a validation gap: empty article titles pass through the NewsAPI normalizer. The version below is the same bug stripped of the modular plumbing, so the validation logic is the only thing on screen.

buggy_normalizer.py

def normalize_newsapi(response):
    """Transform NewsAPI response - has a validation bug."""
    articles = []
    
    for item in response.get("articles", []):
        title = item.get("title", "").strip()
        url = item.get("url", "").strip()
        
        # BUG: This only checks URL!
        if not url:
            continue
        
        articles.append({
            "title": title,
            "url": url,
            "published_at": item.get("publishedAt", ""),
            "source": "NewsAPI"
        })
    
    return articles

Task: write a test that catches the bug, then fix the normalizer to validate both title and URL.

Show Solution

test_and_fix.py

# Test that catches the bug
def test_empty_title_rejected():
    """Articles with empty titles should be rejected."""
    response = {
        "articles": [
            {
                "title": "",  # Empty title
                "url": "https://example.com/article",
                "publishedAt": "2025-01-15T10:00:00Z"
            }
        ]
    }
    
    articles = normalize_newsapi(response)
    assert len(articles) == 0, "Empty title should be rejected"

# Fixed normalizer
def normalize_newsapi(response):
    """Transform NewsAPI response - fixed validation."""
    articles = []
    
    for item in response.get("articles", []):
        title = item.get("title", "").strip()
        url = item.get("url", "").strip()
        
        # FIX: Check BOTH title and url
        if not title or not url:
            continue
        
        articles.append({
            "title": title,
            "url": url,
            "published_at": item.get("publishedAt", ""),
            "source": "NewsAPI"
        })
    
    return articles

# Test now passes.
test_empty_title_rejected()
print("Test passed: empty titles are rejected")

In the actual Chapter 11 codebase, the fix goes into the production normalize_newsapi, which returns Article dataclass instances and imports safe_get / try_fields from api_helpers.py. The same one-line guard (if not title or not url) belongs in the same place, inside the per-item loop. The solution above shows only the post-fix file; in practice, run the test against the buggy version first, watch it fail, then add the guard and run it again.
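A related pitfall sits in how both versions read the fields: item.get("title", "") falls back to "" only when the key is missing. If a payload carries "title": null, the value arrives as None and .strip() raises AttributeError before the guard ever runs. A minimal sketch that treats null and non-string values as empty, assuming that is the behaviour you want; clean_text and normalize_items are hypothetical names, and the output dicts are trimmed to two fields:

```python
def clean_text(value):
    """Return a stripped string, treating None and non-strings as empty."""
    # dict.get("title", "") still returns None when the key exists with a
    # null value, so normalise here instead of calling .strip() directly.
    return value.strip() if isinstance(value, str) else ""


def normalize_items(response):
    articles = []
    # "or []" also covers an explicit "articles": null
    for item in response.get("articles") or []:
        title = clean_text(item.get("title"))
        url = clean_text(item.get("url"))
        if not title or not url:
            continue  # reject missing, null, empty, or whitespace-only fields
        articles.append({"title": title, "url": url})
    return articles


response = {
    "articles": [
        {"title": None, "url": "https://example.com/a"},
        {"title": "   ", "url": "https://example.com/b"},
        {"url": "https://example.com/c"},
        {"title": "Kept", "url": "https://example.com/d"},
    ]
}
print(normalize_items(response))
# [{'title': 'Kept', 'url': 'https://example.com/d'}]
```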

Exercise 4: validate multi-day forecast data

Weather forecast data carries parallel arrays (one entry per day for dates, max temperatures, and min temperatures). That shape has several cross-field rules that standard JSON Schema cannot express, because its keywords cannot compare the value of one field against another: arrays must have equal length, dates must be consecutive, max must be at least min for every day, and the day-over-day temperature swing must be plausible. This exercise focuses on layer 3 (cross-field rules), with the minimal structural checks needed to make the validator runnable on its own.

forecast_response.json

{
  "daily": {
    "time": ["2025-01-15", "2025-01-16", "2025-01-17"],
    "temperature_2m_max": [12.5, 14.2, 13.8],
    "temperature_2m_min": [6.1, 7.3, 8.0]
  }
}

Business rules:

All three arrays must have the same length, and that length must be positive
Dates must be sequential with no gaps (day N+1 is exactly one day after day N)
temperature_2m_max >= temperature_2m_min for every day
The day-over-day swing in mean temperature (the midpoint of each day's max and min) cannot exceed 30°C
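To make rule 4 concrete: taking each day's mean as the midpoint of its max and min, the sample response above has swings well under the limit. A small sketch of that arithmetic; daily_means and day_over_day_swings are illustrative names, not part of the exercise:

```python
def daily_means(temp_max, temp_min):
    """Midpoint of each day's max and min temperature."""
    return [(tmax + tmin) / 2 for tmax, tmin in zip(temp_max, temp_min)]


def day_over_day_swings(means):
    """Absolute change in mean temperature between consecutive days."""
    return [abs(curr - prev) for prev, curr in zip(means, means[1:])]


means = daily_means([12.5, 14.2, 13.8], [6.1, 7.3, 8.0])
swings = day_over_day_swings(means)
print([round(m, 2) for m in means])   # [9.3, 10.75, 10.9]
print([round(s, 2) for s in swings])  # [1.45, 0.15]
print(all(s <= 30 for s in swings))   # True
```

The values are rounded for display only; raw float sums like 12.5 + 6.1 carry tiny representation errors that do not matter against a 30°C threshold.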

Task: implement validate_forecast_data(data) that enforces all four rules.

Show Solution

forecast_validator.py

from datetime import datetime

def validate_forecast_data(data):
    """Validate forecast data with cross-field business rules."""
    
    # Structure
    if not isinstance(data, dict):
        return False, "Forecast data must be a dictionary"

    if "daily" not in data:
        return False, "Missing 'daily' section"
    
    daily = data["daily"]
    if not isinstance(daily, dict):
        return False, "'daily' section must be a dictionary"

    required = ["time", "temperature_2m_max", "temperature_2m_min"]
    for field in required:
        if field not in daily:
            return False, f"Missing required field: {field}"
    
    times = daily["time"]
    temp_max = daily["temperature_2m_max"]
    temp_min = daily["temperature_2m_min"]

    for field_name, values in [
        ("time", times),
        ("temperature_2m_max", temp_max),
        ("temperature_2m_min", temp_min),
    ]:
        if not isinstance(values, list):
            return False, f"{field_name} must be a list"
    
    # Rule 1: same length across all three arrays.
    if not (len(times) == len(temp_max) == len(temp_min)):
        return False, (
            f"Array length mismatch: times={len(times)}, "
            f"max={len(temp_max)}, min={len(temp_min)}"
        )
    
    # Rule 1 (continued): length must be positive. Empty arrays would
    # otherwise pass rules 2-4 vacuously.
    if len(times) == 0:
        return False, "Empty forecast arrays"
    
    # Rule 2: dates must be consecutive, no gaps.
    prev_date = None
    for date_str in times:
        try:
            date = datetime.strptime(date_str, "%Y-%m-%d")
            if prev_date is not None and (date - prev_date).days != 1:
                return False, f"Date gap: {prev_date.date()} to {date.date()}"
            prev_date = date
        except (ValueError, TypeError):
            return False, f"Invalid date format: {date_str}"
    
    # Rule 3: max temperature must be at least min for every day.
    for i, (tmax, tmin) in enumerate(zip(temp_max, temp_min)):
        if not isinstance(tmax, (int, float)) or not isinstance(tmin, (int, float)):
            return False, f"Day {i}: temperatures must be numeric"
        if tmax < tmin:
            return False, f"Day {i}: max {tmax}°C < min {tmin}°C"
    
    # Rule 4: day-over-day mean swing cannot exceed 30°C.
    for i in range(1, len(temp_max)):
        prev_avg = (temp_max[i-1] + temp_min[i-1]) / 2
        curr_avg = (temp_max[i] + temp_min[i]) / 2
        change = abs(curr_avg - prev_avg)
        
        if change > 30:
            return False, (
                f"Extreme temp change between day {i-1} and {i}: "
                f"{change:.1f}°C"
            )
    
    return True, None

# Demo
valid_forecast = {
    "daily": {
        "time": ["2025-01-15", "2025-01-16", "2025-01-17"],
        "temperature_2m_max": [12.5, 14.2, 13.8],
        "temperature_2m_min": [6.1, 7.3, 8.0]
    }
}

length_mismatch = {
    "daily": {
        "time": ["2025-01-15", "2025-01-16"],
        "temperature_2m_max": [12.5, 14.2, 13.8],
        "temperature_2m_min": [6.1, 7.3]
    }
}

date_gap = {
    "daily": {
        "time": ["2025-01-15", "2025-01-17"],
        "temperature_2m_max": [12.5, 13.8],
        "temperature_2m_min": [6.1, 8.0]
    }
}

extreme_swing = {
    "daily": {
        "time": ["2025-01-15", "2025-01-16"],
        "temperature_2m_max": [10.0, 45.0],
        "temperature_2m_min": [5.0, 40.0]
    }
}

print(validate_forecast_data(valid_forecast))
print(validate_forecast_data(length_mismatch))
print(validate_forecast_data(date_gap))
print(validate_forecast_data(extreme_swing))

Terminal

$ python forecast_validator.py
(True, None)
(False, 'Array length mismatch: times=2, max=3, min=2')
(False, 'Date gap: 2025-01-15 to 2025-01-17')
(False, 'Extreme temp change between day 0 and 1: 35.0°C')
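One gap remains in rule 3's type check: isinstance(value, (int, float)) accepts True and False (bool is a subclass of int) as well as float("nan"). NaN is the worse case, because every comparison with it is false, so neither tmax < tmin nor change > 30 fires and a NaN reading passes all four rules. A minimal sketch of a stricter per-value check, assuming you want to reject booleans, NaN, and infinities; is_valid_temperature is a hypothetical helper:

```python
import math


def is_valid_temperature(value):
    """True for finite int/float readings; rejects bool, NaN and infinities."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


nan = float("nan")
print(nan < 5.0, nan > 30)                 # False False: NaN comparisons never fire
print(is_valid_temperature(12.5))          # True
print(is_valid_temperature(True))          # False
print(is_valid_temperature(nan))           # False
print(is_valid_temperature(float("inf")))  # False
```

Calling it for each tmax and tmin in rule 3, in place of the isinstance test, closes the gap without touching the other rules.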

Four validators across three domains: user profiles, news articles, and weather forecasts. The patterns transfer because the concerns do: every external data source needs structure, content, and business-rule checks, and every codebase eventually faces the manual-versus-schema decision. Section 8 closes the chapter with a review of the framework you can carry to any of them.