3. Validating and converting types

.get() handled missing keys, and extract_user_safely already showed the shape of the next defence: isinstance() checks before treating a value as a dict. This page generalises that pattern. The other half of the JSON-extraction problem is keys that are present but the wrong type: a documented integer that arrives as a string, an object field that comes back as null, a string field that's empty when downstream code expects content. You'll work through it in three passes: a single-field demo to see the shape, two string-handling helpers, then a complete extraction function combining everything.

Four common type-mismatch crashes

Before writing the defensive version, take a quick look at four scenarios that commonly trip up production code. Save this at the project root as type_mismatch_demo.py:

type_mismatch_demo.py
# Scenario 1: String instead of integer
user_data = {"age": "34"}  # String, not int
try:
    next_year_age = user_data["age"] + 1  # TypeError!
    print(f"Next year: {next_year_age}")
except TypeError as e:
    print(f"ERROR: Math error: {e}")

# Scenario 2: Null where you expect string
user_data = {"name": None}
try:
    name_upper = user_data["name"].upper()  # AttributeError!
    print(name_upper)
except AttributeError as e:
    print(f"ERROR: Method error: {e}")

# Scenario 3: List where you expect dict
user_data = {"location": ["Dublin", "Ireland"]}  # List, not dict
try:
    city = user_data["location"]["city"]  # TypeError!
    print(city)
except TypeError as e:
    print(f"ERROR: Access error: {e}")

# Scenario 4: Empty string in calculations
user_data = {"price": ""}  # Empty string
try:
    total = float(user_data["price"]) * 1.1  # ValueError!
    print(f"Total: {total}")
except ValueError as e:
    print(f"ERROR: Conversion error: {e}")

Run it from the project root:

Terminal
$ python type_mismatch_demo.py
ERROR: Math error: can only concatenate str (not "int") to str
ERROR: Method error: 'NoneType' object has no attribute 'upper'
ERROR: Access error: list indices must be integers or slices, not str
ERROR: Conversion error: could not convert string to float: ''

None of these look like JSON problems at first glance. They look like Python errors. But each one traces back to the API returning a value that didn't match the documented type. This is why .get() alone isn't enough. You get the value, but the value still has to pass a type check before you can use it safely.
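To see that gap in isolation, here is a minimal sketch. The payload dict is made up for illustration: .get() hands back whatever the API sent, and only the isinstance() check reveals whether the value is usable.

```python
# Hypothetical payload: "age" is documented as an integer but arrives as a string.
payload = {"age": "34"}

age = payload.get("age")           # .get() succeeds because the key exists
print(type(age).__name__)          # str, not the documented int

# The type check, not .get(), decides whether arithmetic is safe.
if isinstance(age, int):
    next_year = age + 1
else:
    next_year = None               # fall back instead of crashing

print(next_year)                   # None
```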

Validating with isinstance()

The defensive pattern is simple to say: before any type-specific operation (arithmetic, string methods, indexing), check that the value is the expected type with isinstance(). If it's not, either coerce it safely or fall back to a default. Here's that pattern applied to age, a field that can plausibly arrive as an int, a string, a float, or None depending on the endpoint.

Save this as safe_age.py:

safe_age.py
def safe_get_age(user_data):
    """
    Extract age with type validation and conversion.
    Returns integer age or None if invalid.
    """
    age_value = user_data.get("age")
    
    # Handle None explicitly
    if age_value is None:
        return None
    
    # Already correct type (bool is a subclass of int, so exclude True/False)
    if isinstance(age_value, int) and not isinstance(age_value, bool):

        # Validate reasonable range
        if 0 <= age_value <= 150:
            return age_value
        else:
            return None  # Invalid age value
    
    # Try converting string to int
    if isinstance(age_value, str):

        # Remove whitespace
        age_value = age_value.strip()
        
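        # Check it's a whole number. isdecimal() only accepts characters that
        # int() can parse; isdigit() also accepts ones like "²" that int() rejects
        if age_value.isdecimal():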
            age = int(age_value)
            if 0 <= age <= 150:
                return age
    
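    # Handle float (truncate toward zero). Checking the range on the float
    # first also rejects NaN and infinity, which int() cannot convert
    if isinstance(age_value, float) and 0 <= age_value <= 150:
        return int(age_value)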
    
    # Couldn't convert
    return None

# Test with various inputs
test_cases = [
    {"age": 25},          # Valid int
    {"age": "30"},        # String number
    {"age": "  45  "},    # String with whitespace
    {"age": 32.7},        # Float
    {"age": None},        # Null
    {"age": "unknown"},   # Invalid string
    {"age": -5},          # Negative
    {"age": 200},         # Too large
    {},                   # Missing key
]

print("=== Type Validation Tests ===\n")
for i, test_data in enumerate(test_cases, 1):
    age = safe_get_age(test_data)
    input_val = test_data.get("age", "missing")
    print(f"Test {i}: input={input_val!r:15} -> age={age}")

Run it from the project root:

Terminal
$ python safe_age.py
=== Type Validation Tests ===

Test 1: input=25              -> age=25
Test 2: input='30'            -> age=30
Test 3: input='  45  '        -> age=45
Test 4: input=32.7            -> age=32
Test 5: input=None            -> age=None
Test 6: input='unknown'       -> age=None
Test 7: input=-5              -> age=None
Test 8: input=200             -> age=None
Test 9: input='missing'       -> age=None

Nine wildly different inputs; one consistent output shape. That's the goal. Working through safe_get_age has exposed the pattern that runs through every defensive extractor in this chapter, even when the prose doesn't spell it out. There are three layers, in this order: existence, type, content. Handle None first (existence). Then check each expected type before doing anything type-specific (type). Then validate the value itself (content): a negative age or an age over 150 is a bad value even though the type is correct. The next two sections apply the same three-layer move to string fields and to a complete extraction function.
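The three layers can also be written once as a generic helper. This is a sketch, not part of the chapter's code: get_validated and in_age_range are hypothetical names, and unlike safe_get_age the helper only validates, it does not coerce strings or floats.

```python
def get_validated(data, key, expected_type, is_valid=None, default=None):
    """Hypothetical helper: apply the existence, type, and content checks in order."""
    value = data.get(key)

    # Layer 1: existence (a missing key and an explicit null both land here)
    if value is None:
        return default

    # Layer 2: type (bool is a subclass of int, so reject it unless asked for)
    if isinstance(value, bool) and expected_type is not bool:
        return default
    if not isinstance(value, expected_type):
        return default

    # Layer 3: content (caller-supplied rule, such as a range check)
    if is_valid is not None and not is_valid(value):
        return default

    return value


def in_age_range(n):
    """Hypothetical content rule matching the 0-150 range used above."""
    return 0 <= n <= 150


print(get_validated({"age": 25}, "age", int, in_age_range))     # 25
print(get_validated({"age": 200}, "age", int, in_age_range))    # None (content)
print(get_validated({"age": "25"}, "age", int, in_age_range))   # None (type)
print(get_validated({}, "age", int, in_age_range))              # None (existence)
```

Keeping the layers in one function makes the order hard to get wrong, at the cost of hiding which layer rejected a value.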

Safe string operations

Strings come with their own sharp edges. .upper(), .split(), and .strip() all crash when called on None or a non-string value, so they need the same guard: check the type, strip whitespace, check for emptiness, then normalise. Email and name fields are good worked examples because they're both prone to nulls, whitespace, and wrong-shape inputs (a name that comes back as a string instead of a {"first", "last"} object).

Save this as safe_strings.py:

safe_strings.py
def safe_get_email(user_data):
    """Extract and normalize email address safely."""
    email = user_data.get("email")
    
    # Validate it's a string
    if not isinstance(email, str):
        return "No email provided"
    
    # Clean whitespace
    email = email.strip()
    
    # Check not empty
    if not email:
        return "No email provided"
    
    # Normalize to lowercase
    email = email.lower()
    
    # Basic validation (real apps use regex)
    if "@" not in email or "." not in email:
        return "Invalid email format"
    
    return email

def safe_get_name(user_data):
    """Extract and format name safely."""
    name_obj = user_data.get("name", {})
    
    # Validate it's a dictionary
    if not isinstance(name_obj, dict):
        return "Unknown"
    
    first = name_obj.get("first", "")
    last = name_obj.get("last", "")
    
    # Validate both are strings
    if not isinstance(first, str):
        first = ""
    if not isinstance(last, str):
        last = ""
    
    # Clean whitespace
    first = first.strip()
    last = last.strip()
    
    # Build full name
    if first and last:
        return f"{first} {last}"
    elif first:
        return first
    elif last:
        return last
    else:
        return "Unknown"

# Test cases
test_users = [
    {"email": "alice@example.com", "name": {"first": "Alice", "last": "Smith"}},
    {"email": "  bob@example.com  ", "name": {"first": "  Bob  ", "last": ""}},
    {"email": None, "name": {"first": "Charlie"}},
    {"email": "invalid-email", "name": {"first": None, "last": "Davis"}},
    {"email": "", "name": "NotADict"},  # Wrong type
    {},  # Missing everything
]

print("=== Safe String Operations ===\n")
for i, user in enumerate(test_users, 1):
    name = safe_get_name(user)
    email = safe_get_email(user)
    print(f"User {i}:")
    print(f"  Name:  {name}")
    print(f"  Email: {email}")
    print()

Run it from the project root:

Terminal
$ python safe_strings.py
=== Safe String Operations ===

User 1:
  Name:  Alice Smith
  Email: alice@example.com

User 2:
  Name:  Bob
  Email: bob@example.com

User 3:
  Name:  Charlie
  Email: No email provided

User 4:
  Name:  Davis
  Email: Invalid email format

User 5:
  Name:  Unknown
  Email: No email provided

User 6:
  Name:  Unknown
  Email: No email provided

The same three-layer pattern, applied to strings this time: check the type, clean the input (strip whitespace), check for emptiness, and only then apply any normalisation or format check. The order matters: you can't check "is this a valid email?" until you've confirmed you have a string. Even then, stripping whitespace before checking emptiness prevents the annoying bug where a value looks non-empty but is just " ".
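The whitespace bug is easy to reproduce. In this small sketch, clean_text is a hypothetical helper, not one of the chapter's functions; it applies the steps in the order just described.

```python
def clean_text(value):
    """Hypothetical helper: return stripped text, or None if there is no usable content."""
    if not isinstance(value, str):   # type first
        return None
    value = value.strip()            # then clean
    return value or None             # then emptiness


raw = "   "
print(bool(raw))              # True: a whitespace-only string is truthy
print(clean_text(raw))        # None: stripping first exposes the emptiness
print(clean_text("  Bob "))   # Bob
print(clean_text(None))       # None
```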

A complete type-safe extraction function

Now combine the three layers, existence, type, content, into a single extraction function that handles every edge case we've discussed. This supersedes extract_user_safely from the previous page: same role, expanded with the type checks we've just covered. The rest of the chapter (the batch processor on the next page, the debugging exercises after that) uses this version, and it's the function you'd actually ship: .get() everywhere, isinstance() before every type-specific operation, range checks on anything numeric, and a guaranteed return shape.

Save this as type_safe_user.py:

type_safe_user.py
def extract_user_with_type_validation(user_data):
    """
    Extract user with complete type validation and conversion.
    
    Returns dict with guaranteed structure or None if fundamentally invalid.
    """

    # Validate input is a dictionary
    if not isinstance(user_data, dict):
        return None
    
    # Extract and validate name (nested dict)
    name_obj = user_data.get("name")
    if isinstance(name_obj, dict):
        first = name_obj.get("first", "")
        last = name_obj.get("last", "")
        
        # Ensure strings
        if not isinstance(first, str):
            first = ""
        if not isinstance(last, str):
            last = ""
        
        # Clean and build
        first = first.strip()
        last = last.strip()
        
        if first and last:
            full_name = f"{first} {last}"
        elif first:
            full_name = first
        elif last:
            full_name = last
        else:
            full_name = "Unknown"

        # Normalise empties to "Unknown" so the returned shape stays
        # consistent with extract_user_safely from the previous page.
        if not first:
            first = "Unknown"
        if not last:
            last = "Unknown"
    else:
        first = "Unknown"
        last = "Unknown"
        full_name = "Unknown"
    
    # Extract and validate email (string)
    email = user_data.get("email")
    if isinstance(email, str):
        email = email.strip().lower()
        if not email or "@" not in email:
            email = "No email provided"
    else:
        email = "No email provided"
    
    # Extract and validate age (int/string/float -> int), nested under "dob"
    dob_obj = user_data.get("dob")
    age_value = dob_obj.get("age") if isinstance(dob_obj, dict) else None

    # bool is a subclass of int, so exclude True/False explicitly
    is_real_int = isinstance(age_value, int) and not isinstance(age_value, bool)

    if is_real_int and 0 <= age_value <= 150:
        age = age_value
    elif isinstance(age_value, str) and age_value.strip().isdecimal():
        # isdecimal() only accepts characters that int() can parse
        age_int = int(age_value.strip())
        age = age_int if 0 <= age_int <= 150 else None
    elif isinstance(age_value, float) and 0 <= age_value <= 150:
        age = int(age_value)  # Truncates toward zero
    else:
        age = None
    
    # Extract and validate location (nested dict)
    location_obj = user_data.get("location")
    if isinstance(location_obj, dict):
        city = location_obj.get("city", "Unknown")
        country = location_obj.get("country", "Unknown")
        
        # Ensure strings
        if not isinstance(city, str) or not city.strip():
            city = "Unknown"
        else:
            city = city.strip()
        
        if not isinstance(country, str) or not country.strip():
            country = "Unknown"
        else:
            country = country.strip()
    else:
        city = "Unknown"
        country = "Unknown"
    
    # Return guaranteed structure
    return {
        "full_name": full_name,
        "first_name": first,
        "last_name": last,
        "email": email,
        "age": age,  # Can be None
        "city": city,
        "country": country,
        "location_full": f"{city}, {country}"
    }

if __name__ == "__main__":
    # Test with messy, real-world-like data
    test_cases = [

        # Perfect data
        {
            "name": {"first": "Alice", "last": "Smith"},
            "email": "alice@example.com",
            "dob": {"age": 30},
            "location": {"city": "Dublin", "country": "Ireland"}
        },

        # Type mismatches
        {
            "name": {"first": "Bob", "last": None},  # Null last name
            "email": "  bob@example.com  ",  # Needs cleaning
            "dob": {"age": "25"},  # String age
            "location": {"city": "", "country": "USA"}  # Empty city
        },

        # Missing nested objects
        {
            "name": None,  # Null instead of object
            "email": None,
            "dob": None,
            "location": ["City", "Country"]  # Wrong type (list)
        },

        # Empty/missing everything
        {},
    ]

    print("=== Type-Safe Extraction Tests ===\n")
    for i, test_data in enumerate(test_cases, 1):
        result = extract_user_with_type_validation(test_data)
        if result:
            print(f"Test {i}:")
            print(f"  Name:     {result['full_name']}")
            print(f"  Email:    {result['email']}")
            print(f"  Age:      {result['age'] if result['age'] is not None else 'Not provided'}")
            print(f"  Location: {result['location_full']}")
            print()
        else:
            print(f"Test {i}: Invalid data structure\n")

Run it from the project root:

Terminal
$ python type_safe_user.py
=== Type-Safe Extraction Tests ===

Test 1:
  Name:     Alice Smith
  Email:    alice@example.com
  Age:      30
  Location: Dublin, Ireland

Test 2:
  Name:     Bob
  Email:    bob@example.com
  Age:      25
  Location: Unknown, USA

Test 3:
  Name:     Unknown
  Email:    No email provided
  Age:      Not provided
  Location: Unknown, Unknown

Test 4:
  Name:     Unknown
  Email:    No email provided
  Age:      Not provided
  Location: Unknown, Unknown

Four wildly different inputs, including one with every field as the wrong type, and all four produce a dict you can safely read from. Notice that age is None in the "Not provided" cases rather than the string "Unknown". That distinction lets the display code handle "genuinely unknown" differently from "user is 0 years old", which matters any time you do reporting, averaging, or conditional display.
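As an illustration of why None is the right marker for averaging, here is a short sketch. The records list is made-up data shaped like the extractor's output, reduced to the age field.

```python
# Hypothetical extracted records; only the "age" field matters here.
records = [{"age": 30}, {"age": None}, {"age": 0}, {"age": 25}]

# Filter on "is not None" rather than truthiness, so a genuine age of 0 is kept.
known_ages = [r["age"] for r in records if r["age"] is not None]

# Guard against an empty list so the division cannot fail.
average = sum(known_ages) / len(known_ages) if known_ages else None

print(known_ages)   # [30, 0, 25]
print(average)
```

Had the extractor returned the string "Unknown" instead, sum() would raise a TypeError on the first unknown record.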

Drawn as a pipeline, the function you just built is three sequential gates. Every field passes through each one, and every failure routes to a safe default. The combination is what guarantees a clean output regardless of what came in:

[Figure: pipeline diagram showing four JSON fields (name, email, age, city) flowing through three sequential checks. Existence catches a missing city (falls back to Unknown). Type catches a null email (falls back to No email). Content catches an out-of-range age over 150 (falls back to null). Name passes all three gates unchanged. The four fields combine into a clean output object.]
Four fields, three gates, one clean output. Every field passes each check in order; every failure routes to a safe default. No field reaches the output without either passing all three gates or being replaced.

One thing worth calling out for readers already familiar with Python: it might feel strange to see this many isinstance() checks when Python culture generally prefers the "ask forgiveness" style of wrapping the block in a try/except and catching errors. For local dictionary work that's fine. For deeply nested data from an untrusted source, explicit checks buy granular control: a missing key, a wrong type, and a null value may each need a different response, but an except clause sees only the exception type. A null and a list where you expected a dict both raise TypeError when subscripted, so the handler can't tell them apart. The verbosity is the feature.
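A short sketch makes the ambiguity visible. failure_kind is a hypothetical function written for this illustration; it performs an unguarded nested lookup and reports which exception came back.

```python
def failure_kind(user_data):
    """Report which exception an unguarded nested lookup raises (illustration only)."""
    try:
        user_data["location"]["city"]
        return "ok"
    except (KeyError, TypeError) as exc:
        return type(exc).__name__


print(failure_kind({"location": {"city": "Dublin"}}))      # ok
print(failure_kind({}))                                    # KeyError
print(failure_kind({"location": None}))                    # TypeError
print(failure_kind({"location": ["Dublin", "Ireland"]}))   # TypeError
```

The null and the list collapse into the same TypeError, while an isinstance() check on the location value separates them before any lookup happens.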