Validating JSON in Python with jsonschema (and When You Don't Need It)
You do not control the JSON an API hands back. You read the docs, you write code against the shape they describe, and most of the time it matches. Then a field gets renamed, an optional key goes missing, or a number arrives as a string, and your program keeps running anyway, carrying bad data forward until it trips over itself somewhere far away.
When that happens, the error you see is almost never the error that occurred. A missing id field becomes a KeyError three functions later. A number that arrived as a string becomes a TypeError in the middle of a calculation. The traceback points at the line that finally choked, not at the response that was wrong, and you spend twenty minutes debugging code that was never the problem.
Validating the shape up front turns that mystery into a clear, early error. This guide shows how to do it with the jsonschema library: what a schema is, how to write one, how to catch failures with a useful message, and the handful of schema pieces you will actually use day to day. It also makes the honest case for when a schema is overkill and a couple of plain checks are better. It assumes Python 3.10 or later.
What a JSON Schema is
A JSON Schema is a declarative description of what a valid piece of JSON looks like. Instead of writing imperative checks ("if this key is missing, raise; if that value is not a number, raise"), you describe the shape you expect, and a validator compares real data against it. You say what good looks like, and the library tells you where reality diverges.
The description covers the things that go wrong in practice: which keys must be present, what type each value should be, how nested objects and arrays are structured, and which values are allowed where the set is fixed. A schema for a user might say "this is an object, it must have an id that is an integer and an email that is a string, and it may have a role that is one of admin, member, or guest."
JSON Schema is a cross-language standard, not a Python invention. The same schema document can validate JSON in JavaScript, Go, or Ruby, which is part of why it is a good way to write down a contract that more than one service has to agree on. The jsonschema library is simply Python's implementation of that standard.
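As a rough illustration of the user described above, and of the point that a schema is ordinary JSON, the sketch below writes that schema as a Python dictionary and round-trips it through the standard json module. The name user_schema is made up for this example.

```python
import json

# Hypothetical schema for the user described in prose above:
# id and email are mandatory, role is optional but limited to three values.
user_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "role": {"enum": ["admin", "member", "guest"]},
    },
    "required": ["id", "email"],
}

# Serialising gives a plain JSON document that a validator written in
# any language could load.
schema_text = json.dumps(user_schema, indent=2)

# Loading it back yields an equal dictionary, so nothing Python-specific is lost.
round_tripped = json.loads(schema_text)
print(round_tripped == user_schema)
```

Because the schema survives the trip through text unchanged, it can live in a .json file that several services read.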
Your first schema
The library is not in the standard library, so install it first.
pip install jsonschema
Now imagine an API that returns a single user. Here is a sample response, as the dictionary you would get back from response.json().
data = {
    "id": 42,
    "email": "ada@example.com",
    "active": True,
}
A schema describing that shape is itself a Python dictionary. We say it is an object, list the properties we care about and their types, and mark the ones that must be present.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "active": {"type": "boolean"},
    },
    "required": ["id", "email"],
}
With both in hand, validating is one call.
from jsonschema import validate
validate(instance=data, schema=schema)
If the data matches, validate returns None and your program carries on. There is no "success" value to check, because success is simply the absence of an exception. If the data does not match, it raises, which is what the next section is about.
Catching validation failures
Because validate raises on bad data, you handle it the same way you handle any other expected failure: wrap it and catch the specific error. The exception type is ValidationError.
from jsonschema import validate, ValidationError

bad_data = {"id": "42", "active": True}

try:
    validate(instance=bad_data, schema=schema)
except ValidationError as err:
    print("Invalid response:", err.message)
Two things are wrong with bad_data: id is a string rather than an integer, and the required email is missing entirely. validate raises a single ValidationError for the problem it judges most relevant, not one per problem, so err.message describes just one of them in plain language, something like 'email' is a required property or '42' is not of type 'integer'. That message is the whole point. Instead of a KeyError surfacing deep in code that assumed the data was fine, you get a precise complaint at the exact moment the data entered your program.
Validate at the boundary
Check data the moment it crosses into your program, right after the API call or webhook arrives, before any business logic touches it. Once a response has passed validation, the rest of your code can trust its shape and stop defending against it. This is the same idea as the semantic layer in our error-handling guide: a 200 can still carry the wrong shape, and validating at the boundary is how you catch that before it spreads.
The handler catches ValidationError specifically, not a bare except, so a bug in your own code nearby still surfaces as itself instead of being swallowed. What you do in the except block depends on the call: log the message and skip the record, fall back to a default, or surface a clear error and stop. The win is that the failure is now a planned branch with a readable reason, not a crash three steps downstream.
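One way to package this is a small function that every decoded response passes through before anything else sees it. The sketch below is an assumption about how you might structure it: parse_user, USER_SCHEMA, and BadResponseError are hypothetical names, and logging plus re-raising is only one of the options described above.

```python
import logging

from jsonschema import ValidationError, validate

logger = logging.getLogger(__name__)

USER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "active": {"type": "boolean"},
    },
    "required": ["id", "email"],
}


class BadResponseError(Exception):
    """Hypothetical error for a payload that does not match the schema."""


def parse_user(payload):
    """Validate a decoded response at the boundary and return it unchanged."""
    try:
        validate(instance=payload, schema=USER_SCHEMA)
    except ValidationError as err:
        # Only ValidationError is caught, so unrelated bugs still surface.
        logger.warning("Rejected user payload: %s", err.message)
        raise BadResponseError(err.message) from err
    return payload


user = parse_user({"id": 42, "email": "ada@example.com", "active": True})
print(user["email"])
```

Code that receives the return value of parse_user can index into it without defensive checks, because a payload with the wrong shape never gets that far.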
The schema pieces you will actually use
JSON Schema is a large standard, but a small core covers almost everything you meet validating API responses. Here are the pieces worth knowing, each kept to the smallest example that shows it.
type checks the kind of value
The most basic constraint. The values are "string", "number", "integer", "boolean", "array", "object", and "null". Note that "number" accepts any numeric value, whole or decimal, while "integer" accepts only whole numbers. The check is on the value, not on how it was written: under recent drafts of the standard 9.0 still counts as an integer, so "integer" rejects 9.5 but will not tell a price of 9 apart from 9.0.
{"type": "string"}
required lists the keys that must be present
Properties are optional by default. Listing a key in required says the object is invalid without it. This is the check that turns a silently missing field into an immediate, named error.
{
    "type": "object",
    "properties": {"id": {"type": "integer"}},
    "required": ["id"],
}
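A quick sketch of what "optional by default" means in practice: the same empty object passes a schema that only lists properties and fails once required is added. The names without_required and with_required are made up for the example.

```python
from jsonschema import ValidationError, validate

without_required = {
    "type": "object",
    "properties": {"id": {"type": "integer"}},
}
# Same schema plus a required list.
with_required = {**without_required, "required": ["id"]}

# An empty object is fine when nothing is required: properties only
# constrains keys that are actually present.
validate(instance={}, schema=without_required)

message = None
try:
    validate(instance={}, schema=with_required)
except ValidationError as err:
    message = err.message

print(message)  # 'id' is a required property
```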
properties describes a nested object
Because a property's value can itself be an object with its own properties, schemas nest to match nested JSON. Here a user has an address object with its own required field.
{
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        }
    },
}
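With nested data, knowing where a failure sits matters as much as knowing what it is. The sketch below uses the path attribute of ValidationError to locate a bad value inside the address object; the sample payload is invented for illustration.

```python
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        }
    },
}

# Hypothetical payload: city arrived as a number instead of a string.
user = {"address": {"city": 10001}}

where, message = None, None
try:
    validate(instance=user, schema=schema)
except ValidationError as err:
    # err.path is the sequence of keys from the top of the document
    # down to the value that failed.
    where = list(err.path)
    message = err.message

print(where, message)  # ['address', 'city'] 10001 is not of type 'string'
```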
array with items validates every element
For a list, set "type": "array" and describe a single element under items. The validator applies that element schema to every entry, so one rule covers a list of any length.
{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"id": {"type": "integer"}},
        "required": ["id"],
    },
}
enum restricts a value to a fixed set
When a field can only be one of a known handful of values, enum rejects anything else. This catches a typo or an unexpected new status code before it flows into a branch that does not handle it.
{"enum": ["admin", "member", "guest"]}
Combine these and you can describe most real responses: an object with required fields, some of which are nested objects or arrays of objects, with a few enums where the set is fixed. You rarely need more than this for boundary validation.
When you don't need jsonschema
A schema earns its keep when the data crosses a trust boundary: it comes from an external API, a webhook, a config file, or anywhere you cannot see or control how it was produced. That is data worth validating, because you genuinely do not know what will arrive.
For small, known, internal data, a schema is ceremony. If a function builds a dictionary three lines up and passes it to another function you also wrote, you already know its shape. Wrapping it in a schema adds a dependency and a layer of indirection to defend against a failure that cannot happen. A plain check, or a default with data.get("key", default), is clearer and reads better.
# Internal data you just built. A schema here is overkill.
timeout = config.get("timeout", 10)
if not isinstance(timeout, int):
    raise ValueError("timeout must be an integer")
Over-validating internal data is its own bug
Reaching for a schema on every dictionary adds a dependency and a wall of declaration for no real safety, and it trains readers to skim past validation as boilerplate. The telltale sign is a schema guarding data your own code produced moments earlier, where no untrusted input ever enters. Reserve schema validation for data that crosses a trust boundary; for the rest, an explicit check or a sensible default says more in less space.
The rule of thumb is simple. If you cannot point to where the data came from and be sure of its shape, validate it. If you can, a couple of plain checks will usually say everything a schema would, with less machinery.
jsonschema vs pydantic
These two get compared constantly, and the comparison usually misses that they solve adjacent problems rather than the same one. Neither is better; they hand you different things.
jsonschema validates a raw dictionary against a declarative, language-neutral contract. You still have a plain dict afterwards, accessed with data["id"], and the schema is data you can store, share, or hand to a service written in another language. pydantic takes a different path: you declare a model class, and it parses the JSON into a typed Python object you access with user.id, coercing and validating as it goes. After pydantic you are working with an object that has attributes and type hints, not a dictionary.
| | jsonschema | pydantic |
|---|---|---|
| What you get back | The same dict, now trusted | A typed model object |
| Validation style | Declarative schema (data) | Python class with type hints |
| The contract | Language-neutral, shareable | Python-only |
| Best for | Validating arbitrary JSON against a shared contract | A typed model you use throughout your app |
Reach for jsonschema when the schema itself is the deliverable, or when you just want to confirm a dict is shaped right and keep working with the dict. Reach for pydantic (its current v2 line parses JSON into BaseModel subclasses) when you want a typed object with attribute access and editor autocomplete carried through your codebase. Many projects use both: pydantic for their own models, jsonschema for validating third-party payloads against a published contract.
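To see the difference side by side, here is a minimal sketch that pushes the same payload through both libraries. It assumes jsonschema and pydantic v2 are both installed; the User model and the sample payload are invented for the example.

```python
from jsonschema import validate
from pydantic import BaseModel

payload = {"id": 42, "email": "ada@example.com"}

# jsonschema: check the dict against a schema, then keep using the dict.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["id", "email"],
}
validate(instance=payload, schema=schema)
print(payload["id"])


# pydantic: declare a model and parse the dict into a typed object.
class User(BaseModel):
    id: int
    email: str


user = User.model_validate(payload)
print(user.id)

# By default pydantic also coerces compatible values, so a numeric
# string becomes an int here, whereas the schema above would reject it.
coerced = User.model_validate({"id": "42", "email": "ada@example.com"})
print(coerced.id)
```

The coercion in the last step is a real behavioural difference: the schema reports the string id as an error, while the model quietly converts it unless you opt into stricter settings.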
Frequently asked questions
Should I use jsonschema or pydantic?
Use jsonschema when you want to validate a raw dictionary against a declarative, language-neutral contract and keep working with the dict, or when the schema needs to be shared with services in other languages. Use pydantic when you want the JSON parsed into a typed Python object with attribute access and type hints carried through your code. They solve adjacent problems, so it is common to use both in one project.
Do I need to validate every API response?
No. Validate data that crosses a trust boundary, such as an external API, a webhook, or a config file, or anywhere a wrong shape would cause a confusing failure deep in your code. Trivial or internal data you produced yourself rarely needs a schema, and wrapping it in one adds a dependency and ceremony for no real safety. A plain check or a sensible default is clearer there.
Where should JSON validation live in my code?
At the boundary, right after the data enters your program: immediately after the API call returns or the webhook is received, before any business logic touches it. Validating there means the rest of your code can trust the shape and stop re-checking it. Scattering validation deep in business logic both duplicates the work and lets bad data travel further before anyone notices.