8. Testing OAuth Flows

OAuth is the kind of code you write once and then revisit months later. A provider changes a response detail, session handling is refactored, or error handling moves to a shared helper. Something subtle breaks in a flow that is difficult to reproduce manually. That is exactly why the callback deserves focused tests.

A good OAuth test suite pins down three things: your code sends the right payload to the token endpoint, your CSRF protection actually rejects mismatched state values, and your error handling doesn't leave partial state behind when the upstream call fails. You can test all three with the responses library, which intercepts outgoing requests calls, and the Flask test client, without running a real OAuth flow even once.

Add the callback handler

We'll test a standard authorization-code flow with CSRF protection. After the user approves access, GitHub redirects their browser to /oauth/callback with code and state query parameters. The endpoint verifies that the state matches what we stored at the start of the flow, exchanges the code for an access token, and stores the token in the session. Add this to your existing app.py:

app.py (OAuth additions)

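# Assumes app.py already has these imports at the top:
#   import requests
#   from flask import jsonify, redirect, request, session, url_for
# and a view function named "dashboard" for url_for() to resolve.
@app.route("/oauth/callback")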
def oauth_callback():
    code = request.args.get("code")
    state = request.args.get("state")

    expected_state = session.pop("oauth_state", None)
    if not state or state != expected_state:
        return jsonify({"error": "invalid state"}), 400

    if not code:
        return jsonify({"error": "missing code"}), 400

    try:
        response = requests.post(
            "https://github.com/login/oauth/access_token",
            data={
                "client_id": app.config["GITHUB_CLIENT_ID"],
                "client_secret": app.config["GITHUB_CLIENT_SECRET"],
                "code": code,
            },
            headers={"Accept": "application/json"},
            timeout=10,
        )
    except requests.RequestException:
        return jsonify({"error": "token exchange failed"}), 502

    if response.status_code != 200:
        return jsonify({"error": "token exchange failed"}), 502

    token = response.json().get("access_token")
    if not token:
        return jsonify({"error": "no token in response"}), 502

    session["github_token"] = token
    return redirect(url_for("dashboard"))

Several things can go wrong: missing or tampered state, a missing code, a network failure, a non-200 response, or a 200 response without a token. We will cover the successful exchange, the state check, an upstream error response, and a network timeout. The final section lists the remaining edge cases.
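The callback only works if an earlier step stored oauth_state in the session. That step isn't covered in this chapter, but a minimal sketch makes the state check concrete. It assumes GitHub's authorize endpoint at https://github.com/login/oauth/authorize and uses a plain dict in place of Flask's session; the function name start_oauth_flow is hypothetical.

```python
import secrets
from urllib.parse import urlencode

GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def start_oauth_flow(session: dict, client_id: str) -> str:
    """Hypothetical login step: store a fresh state, return GitHub's URL.

    `session` stands in for flask.session; any dict-like object works.
    """
    # Unpredictable, single-use value; the callback pops it and compares.
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    query = urlencode({"client_id": client_id, "state": state})
    return f"{GITHUB_AUTHORIZE_URL}?{query}"
```

In a Flask view this might be wired up as return redirect(start_oauth_flow(session, app.config["GITHUB_CLIENT_ID"])). A real app would usually also pass redirect_uri and scope, which this sketch leaves out.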

Before we can run these tests, the OAuth client config needs to be available during test runs. Update the client fixture in tests/conftest.py so the config block includes the GitHub credentials:

tests/conftest.py (config update)

flask_app.config.update(
    TESTING=True,
    SECRET_KEY="test-secret-not-for-production",
    WEATHER_API_KEY="test_key",
    GITHUB_CLIENT_ID="test_client_id",
    GITHUB_CLIENT_SECRET="test_client_secret",
)

Test 1: the happy path

Start with the successful exchange: mock GitHub's token endpoint, simulate a callback with a matching state, and verify that the token ends up in the session and the user is redirected to the dashboard. Save this as tests/test_oauth.py:

tests/test_oauth.py

from urllib.parse import parse_qs

import requests
import responses

@responses.activate
def test_oauth_callback_exchanges_code_for_token(client):
    # Simulate the start of the OAuth flow: we stored a state token
    with client.session_transaction() as sess:
        sess["oauth_state"] = "matching_state_value"

    # GitHub will return a valid token when called
    responses.add(
        responses.POST,
        "https://github.com/login/oauth/access_token",
        json={"access_token": "gho_fake_token", "token_type": "bearer"},
        status=200,
    )

    # GitHub redirects the user back with code and matching state
    response = client.get(
        "/oauth/callback?code=abc123&state=matching_state_value"
    )

    # We were redirected to the dashboard
    assert response.status_code == 302
    assert "/dashboard" in response.headers["Location"]

    # The token is now in the session
    with client.session_transaction() as sess:
        assert sess["github_token"] == "gho_fake_token"
        assert "oauth_state" not in sess  # state was consumed

    # Our code sent the correct payload to GitHub
    assert len(responses.calls) == 1
    sent_request = responses.calls[0].request
    sent_body = parse_qs(sent_request.body)
    assert sent_body["code"] == ["abc123"]
    assert sent_body["client_id"] == ["test_client_id"]
    assert sent_body["client_secret"] == ["test_client_secret"]
    assert sent_request.url == "https://github.com/login/oauth/access_token"
    assert sent_request.headers["Accept"] == "application/json"

Run it from the project root:

Terminal

$ pytest tests/test_oauth.py
tests/test_oauth.py .                                                   [100%]

============================== 1 passed in 0.06s ==============================

This test does a lot in one go, so it's worth walking through what it locks down. The redirect works, the redirect target is right, the token is stored, and the state was consumed from the session. (A common bug is forgetting to pop the state value, which leaves it in the session so the same state can pass the check a second time.) But the strongest assertions are the final block, on responses.calls. Your code made exactly one request to GitHub, to the right URL, with the right code, client ID and secret in the form body, and the right Accept header.

Without that last block, the test would still pass if a refactor moved the parameters from the form body into the query string, dropped the client secret, or removed the Accept header, because responses would hand back the mocked token in every one of those cases. Parsing the form body and asserting the exact URL makes those parts of the wire contract explicit.
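The single-element lists in those assertions come from urllib.parse.parse_qs, which maps every key to a list because a form body may repeat a key. The standalone sketch below uses urlencode to stand in for the body requests builds from the handler's data dict (requests form-encodes a plain dict in essentially the same way), and shows why parsing beats comparing raw strings:

```python
from urllib.parse import parse_qs, urlencode

# Stand-in for the form body requests builds from the handler's data dict.
body = urlencode(
    {
        "client_id": "test_client_id",
        "client_secret": "test_client_secret",
        "code": "abc123",
    }
)
# body == "client_id=test_client_id&client_secret=test_client_secret&code=abc123"

parsed = parse_qs(body)
# Every value is a list, even when the key appears only once.
assert parsed["code"] == ["abc123"]

# A repeated key is why parse_qs cannot return plain strings.
assert parse_qs("scope=repo&scope=user") == {"scope": ["repo", "user"]}

# Comparing raw strings would be brittle: reordering the keys changes
# the text but not the parsed result.
reordered = "code=abc123&client_secret=test_client_secret&client_id=test_client_id"
assert reordered != body
assert parse_qs(reordered) == parsed
```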

Test 2: the CSRF check

OAuth's state parameter exists for one reason: to stop an attacker from tricking a logged-in user's browser into completing an OAuth flow the user didn't start. If your state check is broken, your application is vulnerable. This test checks that a mismatched state is rejected. Add it to the bottom of tests/test_oauth.py:

tests/test_oauth.py (continued)

@responses.activate
def test_oauth_callback_rejects_mismatched_state(client):
    # We stored one state value
    with client.session_transaction() as sess:
        sess["oauth_state"] = "our_legitimate_state"

    # An attacker triggers the callback with a different state
    response = client.get(
        "/oauth/callback?code=abc123&state=attacker_controlled_state"
    )

    # Request is rejected
    assert response.status_code == 400
    assert response.get_json() == {"error": "invalid state"}

    # No token was fetched
    assert len(responses.calls) == 0

    # No token was stored
    with client.session_transaction() as sess:
        assert "github_token" not in sess

This test locks down three things. The status code is 400 with the expected error body, not a 500 or a redirect to an error page you haven't built. No HTTP call was made to GitHub, because the state check runs first, as it should. And the session stays clean, with no token stored.

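Notice that @responses.activate is still applied even though no request should be made. That's deliberate. Without it, responses.calls would stay empty no matter what the handler did, and a misplaced token request would go out to the real GitHub. With it, if a refactor moves the state check after the token request, the call to the unregistered endpoint raises requests.ConnectionError and is recorded in responses.calls, so the test fails on both the status code (the handler turns that error into a 502) and the call count. Belt and braces.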
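The handler compares state values with !=, which is all the test above needs. A common hardening step is to compare with hmac.compare_digest instead, since it avoids short-circuiting on the first differing character and so leaks less through timing. A minimal sketch, where the helper name state_matches is hypothetical:

```python
import hmac
from typing import Optional


def state_matches(received: Optional[str], expected: Optional[str]) -> bool:
    """Hypothetical helper: constant-time comparison of OAuth state values."""
    # Reject missing or empty values before comparing anything.
    if not received or not expected:
        return False
    # compare_digest rejects non-ASCII str arguments, and the received state
    # comes from an attacker-controllable query string, so compare UTF-8 bytes.
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

Inside the handler it would replace the existing condition, for example if not state_matches(state, session.pop("oauth_state", None)): followed by the same 400 response, and the CSRF test above should pass unchanged.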

Test 3: upstream failure

GitHub goes down. Or rate-limits you. Or responds with an error because your client ID was revoked. Your code needs to degrade gracefully and, critically, not leave your user's session in a half-authenticated state. Add this test to the bottom of tests/test_oauth.py:

tests/test_oauth.py (continued)

@responses.activate
def test_oauth_callback_handles_github_failure(client):
    with client.session_transaction() as sess:
        sess["oauth_state"] = "matching_state"

    responses.add(
        responses.POST,
        "https://github.com/login/oauth/access_token",
        status=500,
    )

    response = client.get("/oauth/callback?code=abc123&state=matching_state")

    # We returned a 502 Bad Gateway to the user
    assert response.status_code == 502
    assert response.get_json() == {"error": "token exchange failed"}

    # No partial state in the session
    with client.session_transaction() as sess:
        assert "github_token" not in sess

Test 4: network timeout

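An error response is not the only way the token exchange can fail. GitHub might not respond before the handler's 10-second timeout expires. This test makes responses raise requests.Timeout, the base class of the ConnectTimeout and ReadTimeout exceptions that requests raises on a real timeout. Add it to the bottom of tests/test_oauth.py: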

tests/test_oauth.py (continued)

@responses.activate
def test_oauth_callback_handles_timeout(client):
    with client.session_transaction() as sess:
        sess["oauth_state"] = "matching_state"

    responses.add(
        responses.POST,
        "https://github.com/login/oauth/access_token",
        body=requests.Timeout("GitHub did not respond"),
    )

    response = client.get("/oauth/callback?code=abc123&state=matching_state")

    assert response.status_code == 502
    assert response.get_json() == {"error": "token exchange failed"}

    with client.session_transaction() as sess:
        assert "github_token" not in sess
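The mocked requests.Timeout reaches the handler's except requests.RequestException clause because of how requests arranges its exception classes. A quick check, assuming requests is installed as it is for the rest of this chapter:

```python
import requests

# A real timeout typically surfaces as one of these two; both are Timeouts.
assert issubclass(requests.ConnectTimeout, requests.Timeout)
assert issubclass(requests.ReadTimeout, requests.Timeout)

# Timeout and ConnectionError (what responses raises for an unregistered URL)
# both derive from RequestException, the class the handler catches.
assert issubclass(requests.Timeout, requests.RequestException)
assert issubclass(requests.ConnectionError, requests.RequestException)

# A narrower `except requests.Timeout` would therefore miss connection failures.
assert not issubclass(requests.ConnectionError, requests.Timeout)
```

Catching the common base class is what lets one except clause cover both Test 4 and the unregistered-URL case from Test 2.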

Run the whole file and you should now have four passing tests:

Terminal

$ pytest tests/test_oauth.py
tests/test_oauth.py ....                                                [100%]

============================== 4 passed in 0.07s ==============================

The assertion that github_token is absent is the critical one. If it ever fails, meaning a token is present after an upstream failure, stop and find out why. It means your code wrote to the session before the exchange had succeeded, and the next authenticated request is going to fail in a much more confusing place.

What you haven't tested yet

These four tests cover the main flow and its riskiest failure paths. A production suite should still add tests for GitHub returning 200 without an access_token, a callback with no code, and any provider-specific error payloads your application handles. The scaffolding you've built makes those cases inexpensive to add.

That's the compounding payoff of good test infrastructure. The first test costs an hour. The tenth test costs five minutes.
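One inexpensive way to enumerate the token-side cases is to pull the exchange logic out of the view into a plain function and drive it with fake transports. The sketch below is hypothetical: complete_login, FakeResponse and FakeNetworkError are illustrative names, not part of the app above, and unlike the handler it also treats an unparseable JSON body as a failed exchange. The error-payload row is modeled on GitHub's bad_verification_code response; check your provider's documentation for the exact shape.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response."""

    status_code: int
    text: str

    def json(self):
        # Like requests, raise a ValueError subclass on invalid JSON.
        return json.loads(self.text)


class FakeNetworkError(Exception):
    """Plays the role of requests.RequestException in this sketch."""


def complete_login(session: dict, post: Callable[[], FakeResponse]) -> Optional[str]:
    """Mirror of the handler's exchange logic.

    Returns an error message, or None after storing the token in `session`.
    """
    try:
        response = post()
    except FakeNetworkError:
        return "token exchange failed"
    if response.status_code != 200:
        return "token exchange failed"
    try:
        payload = response.json()
    except ValueError:
        # Not in the handler above: an unparseable body counts as a failure.
        return "token exchange failed"
    token = payload.get("access_token") if isinstance(payload, dict) else None
    if not token:
        return "no token in response"
    # The session is written only after every check has passed.
    session["github_token"] = token
    return None


def raise_network_error() -> FakeResponse:
    raise FakeNetworkError("GitHub did not respond")


# Each row: description, fake transport, expected error (None means success).
CASES = [
    ("success", lambda: FakeResponse(200, '{"access_token": "gho_fake_token"}'), None),
    ("server error", lambda: FakeResponse(500, ""), "token exchange failed"),
    ("network failure", raise_network_error, "token exchange failed"),
    ("200 without token", lambda: FakeResponse(200, "{}"), "no token in response"),
    (
        "provider error payload",
        lambda: FakeResponse(200, '{"error": "bad_verification_code"}'),
        "no token in response",
    ),
    ("non-JSON body", lambda: FakeResponse(200, "<html>oops</html>"), "token exchange failed"),
]

for name, post, expected_error in CASES:
    session = {}
    assert complete_login(session, post) == expected_error, name
    # A token is stored exactly when the exchange succeeded.
    assert ("github_token" in session) == (expected_error is None), name
```

In the Flask suite, the same rows translate naturally into pytest.mark.parametrize cases against the client fixture. The missing-code case lives in the view itself, so it still needs a test-client test.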

Next, we'll step back from individual test files and look at how the whole suite composes: fixtures that layer, shared conftest patterns, running in CI, and keeping the whole thing fast.