4. Building robust receivers

The 10-minute prototype prints incoming events and returns 200; the logging receiver from Section 3 tells you what is on the wire. Neither survives contact with production. Real webhook providers treat a delivery that takes more than ~10 seconds as failed, may send the same event more than once when a failed delivery is retried or redelivered, and in some cases disable webhooks whose endpoint returns 5xx too often. This section teaches the production shape: store every event in SQLite and acknowledge in milliseconds, deduplicate by delivery ID so the same event never processes twice, and move the slow downstream work (Slack posts, database writes, PDF generation) to a background worker that drains the queue at its own pace.

Fast acknowledge, slow work

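A webhook provider expects one thing above all else from your endpoint: a quick and reliable response. When GitHub sends a webhook, it does not want to wait while you talk to Slack, update a database, and generate a PDF. If your handler takes longer than about 10 seconds or fails with a 500 error, the delivery counts as failed. Many providers retry failed deliveries automatically and may eventually disable an endpoint that keeps failing; GitHub records the failure and leaves redelivery to you, so a slow handler there means missed events until someone redelivers them.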

A robust webhook receiver follows a simple pattern:

  • Receive: Accept the HTTP request, read the headers and body.
  • Validate: Check that the request is genuine and well formed.
  • Store: Persist the event somewhere durable (for example, a database).
  • Acknowledge: Return a fast 200 OK to the provider.
  • Process: Do the slower work (notify Slack, update state) in the background.

This pattern reduces the chance of timeouts, makes retries safe, and gives you a clear audit trail of what happened. In the rest of this section you will implement the "store and acknowledge" part using SQLite and then add a simple background processor.
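
The five steps can be sketched without any web framework. The following is a minimal illustration, not the receiver you will build: it assumes an in-memory list stands in for the durable store, and the names receive, process_pending, and pending_events are hypothetical.

```python
import json

# Hypothetical stand-in for a durable store; the real receiver uses SQLite.
pending_events = []


def receive(headers, body):
    """Receive, validate, store, and acknowledge. Returns an HTTP status code."""
    delivery_id = headers.get("X-GitHub-Delivery")
    if not delivery_id:
        return 400  # Validate: reject requests without a delivery ID
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400  # Validate: reject malformed JSON
    pending_events.append({"delivery_id": delivery_id, "payload": payload})  # Store
    return 200  # Acknowledge: no slow work has happened yet


def process_pending():
    """Process: the slow work runs later, outside the request path."""
    handled = []
    while pending_events:
        event = pending_events.pop(0)
        handled.append(event["delivery_id"])
    return handled


status = receive({"X-GitHub-Delivery": "abc-123"}, '{"action": "opened"}')
print(status)             # 200
print(process_pending())  # ['abc-123']
```

The point of the split is that receive never does anything slow: everything that could time out lives in process_pending, which runs on its own schedule.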

Storing webhook events in SQLite

You will reuse the database skills you learned earlier in the book. Create a small webhook_events table that records each delivery from GitHub. This gives you a permanent log that you can inspect when something goes wrong.

Database schema for webhook events

schema.sql
CREATE TABLE IF NOT EXISTS webhook_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source TEXT NOT NULL,
    event_type TEXT NOT NULL,
    delivery_id TEXT NOT NULL UNIQUE,
    payload_json TEXT NOT NULL,
    received_at TEXT NOT NULL,
    processed_at TEXT,
    status TEXT NOT NULL
);

Each row represents one delivery from a provider. The delivery_id column is marked UNIQUE, which you will use later to ignore duplicate deliveries safely.
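
To see what the UNIQUE constraint does in isolation, the sketch below uses a throwaway in-memory database and a cut-down version of the table with only the columns needed for the demonstration. The helper name insert_delivery is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the demonstration
conn.execute(
    """
    CREATE TABLE webhook_events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        delivery_id TEXT NOT NULL UNIQUE
    )
    """
)


def insert_delivery(delivery_id):
    """Return True if the row was inserted, False if it was a duplicate."""
    try:
        conn.execute(
            "INSERT INTO webhook_events (delivery_id) VALUES (?)",
            (delivery_id,),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # discard the failed insert before continuing
        return False


print(insert_delivery("a1b2"))  # True
print(insert_delivery("a1b2"))  # False
```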

Next, update your Flask application so that it stores incoming events in this table before acknowledging GitHub.

receiver_with_db.py
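import json
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone
from flask import Flask, request

DB_FILE = "webhooks.db"

app = Flask(__name__)

@contextmanager
def get_db_connection():
    # sqlite3's own "with conn:" block manages the transaction but never
    # closes the connection, so close it explicitly when the block ends.
    conn = sqlite3.connect(DB_FILE)
    conn.row_factory = sqlite3.Row
    try:
        yield conn
    finally:
        conn.close()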

def init_db():
    with get_db_connection() as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS webhook_events (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                source TEXT NOT NULL,
                event_type TEXT NOT NULL,
                delivery_id TEXT NOT NULL UNIQUE,
                payload_json TEXT NOT NULL,
                received_at TEXT NOT NULL,
                processed_at TEXT,
                status TEXT NOT NULL
            );
            """
        )
        conn.commit()

def store_webhook_event(source, event_type, delivery_id, payload_dict):
    payload_json = json.dumps(payload_dict)
    received_at = datetime.now(timezone.utc).isoformat(timespec="seconds")

    with get_db_connection() as conn:
        cursor = conn.cursor()
        try:
            cursor.execute(
                """
                INSERT INTO webhook_events
                    (source, event_type, delivery_id, payload_json, received_at, processed_at, status)
                VALUES
                    (?, ?, ?, ?, ?, NULL, ?)
                """,
                (source, event_type, delivery_id, payload_json, received_at, "unprocessed"),
            )
            conn.commit()
            return cursor.lastrowid
        except sqlite3.IntegrityError:
            # A row with this delivery_id already exists
            return None

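@app.post("/webhooks/github")
def github_webhook():
    event_name = request.headers.get("X-GitHub-Event", "unknown")
    delivery_id = request.headers.get("X-GitHub-Delivery")

    # Validate: a shared placeholder ID would collide with the UNIQUE
    # constraint and make every later request look like a duplicate.
    if not delivery_id:
        return "Missing X-GitHub-Delivery header", 400

    # Validate: reject bodies that are not a JSON object instead of storing {}
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return "Expected a JSON object body", 400

    event_id = store_webhook_event(
        source="github",
        event_type=event_name,
        delivery_id=delivery_id,
        payload_dict=payload,
    )

    if event_id is None:
        print(f"Ignoring duplicate delivery: {delivery_id}")
    else:
        print(f"Stored event {event_id} from delivery {delivery_id} ({event_name})")

    # Acknowledge receipt quickly
    return "", 200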

if __name__ == "__main__":
    init_db()
    app.run(debug=True, port=5000)

Now every webhook delivery is recorded in the database. If a request fails later in your processing pipeline or a Slack notification does not send, you still have a copy of the original payload and a clear record of what was received.
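
When something goes wrong, the first step is usually to look at the most recent rows. The following is a small sketch of such a query; recent_events is a hypothetical helper, and the demonstration runs against an in-memory table holding made-up sample rows rather than your real webhooks.db.

```python
import sqlite3


def recent_events(conn, limit=5):
    """Return the newest rows first as (id, event_type, delivery_id, status)."""
    return conn.execute(
        """
        SELECT id, event_type, delivery_id, status
        FROM webhook_events
        ORDER BY id DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


# Demonstration table with only the columns the query needs (sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webhook_events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT, "
    "delivery_id TEXT UNIQUE, status TEXT)"
)
conn.executemany(
    "INSERT INTO webhook_events (event_type, delivery_id, status) VALUES (?, ?, ?)",
    [("issues", "d-1", "unprocessed"), ("push", "d-2", "unprocessed")],
)

for row in recent_events(conn):
    print(row)
# (2, 'push', 'd-2', 'unprocessed')
# (1, 'issues', 'd-1', 'unprocessed')
```

Against the real file you would pass sqlite3.connect("webhooks.db") instead of the in-memory connection.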

Handling duplicate deliveries (idempotency)

Webhook providers retry deliveries when they see errors or timeouts. This is good for reliability, but it means you must assume that the same event may arrive more than once. If your handler sends a Slack message or charges a customer each time it sees an event, duplicates become a serious problem.

The solution is idempotency. An idempotent handler can safely process the same delivery multiple times without changing the end result after the first successful run. In practice, this usually means:

  • Each delivery has a unique identifier (for example, X-GitHub-Delivery).
  • You store that identifier in your database.
  • If you see the same identifier again, you treat it as a duplicate and skip the work.

The webhook_events table enforces this behaviour using a UNIQUE constraint on delivery_id. When store_webhook_event tries to insert a duplicate delivery, the insert raises sqlite3.IntegrityError and the function returns None. Your webhook handler logs the duplicate and still returns 200 OK to GitHub, because from the provider's point of view the delivery has already been received successfully.
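
The effect of idempotency is easiest to see when a side effect is involved. The sketch below is an illustration under assumptions, not the chapter's receiver: it uses SQLite's INSERT OR IGNORE as an alternative to catching IntegrityError, a hypothetical seen_deliveries table, and a plain list (sent_notifications) standing in for a real side effect such as a Slack post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the demonstration
conn.execute("CREATE TABLE seen_deliveries (delivery_id TEXT NOT NULL UNIQUE)")

sent_notifications = []  # hypothetical stand-in for a side effect


def handle_delivery(delivery_id, message):
    """Run the side effect only the first time a delivery ID is seen."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO seen_deliveries (delivery_id) VALUES (?)",
        (delivery_id,),
    )
    conn.commit()
    if cursor.rowcount == 0:
        return "duplicate"  # the row already existed, so skip the work
    sent_notifications.append(message)
    return "processed"


print(handle_delivery("a1b2", "Issue opened"))  # processed
print(handle_delivery("a1b2", "Issue opened"))  # duplicate
print(sent_notifications)                       # ['Issue opened']
```

Both approaches rely on the same UNIQUE constraint; the difference is only whether the duplicate surfaces as an exception or as a row count of zero.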

Later, when you add background processing, you will use the status and processed_at columns to mark which events have been handled. That gives you a complete picture: what was received, what has been processed, and what is still pending.
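
One way to get that picture is to count rows by status. The following is a minimal sketch; status_counts is a hypothetical helper, and the rows and timestamps in the demonstration are invented sample data in a cut-down table.

```python
import sqlite3


def status_counts(conn):
    """Return a dict mapping each status value to its number of rows."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM webhook_events GROUP BY status"
    ).fetchall()
    return dict(rows)


# Demonstration table with only the columns relevant here (sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webhook_events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, delivery_id TEXT UNIQUE, "
    "processed_at TEXT, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO webhook_events (delivery_id, processed_at, status) VALUES (?, ?, ?)",
    [
        ("d-1", "2024-01-01T00:00:05+00:00", "processed"),
        ("d-2", "2024-01-01T00:00:09+00:00", "processed"),
        ("d-3", None, "unprocessed"),
    ],
)

counts = status_counts(conn)
print(counts.get("processed", 0), "processed,", counts.get("unprocessed", 0), "pending")
```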

A simple background processor

In a production system you would use a dedicated task queue such as Celery, RQ, or a message broker to process webhook events in the background. To keep this chapter focused, you will build a simpler pattern that still demonstrates the idea: a separate script that polls the database for unprocessed events and handles them one by one. Yes, that is polling again, but polling a local database you own is cheap and rate-limit-free, unlike polling someone else's API; the difference is who pays for the empty checks.

Background worker that processes events

worker.py
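import json
import sqlite3
import time
from contextlib import contextmanager
from datetime import datetime, timezone

DB_FILE = "webhooks.db"

@contextmanager
def get_db_connection():
    # Close the connection when the block ends; the worker opens a new one
    # on every poll, and sqlite3's own "with conn:" would leave it open.
    conn = sqlite3.connect(DB_FILE)
    conn.row_factory = sqlite3.Row
    try:
        yield conn
    finally:
        conn.close()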

def fetch_unprocessed_events(limit=10):
    with get_db_connection() as conn:
        rows = conn.execute(
            """
            SELECT id, source, event_type, delivery_id, payload_json
            FROM webhook_events
            WHERE status = 'unprocessed'
            ORDER BY received_at ASC, id ASC
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
    return rows

def mark_event_processed(event_id, status="processed"):
    processed_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with get_db_connection() as conn:
        conn.execute(
            """
            UPDATE webhook_events
            SET status = ?, processed_at = ?
            WHERE id = ?
            """,
            (status, processed_at, event_id),
        )
        conn.commit()

def handle_event(row):
    payload = json.loads(row["payload_json"])
    event_type = row["event_type"]
    delivery_id = row["delivery_id"]

    # For now, just print a summary. Later you will send Slack notifications here.
    print(f"[{row['id']}] Handling event {event_type} (delivery {delivery_id})")
    print("Payload keys:", list(payload.keys()))

def run_worker(loop_delay=5):
    print("Starting webhook worker. Press Ctrl+C to stop.")
    try:
        while True:
            events = fetch_unprocessed_events()
            if not events:
                print("No unprocessed events. Sleeping...")
                time.sleep(loop_delay)
                continue

            for row in events:
                try:
                    handle_event(row)
                    mark_event_processed(row["id"], status="processed")
                except Exception as exc:
                    print(f"Error processing event {row['id']}: {exc}")
                    mark_event_processed(row["id"], status="error")

    except KeyboardInterrupt:
        print("Worker stopped.")

if __name__ == "__main__":
    run_worker()

Run this worker in a separate terminal while your Flask app is receiving webhooks:

Terminal
python worker.py
Starting webhook worker. Press Ctrl+C to stop.
No unprocessed events. Sleeping...
[1] Handling event issues (delivery a1b2c3d4-1234-5678-9abc-def012345678)
Payload keys: ['action', 'issue', 'repository', 'sender']
No unprocessed events. Sleeping...

The receiver writes events into the database and returns quickly. The worker picks up unprocessed events, handles them, and marks them as processed; an event whose handler raises an exception is marked with the status error and is not picked up again. In the next sections you will replace the placeholder handle_event logic with real behaviour: sending Slack notifications for GitHub activity.
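
If you want to exercise the worker without waiting for a real delivery, you can insert a fake event yourself. The sketch below is one possible way to do that; seed_test_event is a hypothetical helper, the "test-" prefix on the delivery ID is an arbitrary convention, and the demonstration uses an in-memory copy of the schema.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS webhook_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source TEXT NOT NULL,
    event_type TEXT NOT NULL,
    delivery_id TEXT NOT NULL UNIQUE,
    payload_json TEXT NOT NULL,
    received_at TEXT NOT NULL,
    processed_at TEXT,
    status TEXT NOT NULL
)
"""


def seed_test_event(conn, event_type="issues", payload=None):
    """Insert one fake unprocessed delivery and return its delivery ID."""
    delivery_id = f"test-{uuid.uuid4()}"
    received_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    payload_json = json.dumps(payload or {"action": "opened"})
    conn.execute(
        """
        INSERT INTO webhook_events
            (source, event_type, delivery_id, payload_json, received_at, processed_at, status)
        VALUES (?, ?, ?, ?, ?, NULL, 'unprocessed')
        """,
        ("github", event_type, delivery_id, payload_json, received_at),
    )
    conn.commit()
    return delivery_id


conn = sqlite3.connect(":memory:")  # use "webhooks.db" to feed the real worker
conn.execute(SCHEMA)
delivery_id = seed_test_event(conn)
print("Seeded", delivery_id)
```

Pointed at webhooks.db, a seeded row should show up in the worker's output on its next poll, which makes it easy to test new handle_event logic before connecting it to live traffic.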

You now have the core production shape: a fast HTTP path that persists and acknowledges, a separate worker that drains and processes, and a delivery-ID column that makes retries safe. Section 5 closes the obvious remaining gap: the receiver currently trusts every incoming POST, so a malicious actor who guesses your webhook URL can forge events. HMAC signature verification fixes that.