10. Review and chapter quiz

You've mastered database migration from SQLite to PostgreSQL: when to migrate, how to plan it, how to execute it without losing data, and how to keep the schema under version control. This page consolidates the six skills the chapter built, then tests them with five interview-style questions.

You've mastered database migration from SQLite to PostgreSQL. You can recognise when SQLite's architectural limits require migration, plan migrations using the four-phase workflow, write migration scripts with zero data loss, and use Alembic for schema version control.

These aren't just technical skills. They're professional capabilities that separate developers who prototype from engineers who migrate production systems safely.

Key skills mastered

Six capabilities, each one independently useful and each one a signal of mid-level engineering judgement in interviews:

Recognise when SQLite's limits require PostgreSQL. You can identify the five signals indicating migration time and assess which matter for your specific application. You understand that write-concurrency errors affect web applications most critically, while team-collaboration friction impacts development workflows. You know when SQLite's simplicity provides value and when PostgreSQL's client-server architecture becomes essential. This architectural judgement separates developers who follow tutorials from engineers who make informed infrastructure decisions.
Plan four-phase migrations that minimise downtime and prevent data loss. You can systematically plan migrations following the Preparation → Setup → Execution → Validation workflow. You create backups before starting, analyse schemas thoroughly, set up PostgreSQL infrastructure before touching data, migrate incrementally with verification, and test comprehensively before deployment. This systematic approach prevents the data loss and production incidents that plague poorly planned migrations.
Map SQLite data types to appropriate PostgreSQL equivalents. You can analyse SQLite schemas and choose optimal PostgreSQL types: SERIAL PRIMARY KEY for auto-incrementing IDs, VARCHAR with appropriate length constraints, NUMERIC for precision decimals, TIMESTAMP WITH TIME ZONE for proper timestamp handling. You understand how VARCHAR constraints improve query planning, how NUMERIC prevents floating-point errors, and how timezone-aware timestamps prevent DST bugs.
Write migration scripts with progress tracking and error handling. You can write Python scripts that migrate data safely: reading from SQLite, converting types appropriately, inserting into PostgreSQL with proper error handling, tracking progress for large datasets, and verifying row counts match. You know how to handle special cases like NULL values, empty strings, and timestamps in different formats. You can make migrations idempotent using ON CONFLICT clauses, allowing safe re-runs after failures.
Convert applications from sqlite3 to psycopg2 with connection pooling. You can update Python applications to use PostgreSQL: changing connection strings from file paths to network credentials, converting placeholder syntax from ? to %s, adding explicit transaction commits, and implementing connection pooling for production performance. You can configure pool parameters (minconn, maxconn) appropriately for your application's load.
Use Alembic for version-controlled schema evolution. You can set up Alembic in projects, create initial migrations from existing schemas, write incremental schema changes, and apply migrations safely in development and production. You understand migration history, rollback capabilities, and the trade-offs between manual and autogenerated migrations. You know how Alembic enables team coordination by ensuring all developers and servers run the same schema version.

Portfolio and interview value

Database migration skills prove you've worked with production systems facing real architectural constraints. When discussing this chapter in interviews, emphasise systematic planning (the four-phase workflow), data-integrity preservation (zero data loss), and production considerations (connection pooling, Alembic). Talk about a specific migration challenge you solved and the technical decisions you made. That kind of concrete problem-solving under production constraints is exactly what hiring managers look for in mid-level engineers.

Chapter quiz

Five questions that mirror how this material comes up in technical interviews. Take time to formulate complete answers before opening the explanation under each one.

What are the five signals indicating it's time to migrate from SQLite to PostgreSQL, and which one matters most for web applications?

The five signals are: (1) write concurrency errors, where "database is locked" errors appear when multiple processes try to write simultaneously; (2) network access requirements, where your database needs to be on a separate server from your application; (3) team collaboration friction, where multiple developers need direct database access for debugging and migrations; (4) advanced features needed, like JSONB, full-text search, or PostGIS that SQLite doesn't provide; and (5) growing database size, where performance degrades as your database approaches 100 GB.

Write concurrency is the most critical signal for web applications. When you deploy a Flask or Django app to production with Gunicorn or uWSGI, you typically run three to five worker processes to handle concurrent requests. Each worker is a separate Python process. When two users simultaneously create posts, update profiles, or save data, two workers try to write to the SQLite database at once. SQLite allows only one writer at a time, so one worker succeeds while the other gets OperationalError: database is locked and the user sees an error page.

This problem doesn't affect development (single process) or read-heavy applications (no write conflicts). But for production web apps with multiple workers handling concurrent users making changes, it's a showstopper. PostgreSQL solves this by running as a separate server process that coordinates writes internally. Your workers all connect to PostgreSQL simultaneously, each thinking it has exclusive access, while PostgreSQL manages concurrency behind the scenes using sophisticated locking mechanisms. Users never see locking errors because the database server handles them.

Real-world example: your Music Time Machine works perfectly on localhost (one Flask process). Deploy to Railway with three Gunicorn workers and suddenly random users get errors when creating playlists. Those are write concurrency failures. Migrate to PostgreSQL and the errors disappear because PostgreSQL coordinates concurrent writes without exposing locking to your application.

Explain the four-phase migration workflow and why each phase is necessary.

The four-phase workflow is: Phase 1 (Preparation) backup and analyse, Phase 2 (Setup) install PostgreSQL and create the schema, Phase 3 (Execution) migrate data and update code, Phase 4 (Validation) test and deploy. Each phase builds on the previous one, creating a safe path from SQLite to PostgreSQL with minimal risk.

Phase 1 (Preparation) is necessary because you need to understand what you're migrating before touching any code or data. You back up your SQLite database (if anything goes wrong, you can restore), analyse your schema programmatically (documenting tables, columns, types, constraints), identify all queries your application runs (ensuring nothing breaks), and plan your PostgreSQL schema with appropriate type mappings. Skipping preparation leads to forgotten tables, incorrect type choices, and missing indexes that cause performance problems in production.

Phase 2 (Setup) creates the PostgreSQL infrastructure before migrating data. You install PostgreSQL locally for testing, create the production database on Railway, Render, or RDS, write CREATE TABLE statements with proper PostgreSQL types, and set up Alembic for schema version control. This phase keeps PostgreSQL empty. You're building the structure to receive data. Skipping setup means migrating data with no destination or, worse, migrating to poorly designed schemas that require re-migration.

Phase 3 (Execution) is where data actually moves and code changes. You write migration scripts that read SQLite, convert types (especially timestamps), write to PostgreSQL with progress tracking, and verify row counts match. Then you update application code: change connection strings, convert placeholder syntax (? to %s), add explicit commits, and implement connection pooling. This phase requires careful testing. Migrating data incorrectly loses information; incorrect code breaks production.

Phase 4 (Validation) ensures everything works before deployment. You run your test suite against PostgreSQL, manually test critical features, compare query results between SQLite and PostgreSQL versions, verify data integrity (row counts, sample records, relationship preservation), and test performance under realistic load. Only after validation passes do you deploy to production. Skipping validation means discovering bugs in production when users are affected.

Why the strict order matters: you can't migrate data (Phase 3) before creating the destination schema (Phase 2). You can't validate (Phase 4) before migrating (Phase 3). You shouldn't start any phase without completing preparation (Phase 1). This sequential workflow prevents the chaos of ad-hoc migrations where you're simultaneously debugging schema issues, data conversions, and code changes without clear separation. Professional migrations follow this workflow because it minimises risk and enables rollback at each stage.

What is TIMESTAMP WITH TIME ZONE and why is it superior to storing timestamps as TEXT?

TIMESTAMP WITH TIME ZONE (often abbreviated TIMESTAMPTZ) is PostgreSQL's timezone-aware timestamp type. It stores the absolute point in time (internally as UTC) while preserving the timezone context. When you query the data, PostgreSQL converts it to your session's timezone automatically. This prevents the timezone bugs that plague applications storing timestamps as TEXT strings.

The TEXT timestamp problem: SQLite stores timestamps as TEXT like "2024-12-09 14:30:00". But what timezone is that? Eastern? Pacific? UTC? If you save "14:30:00" in Los Angeles and read it in New York, did the event happen at 2:30pm Eastern (same absolute time) or 11:30am Eastern (wrong time)? With TEXT timestamps, you have no way to know. Worse, daylight saving time changes break everything. "2024-03-10 02:30:00" doesn't exist in US timezones because clocks spring forward at 2am.

TIMESTAMPTZ advantages:

Unambiguous storage. PostgreSQL stores the absolute UTC time internally. "2024-12-09 14:30:00-08" (Pacific) and "2024-12-09 17:30:00-05" (Eastern) are the same moment. PostgreSQL knows this.
Automatic conversion. Query from any timezone and PostgreSQL converts to your local time automatically. No Python datetime gymnastics needed.
DST handling. PostgreSQL understands daylight saving rules. It knows "2024-03-10 02:30:00" is invalid and handles it appropriately.
Arithmetic. You can add intervals (INTERVAL '2 hours') and PostgreSQL handles timezone transitions correctly. With TEXT, you must parse to datetime, add, then serialise back.

Migration conversion: your SQLite TEXT timestamps need timezone context. If they were stored in UTC (best case), parse them as UTC during migration. If they were local times without timezone info (common), you must decide: treat them as UTC (probably wrong), treat them as server timezone (fragile if server moves), or accept data ambiguity. This is why starting with TIMESTAMPTZ from day one prevents migration headaches.

Explain connection pooling: what problem does it solve and how do minconn and maxconn work?

The problem: opening a PostgreSQL connection is expensive. Typically 20-40 milliseconds for network handshake, authentication, and server resource allocation. If every HTTP request opens a connection, executes one query (5ms), then closes the connection, you're spending 80% of request time on connection overhead. For 100 requests per minute, you create and destroy 100 connections. This wastes time, overwhelms the database with connection requests, and exhausts connection limits (PostgreSQL defaults to 100 max connections).

The solution: connection pooling maintains a pool of open, authenticated connections. When a request needs database access, it borrows a connection from the pool, executes queries, then returns the connection (doesn't close it). The next request reuses that connection immediately. Zero connection overhead. Instead of 100 connection create/destroy cycles per minute, you have five to ten persistent connections serving hundreds of requests.

minconn (minimum connections). The pool maintains at least this many open connections at all times, even during idle periods. With minconn=2, two connections are opened when your application starts and stay open until shutdown. When the first request arrives, a connection is immediately available. No 30ms connection overhead. The request executes in 5ms total instead of 35ms. minconn=2 is appropriate for small applications. Larger applications might use minconn=5 or minconn=10 to ensure connections are always available.

maxconn (maximum connections). The pool creates at most this many connections, preventing your application from overwhelming the database. With maxconn=10, if all 10 connections are currently in use and an 11th request arrives, it waits in a queue until a connection becomes available. This prevents scenarios where 100 simultaneous requests try to open 100 connections, hitting PostgreSQL's connection limit and crashing the database server. maxconn=10 is appropriate for three to five web workers. Scale it with your worker count: three workers × five max connections per worker = 15 total max connections.

Pool behaviour under load: (1) Application starts: pool creates minconn=2 connections immediately. (2) First request arrives: borrows connection 1, executes query, returns connection 1 to pool. (3) Spike of 8 simultaneous requests: pool creates 6 more connections (up to maxconn=10), all 8 requests execute concurrently. (4) Load drops: 8 connections sit idle in pool (already authenticated, ready). (5) Next request: reuses an existing connection instantly.

What is Alembic and how does it improve team coordination for schema changes?

Alembic is a database migration tool for SQLAlchemy that version-controls schema changes. Instead of ad-hoc SQL scripts that developers run manually (hoping they remember), Alembic tracks which migrations have run, applies new migrations automatically, and enables safe rollbacks when deployments fail. Think of it as Git for your database schema. Every change is a commit (migration), changes are applied in order, and you can roll back to previous versions.

The manual SQL problem: without Alembic, schema changes are chaotic. Developer A adds a column locally, sends teammates a SQL file via Slack: "Hey everyone run this: ALTER TABLE tracks ADD COLUMN genre VARCHAR(100);". Developer B runs it. Developer C forgets. Developer D runs it twice (error). Production database is behind because no one remembers to run it during deployment. A week later, the app crashes in production because it expects a genre column that doesn't exist. No one knows which schema changes have run where.

How Alembic solves this: Alembic stores an alembic_version table in your database with one row indicating the current migration version (such as abc123). When you run alembic upgrade head, Alembic checks the current version, finds all migration files more recent than that version, applies them in order, and updates alembic_version to the latest migration. This happens automatically. Developers don't need to remember which migrations they've run. The database itself tracks this.

Team coordination benefits:

Developer onboarding. New developer clones the repo and runs alembic upgrade head. Alembic applies all migrations in correct order, creating a local database matching production schema. No manual SQL needed.
Deployment automation. Your CI/CD pipeline runs alembic upgrade head before deploying new code. If migrations fail, deployment halts (preventing broken production). If migrations succeed, the database schema matches the new code.
Rollback capability. Deploy fails? Run alembic downgrade -1 to undo the last migration, then roll back code. Database and code stay synchronised.

Migration workflow example: Developer A adds a genre column. She runs alembic revision -m "add genre column", creating abc123_add_genre_column.py with upgrade() and downgrade() functions. She commits this file to Git. Developer B pulls the repo and runs alembic upgrade head. Alembic sees abc123 hasn't run yet, runs the upgrade() function, updates version to abc123. Developer C repeats identically. Production deployment runs alembic upgrade head automatically. Everyone's database reaches the same state without manual coordination.

Looking forward

You've mastered database migration from SQLite to PostgreSQL. You know when migration provides value, how to plan and execute it safely, and how to use PostgreSQL's features -- proper timestamp handling, connection pooling, Alembic for schema evolution -- to back a real application. These skills separate hobby projects from production systems.

In Chapter 25: Migrating the Music Time Machine to PostgreSQL, you'll apply these exact patterns to a significantly more complex application: the OAuth-enabled Flask application you built in Chapters 14-18. You'll migrate two related tables with a foreign key, convert the cached track payload into queryable JSONB, add full-text search across track metadata, and deploy with connection pooling tuned for real traffic.

The progression is deliberate: simple example first (weather cache), then real-world complexity (Music Time Machine). By the end of Chapter 25, your Music Time Machine will run on PostgreSQL, handle hundreds of concurrent users without "database is locked" errors, and demonstrate production-grade database patterns. You'll have concrete interview material: a three-table OAuth application migrated from SQLite to PostgreSQL with zero data loss, JSONB for queryable JSON data, full-text search with GIN indexes, and connection pooling. That's the kind of specific, technical accomplishment that proves production experience.