6. ECS Fargate: deploying containerised apps

Fargate runs containers without you managing servers. Define a task, create a service, and AWS handles the rest.

The image is in ECR; RDS and ElastiCache are ready to accept connections. ECS is the piece that takes the image and runs it: it schedules containers, restarts the ones that crash, rolls new versions out in place of old ones, and ties the running tasks to the load balancer in section 7.

ECS has two launch types. EC2 means you also run the underlying virtual machines yourself. Fargate means you don't: you tell AWS "run this container with 0.25 vCPU and 512MB" and it picks the host, patches the kernel, and reschedules the container if the host disappears. The whole chapter uses Fargate, which keeps the surface area limited to task definitions and services.

ECS concepts: clusters, tasks, services

ECS has four core concepts that work together to run your containers:

  • Cluster. A namespace that groups tasks and services. The chapter uses one cluster, news-api-cluster; production teams typically have one per environment (dev / staging / prod) or one per application. Clusters themselves are free; you pay for what runs in them.
  • Task definition. The blueprint for one running container: image URI, CPU and memory, port mappings, environment variables, log driver. Each update creates a new revision (news-api-task:1, :2, ...) so rolling back means pointing the service at an older revision.
  • Task. One running instance of a task definition. Tasks are disposable; if one crashes, ECS starts a replacement. Don't store anything important inside one.
  • Service. The control loop that keeps a target number of tasks running. Set desired-count to 2 and the service starts a replacement whenever one of the two tasks goes away. The service is also what registers tasks with the load balancer's target group later in the chapter.
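Every revision stays registered, so a rollback is one update-service call that names an older revision. The sketch below is minimal. It assumes this chapter's cluster and service names, and it assumes the previous revision hasn't been deregistered. prev_revision and rollback_service are hypothetical helper names, not AWS CLI commands. Sourcing the functions runs nothing.

```shell
# Hypothetical rollback helpers. Assumes the chapter's names: cluster
# news-api-cluster and service news-api-service. Defining them runs nothing.

# Print "family:N-1" for "family:N" or for a full task-definition ARN.
prev_revision() {
    local ref family rev
    ref=${1##*/}          # drop "arn:aws:ecs:...:task-definition/" if present
    family=${ref%:*}      # everything before the last colon
    rev=${ref##*:}        # the revision number after the last colon
    case $rev in
        ''|*[!0-9]*) echo "not a family:revision reference: $1" >&2; return 1 ;;
    esac
    if [ "$rev" -le 1 ]; then
        echo "no revision before $ref" >&2
        return 1
    fi
    echo "$family:$((rev - 1))"
}

# Point the service at the revision before the one it currently uses.
rollback_service() {
    local current target
    current=$(aws ecs describe-services \
        --cluster news-api-cluster \
        --services news-api-service \
        --query 'services[0].taskDefinition' \
        --output text \
        --region us-east-1) || return 1
    target=$(prev_revision "$current") || return 1
    aws ecs update-service \
        --cluster news-api-cluster \
        --service news-api-service \
        --task-definition "$target" \
        --region us-east-1
}
```

After a bad deploy, run rollback_service. ECS then replaces the running tasks with ones built from the earlier revision.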

The order of operations: create the cluster, register the task definition, create a service inside the cluster that references the task definition with desired-count: 2. ECS pulls the image from ECR, starts two tasks, watches them, and replaces any that die.

Creating the ECS cluster

On Fargate the cluster is just a namespace. There's nothing to size and nothing to keep running between deploys.

Make: Create an ECS cluster:

Terminal
aws ecs create-cluster \
    --cluster-name news-api-cluster \
    --region us-east-1

# Output (abbreviated):
{
    "cluster": {
        "clusterArn": "arn:aws:ecs:us-east-1:123456789012:cluster/news-api-cluster",
        "clusterName": "news-api-cluster",
        "status": "ACTIVE",
        "registeredContainerInstancesCount": 0,
        "runningTasksCount": 0,
        "pendingTasksCount": 0
    }
}

Check: Verify cluster creation:

Terminal
aws ecs list-clusters --region us-east-1

# Shows your news-api-cluster

The cluster is ready. Now you'll create the task definition that specifies how to run your News API container.

Creating the task definition

Task definitions are JSON documents specifying container configuration. You'll create a task definition for your News API with:

  • Your ECR image URI from Section 4
  • CPU allocation: 256 CPU units (0.25 vCPU)
  • Memory allocation: 512 MB
  • Environment variables for database connections and API keys
  • Port mapping: container port 8000 (with awsvpc networking the host port is the same)
  • CloudWatch Logs configuration for centralized logging

Make: Create a task definition JSON file:

task-definition.json
{
  "family": "news-api-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "news-api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/news-aggregator-api:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "DATABASE_URL",
          "value": "postgresql://newsadmin:YourSecurePassword123!@news-api-db.c9z8v7x2y3z4.us-east-1.rds.amazonaws.com:5432/postgres"
        },
        {
          "name": "REDIS_URL",
          "value": "redis://news-api-cache.abc123.0001.use1.cache.amazonaws.com:6379"
        },
        {
          "name": "NEWSAPI_KEY",
          "value": "your-newsapi-key-here"
        },
        {
          "name": "GUARDIAN_KEY",
          "value": "your-guardian-key-here"
        },
        {
          "name": "ADMIN_API_KEY",
          "value": "your-long-random-admin-key"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/news-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

Key configuration explained:

family: Task definition name. Updates to this task definition create new revisions (news-api-task:1, news-api-task:2, etc.).

networkMode: awsvpc: Each task gets its own elastic network interface with a private IP address. This is required for Fargate and provides task-level network isolation.

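cpu and memory: Fargate accepts only specific combinations. 256 CPU units (0.25 vCPU) with 512 MB is the smallest size Fargate offers, and it is enough for light-to-moderate API traffic. Fargate isn't covered by the AWS Free Tier's free-usage allowances, so these tasks are billed for the vCPU and memory they reserve for as long as they run. See the AWS documentation for the other valid combinations.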
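You can check the pairing before you register anything. The sketch below is minimal. It assumes the pairings AWS documents for the 0.25 to 4 vCPU tiers, with memory in MiB as in the task definition. fargate_size_ok is a hypothetical helper, and it deliberately leaves out the larger 8 and 16 vCPU tiers.

```shell
# Hypothetical check: succeed if CPU units / memory (MiB) is a valid Fargate
# pairing. Covers the 256-4096 CPU tiers only; 8 and 16 vCPU are omitted.
fargate_size_ok() {
    local cpu mem min max
    cpu=$1; mem=$2
    case $mem in ''|*[!0-9]*) return 1 ;; esac   # memory must be whole MiB
    case $cpu in
        256)
            # 0.25 vCPU accepts exactly three sizes: 512 MiB, 1 GB, 2 GB
            [ "$mem" -eq 512 ] || [ "$mem" -eq 1024 ] || [ "$mem" -eq 2048 ]
            return ;;
        512)  min=1024; max=4096 ;;
        1024) min=2048; max=8192 ;;
        2048) min=4096; max=16384 ;;
        4096) min=8192; max=30720 ;;
        *)    return 1 ;;   # tier not covered by this sketch
    esac
    # Other tiers: any whole number of GB (1024 MiB) between min and max
    [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $((mem % 1024)) -eq 0 ]
}
```

For example, `fargate_size_ok 256 512 && echo valid` prints "valid". `fargate_size_ok 256 4096` fails because 0.25 vCPU tops out at 2 GB.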

executionRoleArn: IAM role that grants ECS permission to pull images from ECR and write logs to CloudWatch. Registering a task definition from the CLI doesn't create this role for you. Create it once, just before registering the task definition (below), and reuse it for every task afterwards.

environment variables: Your application reads these at runtime. Replace placeholder values with your actual RDS endpoint, ElastiCache endpoint, upstream API keys, and the ADMIN_API_KEY the key-management endpoints check (covered in Chapter 26).
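A leftover placeholder doesn't stop registration. It only fails later, when the container tries to use the value. The sketch below is a minimal pre-flight check. It assumes the placeholder strings used in this chapter's example file. check_placeholders is a hypothetical helper, so extend its pattern if your placeholders differ, and run it on the file you're about to register.

```shell
# Hypothetical pre-flight check: fail if a file still contains this chapter's
# placeholder values. Extend the pattern if your placeholders differ.
check_placeholders() {
    local file pattern
    file=${1:-task-definition.json}
    pattern='your-[a-z-]*key|YourSecurePassword|123456789012|c9z8v7x2y3z4|abc123'
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # grep -n prints each offending line with its line number
    if grep -nE "$pattern" "$file"; then
        echo "placeholders remain in $file" >&2
        return 1
    fi
    echo "no placeholders found in $file"
}
```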

logConfiguration: Sends container stdout/stderr to CloudWatch Logs. You'll query these logs with aws logs tail later in this section when you debug the running service, and Chapter 29 builds CloudWatch dashboards and alarms on top of them.

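Before registering the task definition, create the CloudWatch log group. By default the awslogs driver doesn't create a missing group, and a task whose log group doesn't exist fails to start: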

Terminal
aws logs create-log-group \
    --log-group-name /ecs/news-api \
    --region us-east-1
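Running create-log-group a second time fails with ResourceAlreadyExistsException. New log groups also keep their events forever by default. The sketch below is a minimal idempotent version that assumes a 14-day retention suits this project. ensure_log_group is a hypothetical helper name.

```shell
# Hypothetical helper: create the log group if it's missing, then cap how long
# CloudWatch keeps its events (14 days is an assumed default, not a rule).
ensure_log_group() {
    local group days out
    group=${1:-/ecs/news-api}
    days=${2:-14}
    if ! out=$(aws logs create-log-group \
            --log-group-name "$group" \
            --region us-east-1 2>&1); then
        case $out in
            *ResourceAlreadyExistsException*) ;;   # group exists: nothing to do
            *) echo "$out" >&2; return 1 ;;        # any other error is real
        esac
    fi
    aws logs put-retention-policy \
        --log-group-name "$group" \
        --retention-in-days "$days" \
        --region us-east-1
}
```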

Then create the task execution role. This is the role named in executionRoleArn above; it lets ECS pull your image from ECR and write logs to CloudWatch on your behalf. First define a trust policy so ECS tasks are allowed to assume it:

ecs-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create the role and attach the AWS-managed execution policy, which already grants the ECR pull and CloudWatch Logs permissions ECS needs at task start:

Terminal
aws iam create-role \
    --role-name ecsTaskExecutionRole \
    --assume-role-policy-document file://ecs-trust-policy.json

aws iam attach-role-policy \
    --role-name ecsTaskExecutionRole \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

# create-role prints the new role's ARN, ending in :role/ecsTaskExecutionRole

That printed ARN is exactly what executionRoleArn in task-definition.json points to, with your own account ID in place of the 123456789012 placeholder. If you have run an ECS task in this account before, the role may already exist; create-role then reports EntityAlreadyExists and you can move on without changing anything.
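You don't have to edit the account ID by hand. You can look it up with `aws sts get-caller-identity` and substitute it into a copy of the file. The sketch below is minimal. render_account_id and the task-definition.rendered.json output name are hypothetical, and the commented lines at the end show the intended usage.

```shell
# Hypothetical helper: copy SRC to DEST with the 123456789012 placeholder
# account ID replaced by a real 12-digit account ID.
render_account_id() {
    local src dest account
    src=$1; dest=$2; account=$3
    case $account in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "not a 12-digit account ID: $account" >&2; return 1 ;;
    esac
    sed "s/123456789012/$account/g" "$src" > "$dest"
}

# Usage:
#   ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
#   render_account_id task-definition.json task-definition.rendered.json "$ACCOUNT_ID"
#   aws ecs register-task-definition \
#       --cli-input-json file://task-definition.rendered.json --region us-east-1
```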

Register the task definition:

Terminal
aws ecs register-task-definition \
    --cli-input-json file://task-definition.json \
    --region us-east-1

# Output shows task definition registered as revision 1

Check: List task definitions:

Terminal
aws ecs list-task-definitions --region us-east-1

# Shows an ARN ending in task-definition/news-api-task:1

Secrets in environment variables

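The example above embeds the RDS password and API keys directly in environment for readability. In production, move them to the container definition's secrets array instead. Each secrets entry has a name and a valueFrom that holds a Systems Manager Parameter Store (or Secrets Manager) ARN. Anything in environment shows up in the output of aws ecs describe-task-definition, which is visible to anyone with ECS read permission. secrets exposes only the ARN, and ECS resolves the value at task start using the task execution role. That role then needs ssm:GetParameters on those parameters (or secretsmanager:GetSecretValue for Secrets Manager), which the AmazonECSTaskExecutionRolePolicy attached above does not grant.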
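Moving a value into secrets takes two steps. First store it in Parameter Store, then reference its ARN from the container definition. The sketch below is minimal and assumes parameters live under a hypothetical /news-api/ prefix. ssm_param_arn, secret_entry and put_secret are illustrative helper names. Each secrets entry replaces the matching environment entry, and the container still receives an ordinary environment variable at runtime.

```shell
# Hypothetical helpers for moving a value from "environment" to "secrets".
# The /news-api/ parameter prefix is an assumption.

# ARN of a Parameter Store parameter; a leading "/" in the name is not doubled.
ssm_param_arn() {
    local region account name
    region=$1; account=$2; name=$3
    echo "arn:aws:ssm:$region:$account:parameter/${name#/}"
}

# One entry for the container definition's "secrets" array.
secret_entry() {
    printf '{ "name": "%s", "valueFrom": "%s" }\n' "$1" "$2"
}

# Store (or update) a value as an encrypted SecureString parameter.
put_secret() {
    aws ssm put-parameter \
        --name "$1" \
        --value "$2" \
        --type SecureString \
        --overwrite \
        --region us-east-1
}

# Usage:
#   ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
#   put_secret /news-api/NEWSAPI_KEY "$NEWSAPI_KEY"
#   secret_entry NEWSAPI_KEY "$(ssm_param_arn us-east-1 "$ACCOUNT_ID" /news-api/NEWSAPI_KEY)"
```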

Creating an ECS service

The task definition describes what to run. The service describes how many to run and where to run them. You'll create a service that maintains 2 tasks (containers) running continuously in your cluster.

Make: Create an ECS service:

Terminal
# Get default VPC and subnet IDs (same region as the ECS commands)
VPC_ID=$(aws ec2 describe-vpcs \
    --filters "Name=isDefault,Values=true" \
    --query 'Vpcs[0].VpcId' \
    --output text \
    --region us-east-1)

SUBNET_IDS=$(aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$VPC_ID" \
    --query 'Subnets[*].SubnetId' \
    --output text \
    --region us-east-1 | tr '\t' ',')

# Create security group for ECS tasks
ECS_SG=$(aws ec2 create-security-group \
    --group-name news-api-ecs-sg \
    --description "Security group for News API ECS tasks" \
    --vpc-id "$VPC_ID" \
    --query 'GroupId' \
    --output text \
    --region us-east-1)

# No egress rule to add: a new security group already allows all outbound
# traffic, which covers ECR image pulls, CloudWatch Logs, RDS/Redis, and the
# upstream news APIs. Authorizing 0.0.0.0/0 again fails with
# InvalidPermission.Duplicate.

# Create the service
aws ecs create-service \
    --cluster news-api-cluster \
    --service-name news-api-service \
    --task-definition news-api-task:1 \
    --desired-count 2 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[$SUBNET_IDS],securityGroups=[$ECS_SG],assignPublicIp=ENABLED}" \
    --region us-east-1

What this does:

desired-count: 2: ECS keeps 2 tasks running at all times. When a task stops, the service scheduler launches a replacement. That's the failover: if one container crashes, the other keeps serving requests while the replacement spins up.

launch-type: FARGATE: Use serverless containers. No server management required.

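network-configuration: Tasks run in your default VPC's subnets, and each task gets a private IP address from its subnet's range. assignPublicIp=ENABLED also gives each task a public IP. That public IP is how the task reaches ECR, CloudWatch Logs and the upstream news APIs from a public subnet without a NAT gateway or VPC endpoints. The ECS security group has no inbound rules yet, so nothing can connect to those public IPs.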
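The VPC and subnet lookups at the start of those commands don't fail when nothing matches. With --output text, Vpcs[0].VpcId prints None and an empty subnet list prints nothing. Those values then surface as confusing errors from create-security-group or create-service. The sketch below is a minimal guard that uses a hypothetical require_value helper.

```shell
# Hypothetical guard: print VALUE, or fail if the lookup for LABEL came back
# empty or as the literal string "None".
require_value() {
    case $2 in
        ''|None) echo "lookup for $1 returned nothing" >&2; return 1 ;;
    esac
    printf '%s\n' "$2"
}

# Usage, before create-security-group and create-service:
#   require_value "default VPC" "$VPC_ID" >/dev/null &&
#       require_value "default subnets" "$SUBNET_IDS" >/dev/null &&
#       echo "lookups OK"
```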

Check: Monitor service deployment:

Terminal
aws ecs describe-services \
    --cluster news-api-cluster \
    --services news-api-service \
    --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}' \
    --output table \
    --region us-east-1

# Once the deployment settles, the output shows:
# Status: ACTIVE
# Running: 2
# Desired: 2

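When runningCount equals desiredCount, both tasks are up. The initial deployment typically takes 2-3 minutes while Fargate pulls the image, attaches each task's network interface and starts the containers. A matching count doesn't prove the app is healthy. A container that exits on startup is replaced over and over, so the count can look right between restarts. Check the logs as well.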
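You don't have to re-run describe-services by hand. The CLI's built-in services-stable waiter blocks until the deployment settles. Stopped tasks also record a stoppedReason, which usually explains a crash loop. The sketch below is minimal and assumes this chapter's cluster and service names. wait_stable and stopped_reasons are hypothetical helper names.

```shell
# Hypothetical helpers, assuming the cluster and service names used above.

# Block until the service is steady. The CLI waiter polls every 15 seconds
# and exits non-zero if the service still isn't stable after 40 checks.
wait_stable() {
    aws ecs wait services-stable \
        --cluster news-api-cluster \
        --services news-api-service \
        --region us-east-1
}

# Print ARN and stoppedReason for the service's recently stopped tasks.
stopped_reasons() {
    local arns
    arns=$(aws ecs list-tasks \
        --cluster news-api-cluster \
        --service-name news-api-service \
        --desired-status STOPPED \
        --query 'taskArns' \
        --output text \
        --region us-east-1) || return 1
    if [ -z "$arns" ] || [ "$arns" = "None" ]; then
        echo "no stopped tasks"
        return 0
    fi
    # $arns is left unquoted on purpose: each ARN becomes its own argument
    aws ecs describe-tasks \
        --cluster news-api-cluster \
        --tasks $arns \
        --query 'tasks[].[taskArn,stoppedReason]' \
        --output text \
        --region us-east-1
}
```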

View container logs in CloudWatch (aws logs tail requires AWS CLI v2):

Terminal
aws logs tail /ecs/news-api --follow --region us-east-1

# Shows real-time container logs:
# INFO:     Uvicorn running on http://0.0.0.0:8000
# INFO:     Application startup complete
# INFO:     Connected to PostgreSQL database
# INFO:     Connected to Redis cache
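The awslogs driver names each stream prefix/container-name/task-id. With the awslogs-stream-prefix and container name from task-definition.json, you can follow a single task instead of reading both tasks' output interleaved. The sketch below is minimal. task_log_stream and tail_task_logs are hypothetical helpers, and the tail command needs AWS CLI v2.

```shell
# Hypothetical helpers. With awslogs-stream-prefix "ecs" and the container
# named "news-api", each task logs to the stream ecs/news-api/<task-id>.
task_log_stream() {
    # The task ID is the last "/"-separated segment of the task ARN
    echo "ecs/news-api/${1##*/}"
}

# Show the last 30 minutes of one task's logs (needs AWS CLI v2).
tail_task_logs() {
    aws logs tail /ecs/news-api \
        --log-stream-names "$(task_log_stream "$1")" \
        --since 30m \
        --region us-east-1
}

# Usage: tail_task_logs <task ARN from aws ecs list-tasks>
```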

The News API is now running on AWS. The containers are deployed, connected to RDS and ElastiCache, and logging to CloudWatch. They aren't publicly reachable yet. The ECS security group allows no inbound traffic, and the public IPs Fargate assigned change whenever a task is replaced, so there's no stable hostname. Section 7 adds the Application Load Balancer that fronts the service with a public DNS name and an HTTPS listener.

Tightening the database security groups

Section 5 opened RDS and ElastiCache to the whole VPC CIDR as a placeholder. Now that the ECS security group exists, replace those broad rules with rules whose source is the ECS security group itself, so only tasks in this service can reach the databases:

Terminal
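# Re-derive these variables if you're in a fresh shell
ECS_SG=$(aws ec2 describe-security-groups \
    --filters "Name=group-name,Values=news-api-ecs-sg" \
    --query 'SecurityGroups[0].GroupId' \
    --output text)
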
RDS_SG=$(aws rds describe-db-instances \
    --db-instance-identifier news-api-db \
    --query 'DBInstances[0].VpcSecurityGroups[0].VpcSecurityGroupId' \
    --output text)

REDIS_SG=$(aws elasticache describe-cache-clusters \
    --cache-cluster-id news-api-cache \
    --show-cache-node-info \
    --query 'CacheClusters[0].SecurityGroups[0].SecurityGroupId' \
    --output text)

VPC_CIDR=$(aws ec2 describe-vpcs \
    --filters "Name=isDefault,Values=true" \
    --query 'Vpcs[0].CidrBlock' \
    --output text)

# Allow Postgres and Redis only from the ECS tasks' security group
aws ec2 authorize-security-group-ingress \
    --group-id $RDS_SG \
    --protocol tcp \
    --port 5432 \
    --source-group $ECS_SG

aws ec2 authorize-security-group-ingress \
    --group-id $REDIS_SG \
    --protocol tcp \
    --port 6379 \
    --source-group $ECS_SG

# Remove the broad VPC-wide rules from Section 5
aws ec2 revoke-security-group-ingress \
    --group-id $RDS_SG \
    --protocol tcp \
    --port 5432 \
    --cidr $VPC_CIDR

aws ec2 revoke-security-group-ingress \
    --group-id $REDIS_SG \
    --protocol tcp \
    --port 6379 \
    --cidr $VPC_CIDR

The running tasks don't notice the swap. The source-group rules are added before the CIDR rules are revoked, so ECS traffic is allowed throughout. The new rules also cover the same traffic the CIDR rules did, minus everything else in the VPC. This is the least-privilege shape the chapter review summarises: each tier accepts connections only from the tier directly in front of it.
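To confirm the swap took effect, ask EC2 which security groups each database group accepts traffic from on its port. The sketch below is minimal. sg_allows_from_group is a hypothetical helper that reads the rules with describe-security-groups.

```shell
# Hypothetical check: succeed if TARGET_SG allows PORT from SOURCE_SG.
sg_allows_from_group() {
    local target port src_sg groups g
    target=$1; port=$2; src_sg=$3
    # Group IDs referenced by the inbound rule(s) whose FromPort matches
    groups=$(aws ec2 describe-security-groups \
        --group-ids "$target" \
        --query "SecurityGroups[0].IpPermissions[?FromPort==\`$port\`].UserIdGroupPairs[].GroupId" \
        --output text) || return 1
    for g in $groups; do
        [ "$g" = "$src_sg" ] && return 0
    done
    return 1
}

# Usage, with the variables from the commands above:
#   sg_allows_from_group "$RDS_SG" 5432 "$ECS_SG" && echo "RDS: ECS-only rule present"
#   sg_allows_from_group "$REDIS_SG" 6379 "$ECS_SG" && echo "Redis: ECS-only rule present"
```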

Verifying the container deployment

Your containers have public IPs, but the ECS security group has no inbound rules, so nothing can connect to them yet. To verify they're working before adding the load balancer, you have two options. You can launch a bastion host (an EC2 instance in the same VPC) and allow it in the ECS security group, or you can temporarily add an inbound rule to the ECS security group that allows your own IP address.

Quick verification (temporary public access):

Terminal
# Get your public IP
MY_IP=$(curl -s https://checkip.amazonaws.com)

# Allow HTTP access from your IP temporarily
aws ec2 authorize-security-group-ingress \
    --group-id $ECS_SG \
    --protocol tcp \
    --port 8000 \
    --cidr $MY_IP/32

# Get a task's public IP: task ARN -> network interface ID -> public IP
TASK_ARN=$(aws ecs list-tasks \
    --cluster news-api-cluster \
    --service-name news-api-service \
    --query 'taskArns[0]' \
    --output text \
    --region us-east-1)

ENI_ID=$(aws ecs describe-tasks \
    --cluster news-api-cluster \
    --tasks "$TASK_ARN" \
    --query 'tasks[0].attachments[0].details[?name==`networkInterfaceId`].value' \
    --output text \
    --region us-east-1)

TASK_IP=$(aws ec2 describe-network-interfaces \
    --network-interface-ids "$ENI_ID" \
    --query 'NetworkInterfaces[0].Association.PublicIp' \
    --output text)

echo "Task public IP: $TASK_IP"

# Test the API
curl http://$TASK_IP:8000/docs

# Remove the temporary rule after testing
aws ec2 revoke-security-group-ingress \
    --group-id $ECS_SG \
    --protocol tcp \
    --port 8000 \
    --cidr $MY_IP/32

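If curl returns the HTML for FastAPI's interactive documentation page, the container is up and serving HTTP. You can also open http://TASK_IP:8000/docs in a browser while the temporary rule is in place. /docs doesn't touch the database, though, so call one of your endpoints that reads from Postgres and Redis to confirm those connections too. Once both work, the service is ready for production traffic through the load balancer you'll create in Section 7.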
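A task that has just started can take a few seconds before Uvicorn accepts connections, so a single curl may fail even when the deployment is fine. The sketch below is a minimal retry loop. wait_for_http is a hypothetical helper, and its attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: poll URL until it returns HTTP 200.
# Defaults (10 attempts, 5 seconds apart) are arbitrary.
wait_for_http() {
    local url attempts delay i code
    url=$1; attempts=${2:-10}; delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -w prints only the status code; it is 000 if the connection failed
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
        if [ "$code" = "200" ]; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
        i=$((i + 1))
    done
    echo "no HTTP 200 from $url after $attempts attempts (last: ${code:-none})" >&2
    return 1
}

# Usage, while the temporary ingress rule is in place:
#   wait_for_http "http://$TASK_IP:8000/docs"
```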

Leaving the tasks publicly reachable on port 8000 isn't a viable long-term shape. The IPs change every time a task restarts, so there's no stable URL to give a client. There's no TLS, no shared health-check, and nothing routing traffic across the two tasks. The next section puts an Application Load Balancer in front, which is what supplies the stable hostname, the HTTPS listener, and the target-group health checks that decide which tasks get traffic.

Next, in Section 7, you create the ALB target group, register the ECS service against it, request a free ACM certificate, and attach the HTTPS listener so the deployment finally has a real public URL.