4. CloudWatch monitoring and observability

You can't fix what you can't see. CloudWatch aggregates logs, exposes metrics, fires alarms, and renders dashboards. This page wires the API into all four.

The Chapter 28 task definition already routes container stdout/stderr to CloudWatch Logs, so the raw telemetry stream exists. What this section adds is the structure on top: a dashboard organised around the four metrics that actually predict user-visible failure, three alarms that fire on conditions worth paging for, and the Logs Insights queries that let you answer "what happened at 14:30?" without SSHing anywhere.

The golden signals framework

Modern infrastructure exposes thousands of metrics; staring at all of them is the same as staring at none. The golden signals are the four-metric default Google's SRE team converged on for any user-facing system. They cover most of the failure surface, and a deviation in one of them is almost always the right thing to investigate first.

  • Latency. How long requests take. Always track percentiles, not averages: if 95% of requests take 100ms and 5% take 10 seconds, the average is still under a second while one user in twenty is in pain. P50/P90/P99 is the standard trio. A jump in P99 means one request in a hundred is slow, which is usually the first signal something downstream (a query, a cache miss, an external API) is degrading.
  • Traffic. Requests per second through the ALB. The interesting signal is rate-of-change, not the absolute number: an unexpected drop usually means clients are giving up or a routing layer is misconfigured; a sudden spike could be a marketing event or a misbehaving bot. Capacity decisions in section 5 use this metric directly.
  • Errors. 4xx and 5xx counts from the ALB, split. 4xx is bad input from clients (a missing API key, an unknown route) and stays roughly flat under normal load. 5xx is the server failing, and a sudden 5xx rise is almost always a real incident that needs eyes on it.
  • Saturation. How close each resource is to running out: CPU and memory on the ECS tasks, connection-pool utilisation on RDS, evictions on ElastiCache. Saturation is the leading indicator the other three are about to move: latency spikes and 5xx counts often start a minute or two after the CPU graph crosses 80%.
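
The arithmetic in the latency bullet can be checked directly. The sketch below, in Python purely for illustration, builds the 95%/5% sample from that bullet and compares the mean with nearest-rank percentiles. The percentile helper is a hypothetical stand-in; CloudWatch computes its own percentile statistics and may interpolate differently.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# 95 requests at 100 ms and 5 requests at 10 seconds, as in the latency bullet.
latencies_ms = [100] * 95 + [10_000] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.0f}ms p50={p50}ms p90={p90}ms p99={p99}ms")
# mean=595ms p50=100ms p90=100ms p99=10000ms
```

The mean sits at 595ms, a latency no request actually experienced, and P50 and P90 both look healthy. Only P99 exposes the ten-second tail.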

Creating a CloudWatch dashboard

CloudWatch dashboards give you a visual representation of your metrics. You'll create a dashboard tracking the golden signals for your News API, giving you near-real-time visibility into production health (the widgets below use one-minute periods).

Create the dashboard from a dashboard-config.json file in the current directory (its contents follow the command):

Terminal
aws cloudwatch put-dashboard \
    --dashboard-name NewsAPIProductionHealth \
    --dashboard-body file://dashboard-config.json

Dashboard configuration: Create dashboard-config.json defining your golden signals widgets. Replace YOUR-ALB-ARN-SUFFIX with your actual ALB identifier, and change us-east-1 if your resources live in another region:

dashboard-config.json
{
  "widgets": [
    {
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/ApplicationELB", "TargetResponseTime",
           "LoadBalancer", "YOUR-ALB-ARN-SUFFIX",
           {"stat": "Average", "label": "Average Latency", "yAxis": "left"}],
          ["...", {"stat": "p90", "label": "P90 Latency", "yAxis": "left"}],
          ["...", {"stat": "p99", "label": "P99 Latency", "yAxis": "left"}]
        ],
        "view": "timeSeries",
        "stacked": false,
        "region": "us-east-1",
        "title": "Latency (Response Time)",
        "period": 60,
        "yAxis": {
          "left": {
            "min": 0,
            "label": "Seconds"
          }
        }
      }
    },
    {
      "type": "metric",
      "x": 12,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/ApplicationELB", "RequestCount",
           "LoadBalancer", "YOUR-ALB-ARN-SUFFIX",
           {"stat": "Sum", "label": "Total Requests"}]
        ],
        "view": "timeSeries",
        "stacked": false,
        "region": "us-east-1",
        "title": "Traffic (Requests Per Minute)",
        "period": 60,
        "yAxis": {
          "left": {
            "min": 0,
            "label": "Requests"
          }
        }
      }
    },
    {
      "type": "metric",
      "x": 0,
      "y": 6,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/ApplicationELB", "HTTPCode_Target_5XX_Count",
           "LoadBalancer", "YOUR-ALB-ARN-SUFFIX",
           {"stat": "Sum", "label": "5XX Errors", "color": "#d62728"}],
          [".", "HTTPCode_Target_4XX_Count",
           "LoadBalancer", "YOUR-ALB-ARN-SUFFIX",
           {"stat": "Sum", "label": "4XX Errors", "color": "#ff7f0e"}]
        ],
        "view": "timeSeries",
        "stacked": false,
        "region": "us-east-1",
        "title": "Errors (By Status Code)",
        "period": 60,
        "yAxis": {
          "left": {
            "min": 0,
            "label": "Error Count"
          }
        }
      }
    },
    {
      "type": "metric",
      "x": 12,
      "y": 6,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/ECS", "CPUUtilization",
           "ServiceName", "news-api-service",
           "ClusterName", "news-api-cluster",
           {"stat": "Average", "label": "CPU Usage"}],
          [".", "MemoryUtilization",
           "ServiceName", "news-api-service",
           "ClusterName", "news-api-cluster",
           {"stat": "Average", "label": "Memory Usage"}]
        ],
        "view": "timeSeries",
        "stacked": false,
        "region": "us-east-1",
        "title": "Saturation (Resource Utilization)",
        "period": 60,
        "yAxis": {
          "left": {
            "min": 0,
            "max": 100,
            "label": "Percentage"
          }
        }
      }
    }
  ]
}
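
Before uploading, it can help to sanity-check the file locally. The following is a minimal sketch in Python, not an official validator: the function name check_dashboard_body and the three checks it performs (valid JSON, no leftover placeholder, widgets inside the 24-column grid) are choices made for this illustration.

```python
import json

PLACEHOLDER = "YOUR-ALB-ARN-SUFFIX"
GRID_COLUMNS = 24  # CloudWatch lays widgets out on a 24-column grid

def check_dashboard_body(body_text):
    """Return a list of problems found; an empty list means none were spotted."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = []
    if PLACEHOLDER in body_text:
        problems.append(f"{PLACEHOLDER} has not been replaced")
    for index, widget in enumerate(body.get("widgets", [])):
        if widget.get("x", 0) + widget.get("width", 0) > GRID_COLUMNS:
            problems.append(f"widget {index} runs past column {GRID_COLUMNS}")
    return problems

# One widget modelled on the traffic panel, with the placeholder still in place.
sample_body = json.dumps({"widgets": [{
    "type": "metric", "x": 12, "y": 0, "width": 12, "height": 6,
    "properties": {"metrics": [["AWS/ApplicationELB", "RequestCount",
                                "LoadBalancer", PLACEHOLDER]]},
}]})

before = check_dashboard_body(sample_body)
after = check_dashboard_body(
    sample_body.replace(PLACEHOLDER, "app/news-api-alb/abc123def456"))
print(before)  # ['YOUR-ALB-ARN-SUFFIX has not been replaced']
print(after)   # []
```

To check the real file, pass the text of dashboard-config.json to check_dashboard_body instead of sample_body.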

Find your ALB ARN suffix:

Terminal
# Get your ALB ARN
aws elbv2 describe-load-balancers \
    --names news-api-alb \
    --query 'LoadBalancers[0].LoadBalancerArn' \
    --output text

# Output looks like: arn:aws:elasticloadbalancing:us-east-1:123456789:loadbalancer/app/news-api-alb/abc123def456
# The suffix you need is: app/news-api-alb/abc123def456
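
If you would rather not trim the ARN by hand, the extraction is a one-line string operation. This is a small sketch in Python; the helper name alb_arn_suffix is hypothetical, and the ARN is the illustrative one from the comment above rather than a real resource.

```python
def alb_arn_suffix(arn: str) -> str:
    """Return the part of a load balancer ARN that follows ':loadbalancer/'."""
    _, marker, suffix = arn.partition(":loadbalancer/")
    if not marker or not suffix:
        raise ValueError(f"not a load balancer ARN: {arn!r}")
    return suffix

# Illustrative ARN copied from the example output above.
example_arn = ("arn:aws:elasticloadbalancing:us-east-1:123456789:"
               "loadbalancer/app/news-api-alb/abc123def456")

suffix = alb_arn_suffix(example_arn)
print(suffix)  # app/news-api-alb/abc123def456
```

The returned value is what replaces YOUR-ALB-ARN-SUFFIX in both the dashboard configuration and the alarm commands.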

Note that the ALB metrics in this configuration apply to all traffic through the load balancer. If you want metrics specific to your target group, you can add TargetGroup dimensions, but for most monitoring purposes, load balancer-level metrics provide the visibility you need.

Verifying the dashboard works

After creating your dashboard, verify it's collecting metrics correctly:

  1. Open CloudWatch -> Dashboards -> NewsAPIProductionHealth
  2. Set the time range to "Last 1 hour"
  3. Generate some traffic to your API with curl or a browser
  4. Refresh the dashboard after 1-2 minutes

What healthy metrics look like:

  • Latency: Average 50-200ms, P90 100-400ms, P99 200-800ms. The widget plots seconds, so 200ms appears as 0.2. Flat lines indicate consistent performance; spikes indicate slow queries or external API delays.
  • Traffic: Varies by usage. Even small numbers (5-10 requests/minute) during testing confirm the dashboard works. Production might show 100-1000+ requests/minute.
  • Errors: Should be zero or near-zero most of the time. Occasional 4xx errors (user mistakes like bad parameters) are normal. 5xx errors indicate your application failing.
  • Saturation: CPU 20-40% is healthy with headroom. Memory 40-60% is typical. Both consistently at 80%+ indicates you need more capacity or have a performance problem.

Patterns you may see, and what they usually mean:

  • Flat lines with no data: Either no traffic is reaching the API or a dimension value (for example the ALB suffix) doesn't match a real resource. Generate requests to your API, then recheck the values you substituted.
  • Saw-tooth pattern in CPU: Normal if you have auto-scaling enabled. CPU rises, scaling adds tasks, and CPU drops as load spreads across them.
  • Latency spikes correlating with traffic spikes: Your system struggles under load. Consider adding caching, optimizing slow queries, or enabling auto-scaling.
  • Sudden error rate jump from 0% to 10%+: Production incident. Use CloudWatch Logs Insights to investigate what changed: a recent deployment, a database issue, or an external API problem.

Creating meaningful alarms

A dashboard is passive; an alarm is the thing that puts the page on someone's phone. The rule of thumb to keep an alarm worth its own existence: there should be a runbook attached, and the threshold should be high enough that the on-call's first action isn't "this is noise." Three alarms below cover the chapter's failure modes: 5xx error count, P99 latency, and CPU saturation.

Create an alarm for high error rates:

Terminal
aws cloudwatch put-metric-alarm \
    --alarm-name news-api-high-5xx-errors \
    --alarm-description "Alert when 5XX error count exceeds 10 in 5 minutes" \
    --metric-name HTTPCode_Target_5XX_Count \
    --namespace AWS/ApplicationELB \
    --dimensions Name=LoadBalancer,Value=YOUR-ALB-ARN-SUFFIX \
    --statistic Sum \
    --period 300 \
    --threshold 10 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 1 \
    --treat-missing-data notBreaching

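This alarm triggers if your API returns more than 10 5xx errors in a 5-minute window. --evaluation-periods 1 means a single breaching 5-minute period is enough to trigger it. --treat-missing-data notBreaching matters because the ALB only publishes this metric when the count is non-zero: a period with no errors has no datapoint at all, and this setting tells CloudWatch to treat that gap as healthy rather than as insufficient data. As written, the command sets no --alarm-actions, so the alarm changes state without notifying anyone; pass an SNS topic ARN in --alarm-actions to have it deliver a page.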

Create an alarm for high latency:

Terminal
aws cloudwatch put-metric-alarm \
    --alarm-name news-api-high-p99-latency \
    --alarm-description "Alert when P99 latency exceeds 1000ms" \
    --metric-name TargetResponseTime \
    --namespace AWS/ApplicationELB \
    --dimensions Name=LoadBalancer,Value=YOUR-ALB-ARN-SUFFIX \
    --extended-statistic p99 \
    --period 300 \
    --threshold 1.0 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2

This alarm triggers if p99 latency exceeds 1 second for two consecutive 5-minute periods (10 minutes total). Requiring two periods keeps transient spikes from triggering the alarm; only sustained latency problems do.
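
The effect of --evaluation-periods is easiest to see on a concrete series. The sketch below is a deliberately simplified model in Python, not CloudWatch's actual evaluator: it assumes every period has a datapoint and that all evaluated periods must breach, and it ignores missing-data handling. The function name alarm_states and the sample p99 values are invented for illustration.

```python
from typing import List, Sequence

def alarm_states(datapoints: Sequence[float], threshold: float,
                 evaluation_periods: int) -> List[str]:
    """State after each period: ALARM only if the last N datapoints all exceed the threshold."""
    states = []
    for end in range(1, len(datapoints) + 1):
        window = datapoints[max(0, end - evaluation_periods):end]
        if len(window) < evaluation_periods:
            states.append("INSUFFICIENT_DATA")  # not enough history yet
        elif all(value > threshold for value in window):
            states.append("ALARM")
        else:
            states.append("OK")
    return states

# Hypothetical p99 latency in seconds, one value per 5-minute period.
p99_seconds = [0.4, 1.6, 0.5, 1.2, 1.3, 0.6]

single = alarm_states(p99_seconds, threshold=1.0, evaluation_periods=1)
double = alarm_states(p99_seconds, threshold=1.0, evaluation_periods=2)
print(single)  # ['OK', 'ALARM', 'OK', 'ALARM', 'ALARM', 'OK']
print(double)  # ['INSUFFICIENT_DATA', 'OK', 'OK', 'OK', 'ALARM', 'OK']
```

With one evaluation period, the isolated spike in the second period pages someone. With two, only the sustained breach in the fourth and fifth periods does.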

Create an alarm for high CPU saturation:

Terminal
aws cloudwatch put-metric-alarm \
    --alarm-name news-api-high-cpu \
    --alarm-description "Alert when CPU exceeds 85% for 10 minutes" \
    --metric-name CPUUtilization \
    --namespace AWS/ECS \
    --dimensions Name=ServiceName,Value=news-api-service \
                 Name=ClusterName,Value=news-api-cluster \
    --statistic Average \
    --period 300 \
    --threshold 85 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2

This alarm triggers if average CPU utilization exceeds 85% for two consecutive 5-minute periods. High CPU indicates you're nearing capacity: either traffic increased beyond expectations or application performance degraded. This alarm gives you early warning to investigate before CPU hits 100% and requests start timing out.

Preventing alert fatigue

Alarms that fire on normal variation train the on-call to ignore the page. The usual causes are thresholds set too low, a single-period evaluation that catches transient spikes, and alarms for which there's no documented response. The discipline that prevents fatigue is simple: every alarm has a runbook (even a one-line one), and any alarm whose runbook is "ignore unless it keeps firing" gets deleted, because it's creating noise rather than signal.

Querying logs with CloudWatch Logs Insights

Your containers write logs to CloudWatch Logs. Reading logs line-by-line works for simple debugging, but production systems generate thousands of log lines per minute across multiple containers. CloudWatch Logs Insights provides a query language for analyzing logs at scale.

Find all 500 errors in the last hour:

CloudWatch Logs Insights
fields @timestamp, @message
| filter @message like /500/
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

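This query finds log lines containing both "500" and "ERROR", sorts them newest first, and limits results to 50 lines. The /500/ pattern matches those digits anywhere in a line (a duration of 1500ms, for example), so tighten it to your log format if it returns noise. Run it in CloudWatch -> Logs -> Insights: select your log group (/ecs/news-api), set the time range to the last hour, and execute.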

Count requests by endpoint:

CloudWatch Logs Insights
fields @timestamp, @message
| filter @message like /GET/
| parse @message /GET (?<endpoint>\/[^ ]*)/
| stats count() as request_count by endpoint
| sort request_count desc

This query extracts endpoints from GET request logs using regex parsing, counts requests per endpoint, and sorts by count. This helps identify your busiest endpoints for optimization efforts.
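
To see what the parse-and-count pipeline does, the same logic can be run locally over a handful of lines. This is a rough Python equivalent, not a Logs Insights feature: the log lines are invented, since the application's real log format isn't shown here, and Python writes named groups as (?P<name>...).

```python
import re
from collections import Counter

# Same idea as the parse step: capture the path that follows "GET ".
GET_ENDPOINT = re.compile(r"GET (?P<endpoint>/[^ ]*)")

# Hypothetical access-log lines; adjust the pattern to your real format.
sample_lines = [
    "INFO GET /articles 200",
    "INFO GET /articles 200",
    "INFO GET /health 200",
    "INFO POST /articles 201",
    "ERROR GET /articles/42 500",
]

counts = Counter()
for line in sample_lines:
    match = GET_ENDPOINT.search(line)
    if match:  # lines without a GET request are skipped, like the filter step
        counts[match.group("endpoint")] += 1

for endpoint, count in counts.most_common():
    print(endpoint, count)
```

Note that the pattern captures everything up to the next space, so /articles and /articles/42 are counted as separate endpoints.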

Find slow database queries:

CloudWatch Logs Insights
fields @timestamp, @message
| filter @message like /database query took/
| parse @message /query took (?<duration>\d+)ms/
| filter duration > 1000
| sort duration desc
| limit 20

This query finds database queries taking longer than 1 second, assuming the application logs query duration in that format. The output is what feeds the next decision: an index, a query rewrite, or a cache layer in front of the call.
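
The query only works if the application really logs durations in that shape, so it is worth testing the pattern against a few lines before relying on it. The sketch below reproduces the filter, parse, and sort steps in Python over invented log lines; the wording "database query took Nms" is the assumption carried over from the query.

```python
import re

QUERY_DURATION = re.compile(r"query took (?P<duration>\d+)ms")

# Hypothetical log lines in the format the query assumes.
sample_lines = [
    "INFO database query took 12ms (list_articles)",
    "WARN database query took 1840ms (search_articles)",
    "INFO database query took 1000ms (count_articles)",
    "WARN database query took 2300ms (top_sources)",
    "INFO cache hit for list_articles",
]

slow = []
for line in sample_lines:
    if "database query took" not in line:
        continue
    match = QUERY_DURATION.search(line)
    if match is None:
        continue
    duration = int(match.group("duration"))  # captured text becomes a number
    if duration > 1000:                      # strictly greater, like the filter
        slow.append((duration, line))

slow.sort(key=lambda item: item[0], reverse=True)
for duration, line in slow[:20]:
    print(duration, line)
```

The 1000ms line is excluded because the comparison is strict. Use >= in the query if a query taking exactly one second should count as slow.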

Logs Insights queries are the third leg of monitoring (after dashboards and alarms). When an alarm fires, the runbook usually says "open Insights, run this query, look for X." The query language takes a session or two to get fluent in; once you are, it replaces SSH-and-tail-the-logs entirely.

Next, in section 5, we turn the ECS service from a fixed two-task deployment into one that auto-scales on CPU utilisation, and load-test the policy to verify it actually adds and removes tasks the way the target-tracking config claims.