Chapter 11: News Aggregator Case Study

1. Multi-API integration in practice

In this chapter you'll integrate three real news APIs (NewsAPI, the Guardian, and HackerNews) into one CLI that hides their wildly different shapes behind a single canonical Article model. By the end you'll have one normalizer per source, an aggregator that handles partial failures and deduplicates across sources, and a CLI that ships in a few hundred lines because Chapter 10's toolkit does the heavy lifting.

Chapter 10 built that toolkit: explore_api_structure, safe_get, try_fields, and extract_items_and_meta, all sitting in your api_helpers.py by the end of that chapter. This chapter is the case study where it all gets exercised at once.

Suppose you want to stay current with Python programming news. NewsAPI, The Guardian, and HackerNews each cover the same conceptual space (articles, authors, timestamps, links) from three different angles, and each one ships its data in its own shape with its own field names and its own failure modes. Querying all three means three separate integrations, three different data structures, and three different error patterns to manage.

The architecture this chapter builds. Three sources, three shapes, one canonical feed.

This isn't unique to news aggregation. Your e-commerce platform needs both Stripe and PayPal. Your analytics dashboard combines Google Analytics with Mixpanel. Your authentication system supports GitHub, Google, and Microsoft. Production systems constantly face this challenge: multiple external services, each with its own response format, failure modes, and quirks. Your application still needs unified, consistent data.

What success looks like

By the end of the chapter you'll have a CLI that runs like this:

Terminal

$ python news_aggregator.py
======================================================================
NEWS AGGREGATOR
======================================================================

Search news from NewsAPI, The Guardian, and HackerNews

Commands:
  search <query>  - search for news articles
  again           - re-display the last search results
  group           - show last results grouped by source
  help            - show this help message
  quit            - exit application
======================================================================

aggregator> search python programming

Searching for 'python programming'...

NewsAPI: 5 of 12847 matching articles
Guardian: 5 of 8453 matching articles
HackerNews: 5 of 583201 matching articles

======================================================================
RESULTS
======================================================================

Query: 'python programming'
Sources: NewsAPI, Guardian, HackerNews
Articles: 12 (3 duplicates removed)

1. Python 3.13 Performance Improvements
   TechCrunch | January 15, 2025 at 02:30 PM
   Sarah Chen
   Python's latest release brings significant speed improvements...
   https://techcrunch.com/2025/01/15/python-3-13-performance

[entries 2-10 omitted from this excerpt; the CLI renders all ten]

... and 2 more articles

======================================================================

One command queries three APIs. All sources respond. Fifteen articles became twelve after deduplication. Results display in one consistent format whether they came from NewsAPI, Guardian, or HackerNews. The complexity is hidden from the caller.
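Deduplication is what turned fifteen results into twelve. The sketch below shows one way to do it. It assumes two articles are duplicates when their URLs match after dropping the scheme, a leading www., the query string, the fragment, and any trailing slash. The chapter's aggregator may use a different or additional rule (title similarity, for example). dedupe_key and deduplicate are hypothetical names, and the sample URLs are invented.

```python
from urllib.parse import urlsplit


def dedupe_key(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal (assumed rule)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Scheme, query string and fragment are dropped; the path keeps its case
    return f"{host}{parts.path.rstrip('/')}"


def deduplicate(articles: list[dict]) -> tuple[list[dict], int]:
    """Keep the first article seen for each key; return (unique, removed_count)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for article in articles:
        key = dedupe_key(article["url"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(article)
    return unique, len(articles) - len(unique)


articles = [
    {"source": "NewsAPI", "url": "https://techcrunch.com/2025/01/15/python-3-13-performance"},
    {"source": "HackerNews", "url": "https://www.techcrunch.com/2025/01/15/python-3-13-performance/?utm_source=hn"},
    {"source": "Guardian", "url": "https://www.theguardian.com/technology/example-story"},
]
unique, removed = deduplicate(articles)
print(len(unique), removed)  # 2 1
```

Keeping the first occurrence means source order decides which copy survives. That makes it worth querying the source with the richest metadata first.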

Why this is hard

The three services represent the same concept (a news article) in completely different ways. NewsAPI wraps articles in an articles array with nested source objects. The Guardian buries them in response.results, using field names like webTitle and webPublicationDate. HackerNews returns hits with minimal metadata and a Unix timestamp alongside an ISO one. Same business concept, three incompatible structures.

Without a systematic approach, your codebase becomes a mess of conditionals. Display logic needs branching: "if NewsAPI show publishedAt, if Guardian show webPublicationDate, if HackerNews convert Unix timestamp." Every operation (storage, sorting, deduplication, filtering) requires source-specific handling. Adding a fourth source means finding and extending every one of those branches.
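To make the branching concrete, here is a sketch with trimmed payloads in each service's shape. The values are invented, and only the key names (articles, response.results, hits, webTitle, created_at_i and so on) mirror the real responses. The hypothetical first_headline function has to know all three shapes just to print one title and date.

```python
from datetime import datetime, timezone

# Trimmed payloads with invented values; only the key names mirror the real APIs
newsapi = {"articles": [{"title": "Python 3.13 Performance Improvements",
                         "publishedAt": "2025-01-15T14:30:00Z"}]}
guardian = {"response": {"results": [{"webTitle": "Why Python keeps growing",
                                      "webPublicationDate": "2025-01-14T09:00:00Z"}]}}
hackernews = {"hits": [{"title": "Show HN: A faster Python profiler",
                        "created_at_i": 1736793900}]}


def first_headline(source: str, payload: dict) -> str:
    """The anti-pattern: every consumer re-learns every source's shape."""
    if source == "newsapi":
        item = payload["articles"][0]
        return f"{item['title']} ({item['publishedAt']})"
    if source == "guardian":
        item = payload["response"]["results"][0]
        return f"{item['webTitle']} ({item['webPublicationDate']})"
    if source == "hackernews":
        item = payload["hits"][0]
        # Only this branch has to convert a Unix timestamp into a readable date
        when = datetime.fromtimestamp(item["created_at_i"], tz=timezone.utc)
        return f"{item['title']} ({when.strftime('%Y-%m-%dT%H:%M:%SZ')})"
    raise ValueError(f"unknown source: {source!r}")


for name, payload in [("newsapi", newsapi), ("guardian", guardian), ("hackernews", hackernews)]:
    print(first_headline(name, payload))
```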

Production systems isolate API differences at the boundary. You normalize external formats into one internal representation once, then everything downstream works with consistent structure. Display code never knows or cares which API provided the data. That separation is what keeps systems maintainable as APIs evolve and requirements change.
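Normalizing at the boundary might look like the sketch below. Article is a trimmed stand-in for the chapter's models.py version, and its field names and validation rules are assumptions. The three normalize_* functions are also sketches. They use plain dict.get rather than Chapter 10's safe_get / try_fields so the block runs on its own, and the sample payloads are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Article:
    """Canonical article: every source is converted into this one shape."""
    title: str
    url: str
    source: str
    published: datetime
    author: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed validation rules: reject records that normalized into junk
        if not self.title.strip():
            raise ValueError("Article.title must be non-empty")
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"Article.url is not absolute: {self.url!r}")


def parse_iso(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_newsapi(payload: dict) -> list[Article]:
    return [
        Article(
            title=item["title"],
            url=item["url"],
            source=(item.get("source") or {}).get("name") or "NewsAPI",
            published=parse_iso(item["publishedAt"]),
            author=item.get("author"),
        )
        for item in payload.get("articles", [])
    ]


def normalize_guardian(payload: dict) -> list[Article]:
    # Items live one level down, under response.results
    results = payload.get("response", {}).get("results", [])
    return [
        Article(
            title=item["webTitle"],
            url=item["webUrl"],
            source="The Guardian",
            published=parse_iso(item["webPublicationDate"]),
        )
        for item in results
    ]


def normalize_hackernews(payload: dict) -> list[Article]:
    articles = []
    for hit in payload.get("hits", []):
        # Prefer the Unix timestamp; fall back to the ISO string
        if hit.get("created_at_i") is not None:
            published = datetime.fromtimestamp(hit["created_at_i"], tz=timezone.utc)
        else:
            published = parse_iso(hit["created_at"])
        # Text posts have no external URL; link to the HN thread instead
        url = hit.get("url") or f"https://news.ycombinator.com/item?id={hit['objectID']}"
        articles.append(Article(hit.get("title") or "(untitled)", url, "HackerNews",
                                published, hit.get("author")))
    return articles


feed = (
    normalize_newsapi({"articles": [{
        "source": {"name": "TechCrunch"}, "author": "Sarah Chen",
        "title": "Python 3.13 Performance Improvements",
        "url": "https://techcrunch.com/2025/01/15/python-3-13-performance",
        "publishedAt": "2025-01-15T14:30:00Z"}]})
    + normalize_guardian({"response": {"results": [{
        "webTitle": "Why Python keeps growing",
        "webUrl": "https://www.theguardian.com/technology/example",
        "webPublicationDate": "2025-01-14T09:00:00Z"}]}})
    + normalize_hackernews({"hits": [{
        "objectID": "1", "title": "Ask HN: Favorite Python profiler?",
        "url": None, "author": "hn_user", "created_at_i": 1736793900}]})
)
# Downstream code sorts on one field name, whatever the origin
feed.sort(key=lambda a: a.published, reverse=True)
for article in feed:
    print(article.published.date(), article.source, article.title)
```

Sorting, deduplication, and display all read the same published, source, and title fields. A fourth source therefore needs only a fourth normalize_* function.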

What you'll learn

Recognize when a project crosses from single-API to multi-API, the point where the canonical-model pattern earns its complexity
Map any API's response shape with Chapter 10's explore_api_structure before writing extraction code
Design a canonical model that absorbs container, naming, and optionality differences across sources
Write source-specific normalizers that import Chapter 10's safe_get / try_fields / extract_items_and_meta rather than redefining them
Build an aggregation pipeline that survives partial failures and deduplicates across sources
Recognize where defensive programming stops being enough and validation must take over

What you'll build

models.py -- the Article dataclass with required and optional fields, plus post-init validation
normalize_newsapi.py -- turns NewsAPI's articles array into Article instances
normalize_guardian.py -- handles Guardian's response.results wrapper and the webTitle / webPublicationDate rename
normalize_hackernews.py -- handles HackerNews's hits and the dual ISO/Unix timestamp pair
aggregator.py -- the NewsAggregator class that fetches, normalizes, deduplicates, and sorts unified results
display.py -- the formatter that turns Article objects into the unified CLI output, with no source-specific branching
news_aggregator.py -- the CLI that turns the aggregator into a searchable command-line tool
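The aggregator and display layers can be sketched together. This is a function-based stand-in for the chapter's NewsAggregator class and display.py, and it rests on three assumptions. First, each source is a zero-argument callable that returns already-normalized Article objects. Second, a failing source is recorded and skipped rather than aborting the search. Third, the timestamp format copies the CLI output shown earlier. aggregate, format_article, and the fake fetchers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass
class Article:
    # Trimmed stand-in for the canonical model
    title: str
    url: str
    source: str
    published: datetime


Fetcher = Callable[[], List[Article]]


def aggregate(fetchers: Dict[str, Fetcher]) -> Tuple[List[Article], Dict[str, str]]:
    """Query every source; one failing source must not sink the whole search."""
    articles: List[Article] = []
    errors: Dict[str, str] = {}
    for name, fetch in fetchers.items():
        try:
            articles.extend(fetch())
        except Exception as exc:  # boundary: record the failure, keep the others
            errors[name] = f"{type(exc).__name__}: {exc}"
    # Newest first, using the one timestamp field every Article shares
    articles.sort(key=lambda a: a.published, reverse=True)
    return articles, errors


def format_article(index: int, article: Article) -> str:
    """Display code reads Article fields only, with no per-source branching."""
    stamp = article.published.strftime("%B %d, %Y at %I:%M %p")
    return f"{index}. {article.title}\n   {article.source} | {stamp}\n   {article.url}"


def fake_newsapi() -> List[Article]:
    return [Article("Python 3.13 Performance Improvements",
                    "https://techcrunch.com/2025/01/15/python-3-13-performance",
                    "TechCrunch", datetime(2025, 1, 15, 14, 30, tzinfo=timezone.utc))]


def fake_guardian() -> List[Article]:
    raise TimeoutError("request timed out")  # simulate one source being down


results, failures = aggregate({"NewsAPI": fake_newsapi, "Guardian": fake_guardian})
for i, article in enumerate(results, start=1):
    print(format_article(i, article))
print(failures)
```

Returning the failures alongside the articles leaves the caller free to report which sources were missing while still showing the rest.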

Prerequisites

This chapter assumes you have api_helpers.py from Chapter 10 on your import path. Every Python block here that uses safe_get, try_fields, or extract_items_and_meta imports them by name -- the chapter does not redefine them. If you skipped Chapter 10, run through it first. The toolkit takes 30 minutes to build and saves you days here.

You'll also need python-dotenv installed so the examples can load credentials from your project .env file, following the Chapter 7 pattern. If it is not already installed in this project, run pip install python-dotenv.

You'll need free API keys for NewsAPI (500 requests/day) and the Guardian (5,000 requests/day). HackerNews requires no authentication. The next page provides registration URLs and setup instructions.
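Here is a sketch of how those key requirements translate into requests. The endpoint URLs and parameter names are assumptions to check against each provider's documentation: NewsAPI's /v2/everything with apiKey, the Guardian Content API's /search with api-key, and the Algolia-hosted HackerNews search with hitsPerPage. NEWSAPI_KEY and GUARDIAN_API_KEY are hypothetical variable names. With python-dotenv, calling load_dotenv() first would copy your .env entries into os.environ, which is where this function looks by default.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import urlencode

# Assumed endpoints; verify against each provider's documentation
NEWSAPI_URL = "https://newsapi.org/v2/everything"
GUARDIAN_URL = "https://content.guardianapis.com/search"
HACKERNEWS_URL = "https://hn.algolia.com/api/v1/search"


def build_search_urls(query: str, page_size: int = 5,
                      env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return one request URL per usable source; a missing key skips that source."""
    env = os.environ if env is None else env
    # HackerNews search needs no authentication at all
    urls = {"HackerNews": f"{HACKERNEWS_URL}?{urlencode({'query': query, 'hitsPerPage': page_size})}"}
    newsapi_key = env.get("NEWSAPI_KEY")  # hypothetical variable name
    if newsapi_key:
        params = {"q": query, "pageSize": page_size, "apiKey": newsapi_key}
        urls["NewsAPI"] = f"{NEWSAPI_URL}?{urlencode(params)}"
    guardian_key = env.get("GUARDIAN_API_KEY")  # hypothetical variable name
    if guardian_key:
        params = {"q": query, "page-size": page_size, "api-key": guardian_key}
        urls["Guardian"] = f"{GUARDIAN_URL}?{urlencode(params)}"
    return urls


urls = build_search_urls("python programming", env={"NEWSAPI_KEY": "demo-key"})
for name, url in urls.items():
    print(name, url)
```

Skipping a source whose key is missing, rather than failing outright, means HackerNews results still come back while you wait for a key.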