Chapter 11: News Aggregator Case Study
1. Multi-API integration in practice
In this chapter you'll integrate three real news APIs (NewsAPI, the Guardian, and HackerNews) into one CLI that hides their wildly different shapes behind a single canonical Article model. By the end you'll have one normalizer per source, an aggregator that handles partial failures and deduplicates across sources, and a CLI that ships in a few hundred lines because Chapter 10's toolkit does the heavy lifting.
Chapter 10 built that toolkit: explore_api_structure, safe_get, try_fields, and extract_items_and_meta, all sitting in your api_helpers.py by the end of that chapter. This chapter is the case study where it all gets exercised at once.
Suppose you want to stay current with Python programming news. NewsAPI, The Guardian, and HackerNews each cover the same conceptual space (articles, authors, timestamps, links) from three different angles, and each one ships its data in its own shape with its own field names and its own failure modes. Querying all three means three separate integrations, three different data structures, and three different error patterns to manage.
This isn't unique to news aggregation. Your e-commerce platform needs both Stripe and PayPal. Your analytics dashboard combines Google Analytics with Mixpanel. Your authentication system supports GitHub, Google, and Microsoft. Production systems constantly face this challenge: multiple external services, each with its own response format, failure modes, and quirks. Your application still needs unified, consistent data.
What success looks like
By the end of the chapter you'll have a CLI that runs like this:
$ python news_aggregator.py
======================================================================
NEWS AGGREGATOR
======================================================================
Search news from NewsAPI, The Guardian, and HackerNews
Commands:
search <query> - search for news articles
again - re-display the last search results
group - show last results grouped by source
help - show this help message
quit - exit application
======================================================================
aggregator> search python programming
Searching for 'python programming'...
NewsAPI: 5 of 12847 matching articles
Guardian: 5 of 8453 matching articles
HackerNews: 5 of 583201 matching articles
======================================================================
RESULTS
======================================================================
Query: 'python programming'
Sources: NewsAPI, Guardian, HackerNews
Articles: 12 (3 duplicates removed)
1. Python 3.13 Performance Improvements
TechCrunch | January 15, 2025 at 02:30 PM
Sarah Chen
Python's latest release brings significant speed improvements...
https://techcrunch.com/2025/01/15/python-3-13-performance
[entries 2-10 omitted from this excerpt; the CLI renders all ten]
... and 2 more articles
======================================================================
One command queries three APIs. All sources respond. Fifteen articles became twelve after deduplication. Results display in one consistent format whether they came from NewsAPI, Guardian, or HackerNews. The complexity is hidden from the caller.
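A minimal sketch of how a line like "12 (3 duplicates removed)" can come about, and of what tolerating a partial failure looks like. The dedup rule (same host and path), the dict-shaped articles, and the names dedup_key and aggregate are assumptions for illustration; the chapter's own NewsAggregator may make different choices.

```python
from urllib.parse import urlsplit


def dedup_key(url):
    """Assumed identity rule: same host and same path means same article."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"))


def aggregate(fetchers):
    """Collect articles from every source, tolerating individual failures.

    `fetchers` maps a source name to a zero-argument callable that returns
    a list of dicts with at least a "url" key (a stand-in for real API calls).
    """
    collected, failures = [], {}
    for name, fetch in fetchers.items():
        try:
            collected.extend(fetch())
        except Exception as exc:  # one broken source must not sink the rest
            failures[name] = str(exc)

    seen, unique = set(), []
    for article in collected:
        key = dedup_key(article["url"])
        if key not in seen:  # keep the first copy, drop later ones
            seen.add(key)
            unique.append(article)
    return unique, len(collected) - len(unique), failures


def broken_source():
    raise TimeoutError("source timed out")


unique, removed, failures = aggregate({
    "NewsAPI": lambda: [{"url": "https://example.com/python-3-13"}],
    "Guardian": lambda: [{"url": "https://EXAMPLE.com/python-3-13/?ref=rss"}],
    "HackerNews": broken_source,
})
print(len(unique), removed, failures)
```

Here two sources return the same story under slightly different URLs and the third times out, so the caller still gets one article back along with a record of which source failed.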
Why this is hard
The three services represent the same concept (a news article) in completely different ways. NewsAPI wraps articles in an articles array with nested source objects. The Guardian buries them in response.results using field names like webTitle and webPublicationDate. HackerNews returns hits that carry Unix timestamps and minimal metadata. Same business concept, three incompatible structures.
Without a systematic approach, your codebase becomes a mess of conditionals. Display logic needs branching: "if NewsAPI show publishedAt, if Guardian show webPublicationDate, if HackerNews convert Unix timestamp." Every operation (storage, sorting, deduplication, filtering) requires source-specific handling. Adding a fourth source means updating dozens of locations throughout your code.
Production systems isolate API differences at the boundary. You normalize external formats into one internal representation once, then everything downstream works with consistent structure. Display code never knows or cares which API provided the data. That separation is what keeps systems maintainable as APIs evolve and requirements change.
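To make "normalize once at the boundary" concrete, here is a rough sketch built on three hand-written, heavily trimmed payloads. The container and field names articles, response.results, hits, webTitle, webPublicationDate, and publishedAt come from the discussion above; webUrl, created_at_i, the sample values, and the plain-dict canonical shape are assumptions for illustration. The sketch also skips the defensive helpers a real normalizer would lean on.

```python
from datetime import datetime, timezone

# Hand-written sample payloads (assumed shapes, trimmed to a few fields).
NEWSAPI = {"articles": [{"title": "Python 3.13 Performance Improvements",
                         "url": "https://example.com/a",
                         "publishedAt": "2025-01-15T14:30:00Z",
                         "source": {"name": "TechCrunch"}}]}
GUARDIAN = {"response": {"results": [{"webTitle": "Why Python keeps growing",
                                      "webUrl": "https://example.com/b",
                                      "webPublicationDate": "2025-01-14T09:00:00Z"}]}}
HACKERNEWS = {"hits": [{"title": "Show HN: a tiny Python profiler",
                        "url": "https://example.com/c",
                        "created_at_i": 1736951400}]}


def parse_iso(text):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def canonical(title, url, published, source):
    # The single internal shape everything downstream works with.
    return {"title": title, "url": url, "published": published, "source": source}


def from_newsapi(payload):
    return [canonical(a["title"], a["url"], parse_iso(a["publishedAt"]),
                      a["source"]["name"]) for a in payload["articles"]]


def from_guardian(payload):
    return [canonical(r["webTitle"], r["webUrl"],
                      parse_iso(r["webPublicationDate"]), "The Guardian")
            for r in payload["response"]["results"]]


def from_hackernews(payload):
    return [canonical(h["title"], h["url"],
                      datetime.fromtimestamp(h["created_at_i"], tz=timezone.utc),
                      "HackerNews") for h in payload["hits"]]


def render(article):
    # Display code sees one shape only; it never asks which API this came from.
    return f"{article['title']} | {article['source']} | {article['published']:%Y-%m-%d %H:%M}"


articles = from_newsapi(NEWSAPI) + from_guardian(GUARDIAN) + from_hackernews(HACKERNEWS)
for article in sorted(articles, key=lambda a: a["published"], reverse=True):
    print(render(article))
```

All source-specific knowledge lives in the three from_* functions. Sorting and rendering touch only the canonical keys, so a fourth source would mean one more normalizer and no changes downstream.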
What you'll learn
- Recognise when a project crosses the threshold from single-API to multi-API and the canonical-model pattern earns its complexity
- Map any API's response shape with Chapter 10's explore_api_structure before writing extraction code
- Design a canonical model that absorbs container, naming, and optionality differences across sources
- Write source-specific normalizers that import Chapter 10's safe_get / try_fields / extract_items_and_meta rather than redefining them
- Build an aggregation pipeline that survives partial failures and deduplicates across sources
- Recognise where defensive programming stops being enough and validation must take over
What you'll build
- models.py -- the Article dataclass with required and optional fields, plus post-init validation
- normalize_newsapi.py -- turns NewsAPI's articles array into Article instances
- normalize_guardian.py -- handles Guardian's response.results wrapper and the webTitle / webPublicationDate rename
- normalize_hackernews.py -- handles HackerNews's hits and the dual ISO/Unix timestamp pair
- aggregator.py -- the NewsAggregator class that fetches, normalizes, deduplicates, and sorts unified results
- display.py -- the formatter that turns Article objects into the unified CLI output, with no source-specific branching
- news_aggregator.py -- the CLI that turns the aggregator into a searchable command-line tool
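As a preview of models.py, the following shows one possible shape for a dataclass with required fields, optional fields, and post-init validation. The field names and the three validation rules are guesses for illustration, not the chapter's final model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Article:
    # Required fields (assumed to be derivable from every source)
    title: str
    url: str
    source: str
    published_at: datetime
    # Optional fields (assumed to be missing for some sources)
    author: Optional[str] = None
    description: Optional[str] = None

    def __post_init__(self):
        # Runs right after the generated __init__; rejects unusable records early.
        if not self.title.strip():
            raise ValueError("title must not be empty")
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"url must be absolute, got {self.url!r}")
        if self.published_at.tzinfo is None:
            raise ValueError("published_at must be timezone-aware")


article = Article(
    title="Python 3.13 Performance Improvements",
    url="https://example.com/python-3-13",
    source="NewsAPI",
    published_at=datetime(2025, 1, 15, 14, 30, tzinfo=timezone.utc),
)
print(article.author)  # optional field falls back to its default

try:
    Article(title="  ", url="https://example.com/x", source="NewsAPI",
            published_at=datetime(2025, 1, 15, tzinfo=timezone.utc))
except ValueError as exc:
    print(f"rejected: {exc}")
```

Requiring timezone-aware timestamps is a deliberate choice in this sketch: Python refuses to compare naive and aware datetimes, so mixing them would break sorting across sources.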
Prerequisites
This chapter assumes you have api_helpers.py from Chapter 10 on your import path. Every Python block here that uses safe_get, try_fields, or extract_items_and_meta imports them by name -- the chapter does not redefine them. If you skipped Chapter 10, run through it first. The toolkit takes 30 minutes to build and saves you days here.
You'll also need python-dotenv installed so the examples can load credentials from your project .env file, following the Chapter 7 pattern. If it is not already installed in this project, run pip install python-dotenv.
You'll need free API keys for NewsAPI (500 requests/day) and the Guardian (5,000 requests/day). HackerNews requires no authentication. The next page provides registration URLs and setup instructions.