4. Download files safely
Downloads carry the same memory risk as uploads (response.content holds the whole body in memory at once), plus a trust problem the upload side does not have: the server you are pulling from can advertise Content-Type: image/jpeg while serving an HTML error page. The safe download pattern combines three things: stream=True (so you can inspect headers before committing to the body), header validation (so you abort cleanly on a content-type mismatch), and chunked writes (so memory stays flat while the file lands on disk).
Your production download pattern
Save the following as download_file.py. It opens a streaming connection to the URL, validates the response headers before writing a single byte to disk, and copies the body in 8 KB (8192-byte) chunks straight into the output file:
import requests


class ContentTypeMismatch(Exception):
    """Raised when the server's Content-Type does not match what we expect."""


def download_file(url, output_path, expected_prefix="image/", chunk_size=8192):
    """
    Download a file with streaming and content-type validation.

    Aborts before writing to disk if the response's Content-Type does
    not start with `expected_prefix` (default: "image/").
    """
    print(f"Downloading from {url}...")

    # stream=True fetches headers only; the body is held back until
    # iter_content() so we can inspect headers and bail out cheaply.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()

        # Validate before committing to the body.
        content_type = response.headers.get("Content-Type", "")
        if not content_type.startswith(expected_prefix):
            raise ContentTypeMismatch(
                f"Expected {expected_prefix!r} but server returned "
                f"Content-Type: {content_type!r}"
            )

        # Stream the body to disk in fixed-size chunks.
        with open(output_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)

    print(f"Saved to {output_path}")


# Test it
download_file("https://httpbin.org/image/jpeg", "downloaded_image.jpg")
What stream=True actually does
With stream=True, get() returns as soon as the status line and headers have arrived. The connection stays open and the body is left unread until you call response.iter_content() (or touch response.content). That window is where the production-grade work happens: you read status_code, Content-Type, and Content-Length (when the server sends one), and decide whether to keep going. If the headers say the response is HTML, or the declared size is suspicious, or the status is not 200, you can close the response and walk away without downloading the rest of the body.
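The header checks described above can be sketched as one small helper. This is an illustration under stated assumptions, not part of download_file.py: the name precheck, the 10 MB default for max_bytes, and the choice to return a reason string rather than raise are all hypothetical. The helper only reads .status_code and .headers, so it accepts a requests.Response opened with stream=True or any object with those two attributes.

```python
def precheck(response, expected_prefix="image/", max_bytes=10 * 1024 * 1024):
    """Return a reason string if the download should be abandoned, else None.

    `response` is anything with `.status_code` and `.headers`, for example
    a requests.Response obtained with stream=True. Hypothetical helper.
    """
    if response.status_code != 200:
        return f"unexpected status {response.status_code}"

    # Media types are case-insensitive and may carry parameters such as
    # "; charset=utf-8", so normalise before comparing.
    content_type = response.headers.get("Content-Type", "")
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type.startswith(expected_prefix):
        return f"unexpected Content-Type {content_type!r}"

    # Content-Length is optional (chunked responses omit it), so a
    # missing header is not treated as a failure here.
    declared = response.headers.get("Content-Length")
    if declared is not None:
        if not declared.isdigit():
            return f"malformed Content-Length {declared!r}"
        if int(declared) > max_bytes:
            return f"declared size {declared} exceeds limit {max_bytes}"

    return None
```

Inside the with requests.get(url, stream=True, timeout=30) as response: block, you would call precheck(response) before iter_content() and leave the block if it returns a reason; exiting the with block closes the response.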
The pattern is "check before you commit resources." It applies to anything where the body might be large, untrusted, or different from what the URL suggests. For one-shot downloads inside a script, you can probably get away with response.content; the moment user-supplied URLs or upstream servers you do not control enter the picture, stream=True plus iter_content() is the shape to reach for.
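The same idea can carry on while the body is arriving, because Content-Length may be missing or wrong. The sketch below caps the number of bytes written as chunks come in. The names copy_capped and DownloadTooLarge are hypothetical, and chunks stands for any iterable of bytes objects, such as response.iter_content(chunk_size=8192).

```python
class DownloadTooLarge(Exception):
    """Raised when the body exceeds the allowed size (hypothetical name)."""


def copy_capped(chunks, fileobj, max_bytes):
    """Write chunks to fileobj, stopping once max_bytes would be exceeded.

    Returns the number of bytes written. The chunk that crosses the
    limit is not written.
    """
    written = 0
    for chunk in chunks:
        if written + len(chunk) > max_bytes:
            raise DownloadTooLarge(
                f"body exceeds {max_bytes} bytes; stopped after {written}"
            )
        fileobj.write(chunk)
        written += len(chunk)
    return written
```

A caller that hits DownloadTooLarge would normally delete the partial output file as well; that cleanup is left out here to keep the sketch short.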
Trust the header less than you think
The Content-Type header is set by the server, not derived from the actual bytes. A malicious or misconfigured server can send Content-Type: image/jpeg with an HTML error page in the body, and your code happily writes the HTML to downloaded_image.jpg. The header check above raises ContentTypeMismatch when the prefix does not match, which catches the obvious mismatches; production code typically goes one step further and inspects the first few bytes of the response body itself (the magic-byte trick from Section 2, applied to downloads). For the chapter's scope, the header check plus a reasonable expected_prefix is the right bar; for anything handling user-supplied URLs at scale, plan to add the magic-byte check too.
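A minimal sketch of that extra step for downloads follows. It is an assumption about how the check could look, not a reproduction of the Section 2 code: the signature table, MagicByteMismatch, looks_like_image, and write_if_image are hypothetical names, and only JPEG, PNG, and GIF are covered. The function buffers the first few bytes from an iterable of chunks (for example response.iter_content(chunk_size=8192)) and opens the output file only if those bytes match a known signature.

```python
# Leading bytes ("magic numbers") for a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (1987 variant)
    b"GIF89a",              # GIF (1989 variant)
)
SNIFF_LEN = max(len(sig) for sig in IMAGE_SIGNATURES)


class MagicByteMismatch(Exception):
    """Raised when the body does not start like a known image format."""


def looks_like_image(head):
    """Return True if `head` starts with one of the known signatures."""
    return head.startswith(IMAGE_SIGNATURES)


def write_if_image(chunks, output_path):
    """Write chunks to output_path only if the body starts like an image."""
    chunks = iter(chunks)

    # The first chunk may be shorter than the longest signature, so keep
    # reading until there is enough to compare (or the body ends).
    head = b""
    for chunk in chunks:
        head += chunk
        if len(head) >= SNIFF_LEN:
            break

    if not looks_like_image(head):
        raise MagicByteMismatch(f"body starts with {head[:SNIFF_LEN]!r}")

    # Nothing touches the disk until the signature check has passed.
    with open(output_path, "wb") as f:
        f.write(head)
        for chunk in chunks:
            f.write(chunk)
```

A signature match only shows that the body begins like an image; it does not prove the rest of the file is well formed, so treat it as one more filter rather than full validation.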