5. Process multiple files efficiently
Uploading one file is a single request. Uploading a hundred files is an architectural decision. The sequential approach (one upload at a time) is dead simple, and its run time is the sum of every round trip. The concurrent approach (a small pool of worker threads) overlaps network waits and finishes in roughly the sequential time divided by max_workers, at the cost of a slightly bigger blast radius if the upstream API rate-limits you. This section walks through the concurrent shape with ThreadPoolExecutor, the right default for upload-bound work, and names the production guards (per-file error capture, bounded worker count) that keep one bad file from taking down the batch.
Sequential vs concurrent
The two shapes split cleanly on what you optimise for:
- Sequential (one by one). Simple to code, but slow. If one fails, you catch the error and continue. Good for background scripts where total time is not critical.
- Concurrent (parallel). Much faster. Uses Python's ThreadPoolExecutor to upload multiple files at once. This is the professional default for bulk operations against an API that allows it.
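To make the trade-off concrete, here is a minimal sketch that times both shapes against a simulated upload. The fake_upload function and its 0.2-second delay are stand-ins (assumptions for illustration, not part of the upload script in this section), so the comparison runs without touching the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_upload(filename, delay=0.2):
    """Hypothetical stand-in for a real upload: sleeps to mimic a network wait."""
    time.sleep(delay)
    return f"OK {filename}"


def run_sequential(filenames):
    """Upload one file at a time; total time is the sum of all waits."""
    return [fake_upload(name) for name in filenames]


def run_concurrent(filenames, max_workers=3):
    """Upload through a small thread pool; waits overlap across workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_upload, filenames))


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    names = [f"doc{i}.txt" for i in range(1, 7)]
    _, seq_time = timed(run_sequential, names)
    _, con_time = timed(run_concurrent, names)
    print(f"sequential: {seq_time:.2f}s")
    print(f"concurrent: {con_time:.2f}s")
```

With six files and three workers, the sequential run waits through six delays back to back, while the pooled run needs about two rounds of overlapping delays. Exact timings vary by machine, and real uploads add bandwidth and server-side limits that a sleep does not model.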
Concurrent batch upload
Save the following as concurrent_upload.py. It defines a single-file uploader, fans three uploads out across three worker threads, and collects each result regardless of whether it succeeded or threw:
import requests
from concurrent.futures import ThreadPoolExecutor


def upload_single_file(filename):
    """Upload one file. Returns a one-line status string."""
    try:
        with open(filename, 'rb') as f:
            response = requests.post(
                "https://httpbin.org/post",
                files={'file': f},
                timeout=30,
            )
        # Treat 4xx/5xx responses as failures instead of reporting them as OK
        response.raise_for_status()
        return f"OK {filename}: {response.status_code}"
    except Exception as e:
        # Catch everything so one bad file cannot abort the batch
        return f"FAIL {filename}: {e}"


files_to_upload = ['doc1.txt', 'doc2.txt', 'doc3.txt']

# Create dummy files for the test
for name in files_to_upload:
    with open(name, 'w') as file:
        file.write("content")

print("Starting concurrent upload...")

# Upload up to 3 files at once; map() returns results in input order
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(upload_single_file, files_to_upload))

for result in results:
    print(result)
Starting concurrent upload...
OK doc1.txt: 200
OK doc2.txt: 200
OK doc3.txt: 200
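The script above uses executor.map, which hands results back in input order once they are ready. If you would rather report each file the moment it finishes, one common alternative is submit plus as_completed. The sketch below is one way to do that, not the only one: upload_all and demo_upload are hypothetical names, and the uploader is passed in as a parameter so the example runs without a network call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(filenames, upload, max_workers=3):
    """Run upload(filename) for each file; return {filename: status string}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its filename so results can be attributed
        future_to_name = {executor.submit(upload, name): name for name in filenames}
        for future in as_completed(future_to_name):
            name = future_to_name[future]
            try:
                results[name] = future.result()
            except Exception as e:
                # An uploader that raises is recorded here instead of ending the batch
                results[name] = f"FAIL {name}: {e}"
            print(results[name])  # printed in completion order, not input order
    return results


def demo_upload(filename):
    """Hypothetical uploader used only for this demo; fails on one name."""
    if filename == "bad.txt":
        raise ValueError("simulated failure")
    return f"OK {filename}"


if __name__ == "__main__":
    upload_all(["doc1.txt", "bad.txt", "doc3.txt"], demo_upload)
```

Keying the results by filename matters here because completion order is not guaranteed to match the order of the input list.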
Production considerations
- Rate limits. Many APIs cap concurrent requests per client. Set max_workers to a conservative value (3-5 is a sensible default; check the upstream API's docs before going higher).
- Per-file error capture. Wrap each upload in try/except and return a status string so one failure does not abort the batch. The wrapper above is the minimal version; production code logs the full traceback alongside the file path.
- Progress tracking. For user-facing batches, persist each result row to a database (or a status file) so the user can resume after a crash and you can retry just the failures rather than re-running the whole job.
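The guards in this list can be sketched together in a few dozen lines. The version below is an illustration under stated assumptions, not a production implementation: it uses a JSON Lines status file instead of a database, and the names run_batch, load_done, and the row fields (file, ok, error) are hypothetical. The uploader is passed in as a function that raises on failure.

```python
import json
import logging
import os
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("batch_upload")


def load_done(status_path):
    """Return the set of filenames already recorded as successful."""
    done = set()
    if os.path.exists(status_path):
        with open(status_path, encoding="utf-8") as fh:
            for line in fh:
                row = json.loads(line)
                if row["ok"]:
                    done.add(row["file"])
    return done


def run_batch(filenames, upload, status_path, max_workers=3):
    """Upload files not yet marked done; append one status row per attempt."""
    done = load_done(status_path)
    pending = [name for name in filenames if name not in done]

    def attempt(name):
        try:
            upload(name)
            return {"file": name, "ok": True, "error": None}
        except Exception as e:
            # logger.exception records the full traceback next to the file path
            logger.exception("upload failed for %s", name)
            return {"file": name, "ok": False, "error": str(e)}

    rows = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor, \
            open(status_path, "a", encoding="utf-8") as fh:
        # Rows are written from this thread only, so no lock is needed
        for row in executor.map(attempt, pending):
            fh.write(json.dumps(row) + "\n")
            fh.flush()  # push each row out promptly to limit loss on a crash
            rows.append(row)
    return rows
```

Calling run_batch a second time with the same status file skips every file already marked ok and retries only the failures, which is the resume behaviour described in the progress-tracking item. The max_workers default of 3 follows the conservative range suggested for rate limits.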
The concurrent pattern looks intimidating the first time and becomes second nature after the second. Once you have it, the performance gap on any upload-bound batch (anything with more than a handful of files) is large enough that the sequential version stops being worth writing. Section 6 brings this together with the chapter's keystone build: the Receipt Scanner pipeline, where a single image upload, an OCR call, and a regex parse turn a phone-camera photo into structured data your code can use.