6. Build your receipt scanner pipeline
Time to compose the patterns from the last five sections into one runnable system. The Receipt Scanner takes a receipt image off disk, posts it to the OCR.space API as a multipart upload, parses the returned plain text into structured fields (total amount, date), and returns a Python dict your code can use. It is the same shape that real expense-tracking apps run in production. The only things that change at scale are the OCR vendor (Mindee, Veryfi, AWS Textract, Google Document AI) and how much regex sophistication the parser carries.
The pipeline has four stages, each pulling on a pattern from earlier in the chapter:
- Upload the receipt image as multipart/form-data with files= + data= for the API key (Section 2).
- Extract the raw text from the OCR.space JSON response, handling the API's error envelope explicitly.
- Parse the text with regex to pull out the total amount and the date.
- Return a structured dict, with a fallback note if neither field is found.
You will use the free OCR.space API for this project. By the end you will have a working document-processing system you can run against any receipt photo on your phone.
helloworld demo key is shared and rate-limited
The class below falls back to "helloworld" when no API key is supplied. That is the OCR.space demo key, and it works for casual experimentation, but it is shared across every reader running this chapter and every developer trying the API for the first time. Expect occasional "Auth failure" or "request throttled" responses on first run. For reliable runs, sign up for a free OCR.space API key (one form, free tier around 25,000 requests per month at time of writing) and set OCRSPACE_API_KEY in your .env file. The class reads the key via os.environ.get, which only sees variables already in the process environment; it does not read .env files itself. Load the file before creating the scanner (for example with python-dotenv's load_dotenv()) or export the variable in your shell.
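The lookup order described above can be sketched as a small standalone function. This is an illustration rather than part of the chapter's class: resolve_api_key and the source labels it returns are hypothetical names, and unlike the class it treats an empty string the same as a missing key.

```python
import os

DEMO_KEY = "helloworld"  # the shared OCR.space demo key used in this chapter


def resolve_api_key(explicit_key=None, env=None):
    """Return (key, source) using the order: argument, env var, demo key.

    Hypothetical helper for illustration only. `env` defaults to
    os.environ and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    if explicit_key:
        return explicit_key, "argument"
    env_key = env.get("OCRSPACE_API_KEY")
    if env_key:
        return env_key, "environment"
    return DEMO_KEY, "demo"


if __name__ == "__main__":
    key, source = resolve_api_key()
    if source == "demo":
        print("Using the shared demo key; expect occasional throttling.")
    else:
        print(f"Using the API key from the {source}.")
```

Reporting where the key came from makes a throttled first run easier to diagnose: if the script says it is on the demo key, the .env file was probably never loaded.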
The complete pipeline
The class below is roughly 90 lines, but it maps cleanly to the four stages above. __init__ resolves the API key (constructor argument, then the OCRSPACE_API_KEY environment variable, then the "helloworld" fallback). scan_receipt handles stages 1 and 2: a file existence check, the multipart upload via the files= + data= pattern from Section 2, then careful traversal of the OCR.space response envelope. The API can fail at the HTTP layer or return HTTP 200 while flagging an error in the JSON body, so both branches need handling. _parse_data handles stages 3 and 4: two regexes for the total amount and the date, plus a fallback note in the returned dict if neither matches. Save the listing as receipt_scanner.py:
import requests
import re
import os


class ReceiptScanner:
    def __init__(self, api_key=None):
        """
        api_key:
            Your OCR.space API key. When omitted, this falls back to the
            OCRSPACE_API_KEY environment variable or 'helloworld' for the demo key.
        """
        if api_key is None:
            api_key = os.environ.get("OCRSPACE_API_KEY", "helloworld")
        self.api_key = api_key
        self.url = "https://api.ocr.space/parse/image"

    def scan_receipt(self, image_path):
        print(f"Scanning {image_path}...")

        # 1. Validate File
        if not os.path.exists(image_path):
            return {"error": "File not found"}

        # 2. Upload and Process (Synchronous)
        try:
            with open(image_path, "rb") as f:
                payload = {
                    "apikey": self.api_key,
                    "language": "eng",
                    "isOverlayRequired": False,
                }
                files = {"file": f}
                response = requests.post(
                    self.url,
                    files=files,
                    data=payload,
                    timeout=60,  # OCR takes time
                )
            response.raise_for_status()
            result = response.json()

            # Check for API-level errors (reported in the body, even on HTTP 200)
            if result.get("IsErroredOnProcessing"):
                error_message = result.get("ErrorMessage")
                if isinstance(error_message, list):
                    error_message = "; ".join(error_message)
                return {"error": error_message or "Unknown OCR error"}

            # 3. Extract Text
            parsed_results = result.get("ParsedResults", [])
            if not parsed_results:
                return {"error": "No text found"}
            raw_text = parsed_results[0].get("ParsedText", "")
            return self._parse_data(raw_text)
        except Exception as e:
            return {"error": str(e)}

    def _parse_data(self, text):
        """Extract amount and date using regex."""
        print("Analyzing text...")

        # Regex for currency (for example, $12.99 or 12.99).
        # Looks for the word Total or Amount followed by a number.
        # The leading \b stops "Subtotal" from matching as "Total".
        amount_match = re.search(
            r"\b(Total|Amount)[:\s]*\$?([\d,]+\.\d{2})",
            text,
            flags=re.IGNORECASE,
        )

        # Regex for date in common formats (dd/mm/yyyy, mm/dd/yyyy, yyyy-mm-dd)
        date_match = re.search(
            r"(\d{1,2}[\/\-]\d{1,2}[\/\-]\d{2,4}|\d{4}[\/\-]\d{1,2}[\/\-]\d{1,2})",
            text,
        )

        data = {}
        if amount_match:
            data["total_amount"] = amount_match.group(2)
        if date_match:
            data["date"] = date_match.group(1)
        if not data:
            data["note"] = "No amount or date found. Check OCR quality."
        return data


if __name__ == "__main__":
    scanner = ReceiptScanner()  # Uses env var or demo key by default
    result = scanner.scan_receipt("sample_receipt.jpg")
    print(result)
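The envelope checks in scan_receipt can be exercised without a network call. The sketch below pulls them into a standalone function and runs it against three hand-written dicts. extract_text is a hypothetical helper, and the sample envelopes (including the error strings) are made up for illustration; only the field names IsErroredOnProcessing, ErrorMessage, ParsedResults, and ParsedText come from the listing.

```python
def extract_text(result):
    """Return (text, error) from an OCR.space-style response dict.

    Exactly one of the two values is None. Hypothetical helper that
    mirrors the checks in scan_receipt.
    """
    if result.get("IsErroredOnProcessing"):
        message = result.get("ErrorMessage")
        if isinstance(message, list):
            message = "; ".join(message)
        return None, message or "Unknown OCR error"

    parsed_results = result.get("ParsedResults") or []
    if not parsed_results:
        return None, "No text found"
    return parsed_results[0].get("ParsedText", ""), None


# Hand-written sample envelopes (illustrative, not captured from the API).
ok_envelope = {
    "IsErroredOnProcessing": False,
    "ParsedResults": [{"ParsedText": "Total: $12.99\n"}],
}
error_envelope = {
    "IsErroredOnProcessing": True,
    "ErrorMessage": ["Sample error one", "Sample error two"],
}
empty_envelope = {"IsErroredOnProcessing": False, "ParsedResults": []}

for envelope in (ok_envelope, error_envelope, empty_envelope):
    print(extract_text(envelope))
```

Keeping the envelope logic in a pure function like this is one way to unit-test the error branches without spending API quota.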
You are not just uploading a file at this point. You are pushing it through an external document-processing API, interpreting the JSON envelope, and distilling messy OCR text into the two fields your application actually needs. Vendor swaps and richer parsers are mostly drop-in changes; the four-stage shape (upload, extract, parse, return) carries.
A real receipt regex is its own discipline
The parser in _parse_data works on receipts that include a clean Total: $X.XX line and an ISO-shaped or slash-separated date. Real receipts are messier: handwritten text, partial OCR results, multiple totals (subtotal, tax, total), date formats that vary by country, prices in multiple currencies, line items with their own dollar signs. Production receipt parsing usually leans on a specialist API (Mindee, Veryfi, Klippa) that does the layout-aware extraction for you. The regex pattern here is enough for the chapter's project; it is not enough for a billing system.
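The multiple-totals problem is easy to reproduce offline. In the sketch below, a pattern with no word boundary in front of the label matches the "total" inside "Subtotal" and returns the wrong figure, while the same pattern with a leading \b skips it. The names NAIVE_RE, STRICT_RE, and first_amount, and the sample text, are hypothetical; this covers one failure mode only and is not a general receipt parser.

```python
import re
from decimal import Decimal

# Same shape of pattern, with and without a word boundary before the label.
NAIVE_RE = re.compile(r"(Total|Amount)[:\s]*\$?([\d,]+\.\d{2})", re.IGNORECASE)
STRICT_RE = re.compile(r"\b(Total|Amount)[:\s]*\$?([\d,]+\.\d{2})", re.IGNORECASE)


def first_amount(pattern, text):
    """Return the first labelled amount as a Decimal, or None if absent."""
    match = pattern.search(text)
    if match is None:
        return None
    # Strip thousands separators so "1,019.98" parses as a number.
    return Decimal(match.group(2).replace(",", ""))


SAMPLE = "Subtotal: $1,018.50\nTax: $1.48\nTotal: $1,019.98\n"

print(first_amount(NAIVE_RE, SAMPLE))   # matches inside "Subtotal"
print(first_amount(STRICT_RE, SAMPLE))  # matches the "Total" line
```

Returning a Decimal rather than a string is a design choice for the sketch: it avoids float rounding when amounts are later summed or compared.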
Synchronous now, asynchronous later
OCR.space in this example is synchronous: requests.post() holds the connection open until the server finishes processing. That is fine for snapshot-sized images and a 60-second timeout. For heavier workloads (video transcription, bulk document scanning, anything that takes minutes) the pattern flips to asynchronous: the upload returns a job_id immediately, your code polls a status endpoint every few seconds, and you retrieve the result when the status flips to "Complete". This avoids connection timeouts and lets your app handle other work while the server processes the document. Chapter 22 covers webhooks, which let the server push that completion notification to you instead of you polling.
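The polling half of that asynchronous pattern can be sketched without committing to any vendor. Everything here is an assumption for illustration: poll_until_complete is a hypothetical helper, the "Processing" / "Complete" / "Failed" status strings and the dict shape are invented, and the status endpoint is replaced by an in-memory stand-in. In real code, get_status would wrap a requests.get call against whatever job-status URL the vendor documents.

```python
import time


def poll_until_complete(get_status, interval=2.0, timeout=120.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports completion, failure, or timeout.

    get_status is any callable returning a dict such as
    {"status": "Processing"} or {"status": "Complete", "result": ...}.
    The status strings and dict shape are assumptions, not a real API.
    """
    deadline = clock() + timeout
    while True:
        job = get_status()
        if job.get("status") == "Complete":
            return job.get("result")
        if job.get("status") == "Failed":
            raise RuntimeError(job.get("error", "Job failed"))
        if clock() >= deadline:
            raise TimeoutError("Job did not finish before the timeout")
        sleep(interval)  # wait before asking again


# Stand-in for a real status endpoint: finishes on the third poll.
_responses = iter([
    {"status": "Processing"},
    {"status": "Processing"},
    {"status": "Complete", "result": {"total_amount": "19.98"}},
])

result = poll_until_complete(lambda: next(_responses), interval=0.01)
print(result)
```

Passing sleep and clock in as parameters keeps the loop testable with fakes, and the overall deadline means a job that never completes raises instead of polling forever.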
What you have just built
The Receipt Scanner is the load-bearing demo of the chapter, but the patterns carry directly into production work you would actually be paid for: profile-image uploaders in social apps, PDF report generators in finance tools, document scanners in expense apps. Section 7 recaps the chapter and pressure-tests the architectural calls before Chapter 22 flips the direction from outbound uploads to inbound webhook events.