Understanding Web Images for AI Agents
Images are a fundamental part of the modern web. When AI agents scrape pages, they often need to capture visual content — charts, product photos, diagrams — alongside the text. This article covers the key considerations for image-aware scraping.
The first consideration is format. JPEG is the dominant format for photographs on the web, while PNG is preferred for screenshots and diagrams, where lossless compression keeps text and sharp edges crisp. WebP is increasingly common as a modern alternative: it supports both lossy and lossless modes and typically produces smaller files than JPEG or PNG at comparable quality.
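Because the MIME type has to travel with the image bytes, it helps to identify the format from the file itself rather than trusting the URL extension. A minimal sketch, assuming only the three formats above need recognising; sniff_image_mime is a hypothetical helper name. The checks use the standard file signatures: JPEG begins with the bytes FF D8 FF, PNG with a fixed eight-byte signature, and WebP with a RIFF header whose format tag is WEBP.

```python
from typing import Optional


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Return the MIME type for JPEG, PNG or WebP bytes, or None if unrecognised."""
    # JPEG: Start Of Image marker FF D8, followed by the FF that opens the next marker.
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # PNG: fixed eight-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # WebP: RIFF container, a 4-byte little-endian size, then the "WEBP" tag.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    # Anything else (GIF, SVG, HTML error pages, ...) is left to the caller.
    return None
```

For example, sniff_image_mime(b"\x89PNG\r\n\x1a\n") returns "image/png". Returning None rather than guessing lets the agent skip or log responses that are not actually images, such as an HTML error page served at an image URL.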
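For AI agents that feed images to vision-language models, the format that major LLM APIs have in common is base64-encoded image data tagged with its MIME type, though each API wraps it differently. OpenAI's Chat Completions API accepts it as a data URL such as data:image/jpeg;base64,…, while Anthropic's Messages API takes the base64 string and the media type as separate fields. Either way, an agent can scrape an image, encode it in memory, and pass it directly to GPT-4o or Claude without any intermediate file I/O.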
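The encoding step takes only a few lines. The sketch below is an illustration under stated assumptions rather than SDK code: to_data_url, openai_image_part and anthropic_image_part are hypothetical helpers, and the dictionary shapes follow the OpenAI Chat Completions image_url content part and the Anthropic Messages API base64 image block as publicly documented, so check the current API references before relying on them. The Anthropic block carries the raw base64 string and media type separately rather than a data URL.

```python
import base64


def to_data_url(data: bytes, mime: str) -> str:
    """Encode raw image bytes as a data URL: data:<mime>;base64,<payload>."""
    b64 = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{b64}"


def openai_image_part(data: bytes, mime: str) -> dict:
    # Assumed shape of an OpenAI Chat Completions image content part:
    # the image is passed as a data URL inside image_url.url.
    return {"type": "image_url", "image_url": {"url": to_data_url(data, mime)}}


def anthropic_image_part(data: bytes, mime: str) -> dict:
    # Assumed shape of an Anthropic Messages API image block:
    # raw base64 plus media_type, no "data:" prefix.
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": mime,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }
```

For example, to_data_url(b"abc", "image/png") returns "data:image/png;base64,YWJj". Each part dictionary would then be placed in a message's content list alongside any text parts.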
Caching is another key concern. Fetching the same image repeatedly wastes bandwidth and risks triggering the source site's rate limits. A sensible strategy is to cache small images (under 32 KB) inline as base64 in the page cache, and to write larger images as binary files to a sibling images/ directory, storing only the file path in the JSON index. Bear in mind that base64 inflates data by about a third, so a 32 KB image occupies roughly 43 KB once inlined.
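A minimal sketch of that two-tier cache, assuming the page cache and the JSON index are a single index.json keyed by image URL, and that large files are named by the SHA-256 hash of their URL. persist_image, load_index, save_index, cached_bytes and the extension map are hypothetical names chosen for illustration, not an existing library.

```python
import base64
import hashlib
import json
from pathlib import Path
from typing import Optional

# Threshold from the text, measured on raw bytes before base64 encoding.
INLINE_LIMIT = 32 * 1024

# Assumption: a small extension map so cached files open in ordinary viewers.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/webp": ".webp"}


def persist_image(url: str, data: bytes, mime: str, cache_dir: Path) -> dict:
    """Store one image and return its JSON-serialisable index entry."""
    entry = {"mime": mime, "size": len(data)}
    if len(data) < INLINE_LIMIT:
        # Small image: embed the bytes directly in the index as base64.
        entry["base64"] = base64.b64encode(data).decode("ascii")
    else:
        # Large image: write raw bytes under images/ and record only the path.
        images_dir = cache_dir / "images"
        images_dir.mkdir(parents=True, exist_ok=True)
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        path = images_dir / (digest + EXTENSIONS.get(mime, ".bin"))
        path.write_bytes(data)
        # Relative POSIX-style path keeps the index portable across machines.
        entry["path"] = path.relative_to(cache_dir).as_posix()
    return entry


def load_index(cache_dir: Path) -> dict:
    """Read index.json (URL -> entry), or start empty if it does not exist yet."""
    index_file = cache_dir / "index.json"
    if index_file.exists():
        return json.loads(index_file.read_text(encoding="utf-8"))
    return {}


def save_index(index: dict, cache_dir: Path) -> None:
    """Write the whole index back to index.json."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / "index.json").write_text(json.dumps(index, indent=2), encoding="utf-8")


def cached_bytes(index: dict, url: str, cache_dir: Path) -> Optional[bytes]:
    """Return the cached image bytes for url, or None on a cache miss."""
    entry = index.get(url)
    if entry is None:
        return None
    if "base64" in entry:
        return base64.b64decode(entry["base64"])
    return (cache_dir / entry["path"]).read_bytes()
```

In use, an agent would call cached_bytes before fetching, go to the network only on a miss, and then record the result with persist_image followed by save_index.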
Conclusion
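Web image scraping for AI agents is straightforward when broken into three layers: fetch (extend the HTTP client to return binary data rather than decoded text), normalise (base64-encode the bytes and record the MIME type), and persist (inline for small images, file-backed for large ones). The resulting ImageRef objects, each carrying the encoded data and its MIME type, can be wrapped in the request format of any vision-capable LLM API without re-fetching or re-encoding.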
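The fetch and normalise layers can be sketched with the Python standard library alone. ImageRef is named above but not defined, so the fields here (source URL, MIME type, base64 payload) are an assumption, as are the helper names fetch, normalise and scrape_image and the User-Agent string. The sketch trusts the server's Content-Type header, and the persist layer is omitted for brevity.

```python
import base64
import urllib.request
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageRef:
    """Hypothetical normalised image record."""
    url: str
    mime: str
    data_b64: str

    def to_data_url(self) -> str:
        return f"data:{self.mime};base64,{self.data_b64}"


def fetch(url: str, timeout: float = 10.0) -> Tuple[bytes, str]:
    """Fetch layer: return raw bytes (not decoded text) and the reported MIME type."""
    # Assumption: a descriptive User-Agent; adjust to the target site's policy.
    req = urllib.request.Request(url, headers={"User-Agent": "image-agent/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read(), resp.headers.get_content_type()


def normalise(url: str, data: bytes, mime: str) -> ImageRef:
    """Normalise layer: base64-encode the bytes and attach the MIME type."""
    if not mime.startswith("image/"):
        raise ValueError(f"{url} is not an image (got {mime})")
    return ImageRef(url, mime, base64.b64encode(data).decode("ascii"))


def scrape_image(url: str) -> ImageRef:
    """Fetch then normalise; persistence would follow here."""
    data, mime = fetch(url)
    return normalise(url, data, mime)
```

Keeping each layer a separate function means the network call can be swapped out (for a cached read, say) without touching the encoding logic.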