Understanding Web Images for AI Agents

Images are a fundamental part of the modern web. When AI agents scrape pages, they often need to capture visual content (charts, product photos, diagrams) alongside the text. This article covers the key considerations for image-aware scraping: format, encoding, and caching.

[Image: hero image showing a web spider diagram]

The first consideration is format. JPEG is the dominant format for photographs on the web, while PNG is preferred for screenshots and diagrams that require lossless compression. WebP is an increasingly common alternative that supports both lossy and lossless modes and typically produces smaller files than JPEG or PNG at comparable quality.
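
Because file extensions and Content-Type headers are not always reliable, an agent may want to confirm the format from the image bytes themselves. The following is a minimal sketch, assuming only the three formats named above matter; the function name sniff_image_mime is hypothetical and not part of any library.

```python
from typing import Optional


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Guess an image MIME type from its leading bytes (hypothetical helper).

    Only JPEG, PNG and WebP are recognised; anything else returns None.
    """
    # JPEG files start with the SOI marker FF D8, followed by another FF.
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

Returning None for unknown data lets the caller decide whether to fall back to the server's Content-Type header or to skip the image.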

[Image: architecture diagram of the web spider pipeline]

For AI agents that feed images to vision-language models, the common wire format is base64. Some major LLM APIs accept a base64-encoded data URL of the form data:image/jpeg;base64,…, while others take the base64 payload and the MIME type as separate fields. Either way, an agent can scrape an image, encode it in memory, and pass it to a model such as GPT-4o or Claude without any intermediate file I/O.
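
The in-memory encoding step can be sketched with the standard library alone. The helper names to_data_url and split_data_url are hypothetical. The second helper is included because, at the time of writing, OpenAI-style chat APIs take the whole data URL in an image_url field, while Anthropic's Messages API takes the base64 payload and media type separately; check the current API documentation before relying on either shape.

```python
import base64
from typing import Tuple


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a base64 data URL, entirely in memory."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def split_data_url(data_url: str) -> Tuple[str, str]:
    """Split a base64 data URL into (mime_type, base64_payload).

    Useful for APIs that want the two parts as separate fields.
    """
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, payload
```

Note that base64 encoding grows the payload by roughly a third, which is worth keeping in mind for request size limits.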

[Image: performance chart comparing image formats by file size]

Caching is the other key concern. Fetching the same image repeatedly wastes bandwidth and risks rate limiting. A sensible strategy is to cache small images (under 32 KB) inline as base64 in the page cache, and write larger images as binary files to a sibling images/ directory, storing only the file path in the JSON index.
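
The size-based split described above might look like the sketch below. It assumes the JSON index lives in cache_dir, so that images/ sits next to it, and it names files by content hash so that repeated fetches of the same image reuse one file. The function name cache_entry, the dictionary keys, and the hash-based naming are all illustrative assumptions rather than a prescribed layout.

```python
import base64
import hashlib
from pathlib import Path

INLINE_LIMIT = 32 * 1024  # bytes; the 32 KB threshold from the text

# Assumed mapping from MIME type to file extension for file-backed images.
_EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/webp": ".webp"}


def cache_entry(data: bytes, mime_type: str, cache_dir: Path) -> dict:
    """Return a JSON-serialisable index entry for one image.

    Small images are stored inline as base64; larger ones are written to
    cache_dir/images/ and only their relative path is recorded.
    """
    if len(data) < INLINE_LIMIT:
        inline = base64.b64encode(data).decode("ascii")
        return {"mime_type": mime_type, "base64": inline}

    images_dir = cache_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    # Content-addressed name: identical bytes map to the same file.
    name = hashlib.sha256(data).hexdigest()[:16] + _EXTENSIONS.get(mime_type, ".bin")
    path = images_dir / name
    if not path.exists():
        path.write_bytes(data)
    return {"mime_type": mime_type, "path": path.relative_to(cache_dir).as_posix()}
```

Storing the path relative to cache_dir keeps the index valid if the whole cache directory is moved.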

[Image: relative URL image, should be resolved to absolute]

[Image: inline data URL image, 1x1 pixel PNG]

Conclusion

Web image scraping for AI agents is straightforward when broken into three layers: fetch (extend the HTTP client to return binary data), normalise (base64 + MIME type), and persist (inline for small, file-backed for large). The resulting ImageRef objects, each carrying a MIME type plus either inline base64 data or a file path, can then be handed to a vision-capable LLM with minimal conversion.
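
The article does not spell out what an ImageRef contains, so the following is only one plausible shape, assuming it records the source URL, the MIME type, and either inline base64 data or a path relative to the cache directory. All field and method names here are hypothetical.

```python
import base64
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ImageRef:
    """Hypothetical reference to a scraped image (inline or file-backed)."""

    url: str                        # absolute URL the image was fetched from
    mime_type: str                  # e.g. "image/png"
    data_b64: Optional[str] = None  # set for small, inline images
    path: Optional[str] = None      # set for large images, relative to cache_dir

    def load_bytes(self, cache_dir: Path) -> bytes:
        """Return the raw image bytes, from inline data or from disk."""
        if self.data_b64 is not None:
            return base64.b64decode(self.data_b64)
        if self.path is not None:
            return (cache_dir / self.path).read_bytes()
        raise ValueError("ImageRef has neither inline data nor a file path")

    def to_data_url(self, cache_dir: Path) -> str:
        """Build a base64 data URL suitable for passing to a vision model."""
        payload = base64.b64encode(self.load_bytes(cache_dir)).decode("ascii")
        return f"data:{self.mime_type};base64,{payload}"
```

With this shape, the caller does not need to know whether an image was stored inline or on disk; both cases produce the same data URL.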