Auto Flow Format Guide
For your LLM

Copy All grabs this whole guide. Paste it into ChatGPT or Claude, or use it to build a custom GPT or Claude Project, and the AI turns your video ideas into this exact format automatically.

The one rule

One line. One final output.

Every row is a single instruction split by |||. The left side is what Auto Flow makes or uses first. The right side is what it turns that into. Write the descriptions in any language — only the markers matter.

[V1-S1] cyberpunk skyline, neon rain ||| camera tilts down into the street
  • [scene] tag (here [V1-S1]) = optional. Keeps rows in order.
  • One line = one final output.
  • Left of ||| = what Auto Flow creates or uses first.
  • Right of ||| = what Auto Flow turns it into.
  • Use ;; only when one final video needs multiple generated images first.
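
The rules above can be sketched as a small parser. This is a minimal illustration, not Auto Flow's own code: the names Row and parse_row are hypothetical, and it assumes the optional tag is whatever sits inside the leading square brackets.

```python
import re
from typing import NamedTuple, Optional

# Optional leading tag such as [V1-S1]; the bracket contents are kept as-is.
TAG_RE = re.compile(r"^\[([^\]]+)\]\s*")


class Row(NamedTuple):
    tag: Optional[str]    # e.g. "V1-S1", or None when the row has no tag
    create: str           # text left of |||
    video: Optional[str]  # text right of |||, or None for a text-only row


def parse_row(line: str) -> Row:
    """Split one video row into its tag, create side, and video side."""
    text = line.strip()
    match = TAG_RE.match(text)
    tag = match.group(1) if match else None
    if match:
        text = text[match.end():]
    if text.count("|||") > 1:
        raise ValueError("a video row may contain at most one |||")
    left, sep, right = text.partition("|||")
    return Row(tag, left.strip(), right.strip() if sep else None)


row = parse_row("[V1-S1] cyberpunk skyline, neon rain ||| camera tilts down into the street")
print(row.tag)     # V1-S1
print(row.create)  # cyberpunk skyline, neon rain
print(row.video)   # camera tilts down into the street
```

A row without ||| comes back with video set to None, which matches the text-only case.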

Start here

Normal row

One image becomes one video. Left side creates the still, right side animates it. Most of your rows look exactly like this.

One image → one video:

[V1-S1] Cyberpunk city skyline at night, heavy rain, blue neon reflections ||| Camera tilts down from skyline to street level as rain intensifies

[V1-S2] Courier near a glowing vending machine, steam vents behind him ||| Slow dolly-in while the jacket fabric moves in the wind
Text-only works too. Drop the ||| and a row becomes a plain text-to-video or create-image prompt. Keep one prompt per line or block, and keep the [V#-S#] tags to preserve order.

Add images

Multiple images, one video

A multiple-images row is still one row, one |||, and one final video. Split the create side with ;; to ask for several stills, then describe the single video that uses them all.

Three generated stills feed one final video:

[V1-S2] cube on a table ;; sphere on a table ;; triangular prism on a table ||| use the generated cube, sphere, and prism images together in one final video
How to read it. Left side = the image plan (each ;; is one still to generate). Right side = one final video that consumes those stills. No timecodes needed, and you still get one video out — not one per image.
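
As a rough sketch of that reading, the function below separates the image plan from the single video request. The name split_multi_image_row is hypothetical, and dropping a leading [tag] before splitting is an assumption made for the example.

```python
from typing import List, Tuple


def split_multi_image_row(row: str) -> Tuple[List[str], str]:
    """Return (still prompts, final video prompt) for a ;; row."""
    if row.count("|||") != 1:
        raise ValueError("a multi-image row needs exactly one |||")
    create_side, video_side = row.split("|||")
    create_side = create_side.strip()
    # Assumption: a leading [tag] is not part of the first still prompt.
    if create_side.startswith("["):
        create_side = create_side.partition("]")[2]
    stills = [part.strip() for part in create_side.split(";;") if part.strip()]
    if not stills:
        raise ValueError("the create side has no still prompts")
    return stills, video_side.strip()


stills, video = split_multi_image_row(
    "[V1-S2] cube on a table ;; sphere on a table ;; triangular prism on a table "
    "||| use the generated cube, sphere, and prism images together in one final video"
)
print(len(stills))  # 3 stills to generate
print(video)        # one video prompt, however many stills there are
```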

Agent Mode

AFID tracking manifest

Agent Mode turns each row tag into deterministic tracking IDs for the generated image slots. The IDs are metadata for matching outputs back to rows; they are not visual text.

This row creates three image slots and one final video:

[V1-S10] image1 ;; image2 ;; image3 ||| [V1-S10] video
Expected manifest. Image slots are AFID:V1-S10-IMG1, AFID:V1-S10-IMG2, and AFID:V1-S10-IMG3. Final video ID is [V1-S10]. Preserve AFID in generation prompt metadata for tracking. Do not render AFID, labels, captions, or text in the image/video.
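
The manifest above can be derived mechanically from the row. The sketch below assumes the ID pattern shown in this guide (AFID:<tag>-IMG<n>, numbered from 1 in left-to-right order); the function name afid_manifest and the dictionary layout are hypothetical.

```python
import re
from typing import Dict, List, Union

# Row tags in the [V#-S#] form used by Agent Mode.
ROW_TAG_RE = re.compile(r"\s*\[(V\d+-S\d+)\]")


def afid_manifest(row: str) -> Dict[str, Union[str, List[str]]]:
    """Build the expected image-slot IDs and final video ID for one row."""
    match = ROW_TAG_RE.match(row)
    if match is None:
        raise ValueError("row must start with a [V#-S#] tag")
    tag = match.group(1)
    create_side = row[match.end():].partition("|||")[0]
    stills = [s for s in create_side.split(";;") if s.strip()]
    return {
        "video_id": f"[{tag}]",
        "image_slots": [f"AFID:{tag}-IMG{n}" for n in range(1, len(stills) + 1)],
    }


manifest = afid_manifest("[V1-S10] image1 ;; image2 ;; image3 ||| [V1-S10] video")
print(manifest["image_slots"])
# ['AFID:V1-S10-IMG1', 'AFID:V1-S10-IMG2', 'AFID:V1-S10-IMG3']
print(manifest["video_id"])
# [V1-S10]
```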

Copyable Agent Mode packet

Paste this structure into your LLM when you want Flow Agent to run a tracked batch:

# AGENT INSTRUCTIONS
Use one row per final video. Preserve [V#-S#] row tags and AFID strings in prompt metadata. Do not render AFID, labels, captions, or visible text.

# PROJECT BRIEF
Describe the product, characters, style, reference rules, dialogue rules, and QA bar for this run.

# REFERENCES
List the uploaded references and which rows, @handles, or image slots they should guide.

# VIDEO PROMPTS
[V1-S1] AFID:V1-S1-IMG1 still image prompt ;; AFID:V1-S1-IMG2 optional second still ||| [V1-S1] final frame-to-video prompt using those stills.

# QA EXPECTATIONS
Check dialogue, speaker, timing, identity, product accuracy, and whether the ending is complete.

# RETRY ROWS
Retry only the listed row IDs, keep unrelated rows unchanged, and state the issue to fix.

What Agent Mode adds

Flow Agent can create inside Flow. Auto Flow Agent Mode adds the operating layer around it: row IDs, AFID tracking, retry scope, reconcile reports, download plans, QA notes, and external agent control.

Keep [V#-S#] tags stable. They are how Auto Flow knows what should exist, what is missing, and what should be retried. AFID strings are tracking metadata and must not appear as visible text in generated media.

One-image / one-video smoke packet

Use this before a larger batch to prove the lane, bridge, manifest, and final video path:

# AGENT INSTRUCTIONS
Run the smallest possible Agent Mode smoke test. Produce one still image and one final video. Preserve row tags and AFID metadata. Do not add visible labels, captions, or AFID text.

# PROJECT BRIEF
Simple product-style test shot. Clean studio lighting. No dialogue.

# VIDEO PROMPTS
[V1-S1] AFID:V1-S1-IMG1 a single matte black ceramic mug on a walnut desk, soft morning window light, realistic product photo ||| [V1-S1] slow 4 second push-in on the mug, subtle light movement, stable product identity, no text overlays

# QA EXPECTATIONS
Confirm exactly one expected image slot and one final video. Check that the mug stays consistent, no AFID or labels are visible, and the ending is complete.

QA transcript workflow

Export or paste the final transcript, review each row against # QA EXPECTATIONS, then mark it PASS, RETRY, or BLOCKED. For retry, name the row ID and the exact defect: wrong speaker, clipped ending, missing product, bad timing, identity drift, unsafe visible text, or bad audio. Retry only those rows.
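
One way to turn a reviewed transcript into a retry list is sketched below. The QaResult record and the one-line-per-row layout under # RETRY ROWS are assumptions for illustration; the guide only fixes the three statuses and the requirement to name the row ID and the exact defect.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_STATUSES = {"PASS", "RETRY", "BLOCKED"}


@dataclass
class QaResult:
    row_id: str                   # e.g. "V1-S2"
    status: str                   # PASS, RETRY, or BLOCKED
    defect: Optional[str] = None  # required when status is RETRY


def retry_section(results: List[QaResult]) -> str:
    """Render a # RETRY ROWS block that lists only rows marked RETRY."""
    lines = ["# RETRY ROWS"]
    for result in results:
        if result.status not in VALID_STATUSES:
            raise ValueError(f"unknown status for {result.row_id}: {result.status}")
        if result.status != "RETRY":
            continue  # PASS and BLOCKED rows stay out of the retry scope
        if not result.defect:
            raise ValueError(f"{result.row_id} is marked RETRY without a defect")
        lines.append(f"[{result.row_id}] {result.defect}")
    return "\n".join(lines)


print(retry_section([
    QaResult("V1-S1", "PASS"),
    QaResult("V1-S2", "RETRY", "clipped ending"),
    QaResult("V1-S3", "BLOCKED", "Flow tab is not connected"),
]))
```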

Lane cleanup blockers

Stop and clean the lane if the mounted extension id is unknown, the Flow tab is not connected, Auto Flow requires sign-in, the project id cannot be read, duplicate Flow tabs make ownership ambiguous, or the sidepanel is pointed at a stale dist. Treat those as setup blockers, not generation failures.
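
The blocker list can be read as a preflight check. In the sketch below, LaneState and every field on it are hypothetical stand-ins for whatever your setup actually reports; only the six blocker conditions come from this guide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaneState:
    # Hypothetical snapshot of the lane; field names are illustrative only.
    extension_id: Optional[str]
    flow_tab_connected: bool
    signed_in: bool
    project_id: Optional[str]
    flow_tab_count: int
    dist_is_stale: bool


def setup_blockers(state: LaneState) -> List[str]:
    """Return the setup blockers that apply; an empty list means the lane is clean."""
    checks = [
        (state.extension_id is None, "mounted extension id is unknown"),
        (not state.flow_tab_connected, "Flow tab is not connected"),
        (not state.signed_in, "Auto Flow requires sign-in"),
        (state.project_id is None, "project id cannot be read"),
        (state.flow_tab_count > 1, "duplicate Flow tabs make ownership ambiguous"),
        (state.dist_is_stale, "sidepanel is pointed at a stale dist"),
    ]
    return [message for failed, message in checks if failed]


state = LaneState("ext-123", True, True, None, 2, False)
for blocker in setup_blockers(state):
    print("SETUP BLOCKER:", blocker)
```

Reporting these separately from generation errors keeps a broken lane from being retried as if a prompt had failed.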

CLI/MCP handoff

External coding agents should use CLI/MCP for status, reconcile, download planning, and explicit guarded submit/retry commands. Read-only calls should work without sidepanel focus when the bridge is connected. Submit/retry must require a row or run payload and return a traceable request id.

References

Use an uploaded reference

Put a @handle on the create side wherever an uploaded reference image should apply. Auto Flow maps your upload to that handle and uses it while generating the still.

The same reference applied to every generated still in the row:

[V1-S1] @skeleton standing in a dark hallway ;; @skeleton close-up holding a candle ;; @skeleton profile near a window ||| use the generated skeleton images as inputs for one final cinematic video
Upload a skeleton image and map it to @skeleton. Every create-side prompt that mentions the handle gets that reference. Downloads default to the final video — intermediate stills aren't counted as outputs.
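
A quick way to check that every handle on the create side has an upload behind it is sketched below. The function names and the uploads mapping (handle to filename) are hypothetical, and the handle pattern assumes ASCII letters, digits, and underscores.

```python
import re
from typing import Dict, List, Set

# Assumption: handles are "@" followed by ASCII letters, digits, or underscores.
HANDLE_RE = re.compile(r"@[A-Za-z0-9_]+")


def handles_by_still(row: str) -> List[Set[str]]:
    """List the @handles mentioned in each ;; still on the create side."""
    create_side = row.partition("|||")[0]
    return [set(HANDLE_RE.findall(still)) for still in create_side.split(";;")]


def missing_uploads(row: str, uploads: Dict[str, str]) -> List[str]:
    """Return handles used on the create side that have no mapped upload."""
    used = set().union(*handles_by_still(row))
    return sorted(used - set(uploads))


row = (
    "[V1-S1] @skeleton standing in a dark hallway ;; @skeleton close-up holding a candle "
    ";; @skeleton profile near a window ||| use the generated skeleton images as inputs "
    "for one final cinematic video"
)
print(handles_by_still(row))                               # one set per still
print(missing_uploads(row, {"@skeleton": "skeleton.png"})) # []
print(missing_uploads(row, {}))                            # ['@skeleton']
```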

Characters

Reusable characters

Define a character handle once so the same identity carries across rows. Handles are ASCII with no spaces — use underscores for names that have spaces.

Define each handle once, then reuse it anywhere:

@gary_before Overweight man, plain white t-shirt, nervous expression. ||| voice: none
@deandre Tall man, chiseled build, wavy dark hair, smug relaxed face. ||| voice: Charon

Reuse the exact handle where identity matters:

[Scene 2] Hotel hallway at night. @gary_before shoves @deandre against the wall.

@gary_before, furious: "Who are you?"
@deandre, calm: "Back up."
No spaces inside a handle. Use @gary_before, not @ gary before. A display name like "Gary Before" can still map to @gary_before. Optional fields after the description: voice: and info:.
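
The handle rule can be expressed as a short check plus a display-name converter. This is a sketch under assumptions: the allowed character set (ASCII letters, digits, underscores) and the lowercasing step are inferred from the examples here, and both function names are hypothetical.

```python
import re

# Assumption: "@" followed by one or more ASCII letters, digits, or underscores.
HANDLE_RE = re.compile(r"@[A-Za-z0-9_]+")


def is_valid_handle(handle: str) -> bool:
    """True when the whole string is a single ASCII handle with no spaces."""
    return HANDLE_RE.fullmatch(handle) is not None


def to_handle(display_name: str) -> str:
    """Map a display name such as 'Gary Before' to a handle such as '@gary_before'."""
    slug = re.sub(r"\s+", "_", display_name.strip().lower())
    handle = "@" + slug
    if not is_valid_handle(handle):
        # Non-ASCII or punctuation: refuse rather than guess a transliteration.
        raise ValueError(f"cannot build an ASCII handle from {display_name!r}")
    return handle


print(to_handle("Gary Before"))          # @gary_before
print(is_valid_handle("@gary_before"))   # True
print(is_valid_handle("@ gary before"))  # False
```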

Everything together

Full batch example

Paste characters and video prompts as one block under two headers. Auto Flow previews it before applying.

# CHARACTER PROMPTS
@test_luma Fictional young explorer, teal jacket, silver backpack, short black hair, warm expression. ||| voice: none ||| info: curious, calm
@test_orren Fictional young inventor, amber hoodie, round glasses, curly brown hair, thoughtful. ||| voice: none ||| info: clever, precise

# VIDEO PROMPTS
[V1-S1] @test_luma at the edge of a glowing forest path, teal jacket catching blue light ||| Slow dolly-in as she looks toward the trees
[V1-S2] @test_orren kneels beside a small device on a workbench, warm light ||| Camera pushes in as he adjusts the device and glances at @test_luma
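
A batch like the one above can be read with a few lines of code. The sketch below is not Auto Flow's importer: parse_batch and undefined_handles are hypothetical names, and it assumes each character row is the handle, a space, the description, then optional ||| key: value fields.

```python
import re
from typing import Dict, List, Tuple

HANDLE_RE = re.compile(r"@[A-Za-z0-9_]+")


def parse_batch(text: str) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Split a pasted batch into character definitions and video rows."""
    section = None
    characters: Dict[str, Dict[str, str]] = {}
    video_rows: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            section = line.lstrip("#").strip().upper()
            continue
        if section == "CHARACTER PROMPTS":
            head, *fields = [part.strip() for part in line.split("|||")]
            handle, _, description = head.partition(" ")
            entry = {"description": description.strip()}
            for field in fields:  # optional fields such as voice: and info:
                key, _, value = field.partition(":")
                entry[key.strip()] = value.strip()
            characters[handle] = entry
        elif section == "VIDEO PROMPTS":
            video_rows.append(line)
    return characters, video_rows


def undefined_handles(characters: Dict[str, Dict[str, str]], video_rows: List[str]) -> List[str]:
    """Handles used in video rows that were never defined as characters."""
    used = {h for row in video_rows for h in HANDLE_RE.findall(row)}
    return sorted(used - set(characters))


batch = """# CHARACTER PROMPTS
@test_luma Fictional young explorer, teal jacket. ||| voice: none ||| info: curious, calm

# VIDEO PROMPTS
[V1-S1] @test_luma at the edge of a glowing forest path ||| Slow dolly-in toward @test_orren
"""
characters, rows = parse_batch(batch)
print(characters["@test_luma"]["info"])    # curious, calm
print(undefined_handles(characters, rows)) # ['@test_orren']
```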

Write a batch with an LLM

Paste this into your LLM to generate a valid batch:

You are writing an Auto Flow batch. Use only # CHARACTER PROMPTS and # VIDEO PROMPTS as section headers. Define every @handle before using it. Use fictional, non-famous characters and lowercase snake_case handles. Use exactly one ||| per video row: left side is the image or still-frame request, right side is one final video request. For multiple images into one video, separate the still prompts on the left side with ;;, then tell the video side to use those generated images together for one final video. Preserve AFID in generation prompt metadata for tracking. Do not render AFID, labels, captions, or text in the image/video.
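
LLM output does not always follow the prompt, so a quick lint before pasting can help. The sketch below checks three of the rules stated in the prompt; lint_video_row is a hypothetical helper, and the exact snake_case pattern is an assumption.

```python
import re
from typing import List

ANY_HANDLE_RE = re.compile(r"@[\w-]+")                     # anything that looks like a handle
SNAKE_HANDLE_RE = re.compile(r"@[a-z0-9]+(?:_[a-z0-9]+)*")  # lowercase snake_case
LEADING_TAG_RE = re.compile(r"^\s*\[[^\]]*\]")


def lint_video_row(row: str) -> List[str]:
    """Return a list of problems found in one video row; empty means it passed."""
    problems: List[str] = []
    if row.count("|||") != 1:
        problems.append("expected exactly one |||")
    left = LEADING_TAG_RE.sub("", row.partition("|||")[0])
    if any(not still.strip() for still in left.split(";;")):
        problems.append("empty still prompt on the left side")
    for handle in ANY_HANDLE_RE.findall(row):
        if not SNAKE_HANDLE_RE.fullmatch(handle):
            problems.append(f"handle {handle} is not lowercase snake_case")
    return problems


print(lint_video_row("[V1-S1] @test_luma on a forest path ||| Slow dolly-in"))
# []
print(lint_video_row("[V1-S2] @Test_Orren at a workbench ;; ||| push in ||| pull out"))
# three problems: separator count, empty still, handle casing
```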

Reference

Deeper details for matching, packet import, and language support. Open only what you need.

Matching Modes

How an @handle resolves depends on the mode:

Mode: Flow Native @ Match
Use for: Existing or created Flow characters.
What it means: Uses Google's native character chip / entity binding path.

Mode: Auto Flow Classic Match
Use for: Image refs for Create Image, Frame to Video, Ingredients.
What it means: Auto Flow's original matcher. Maps uploaded refs by handle, filename, or prompt row. It is not Omni-only.

Auto Flow Packet Import

Use Paste Auto Flow Packet to import a full batch at once. Auto Flow previews the packet, then applies it into the existing Character Prompts and Video Prompts sections.

Packet import does not mark a character native-ready. Source images and project grid assets count as character_asset_created_but_not_native_character — ready for Auto Flow Classic Match, not native submission. After applying, use Create / Verify Native Characters to create or confirm native Flow Character rows before relying on native binding.

Language-Agnostic Rules

Descriptions can be written in any language. Auto Flow only depends on the structural markers — keep these exact:

||| ;; @handle voice: info: AFID # CHARACTER PROMPTS # VIDEO PROMPTS

Keep @handle in ASCII with no spaces. Everything after the markers can be natural prompt text in any language.