SLATE · AI Video Production Studio

01 · WHY MIGRATE

V2 follows a script. It can't reason.

StudioF V2 was a procedural pipeline — a fixed sequence of function calls with hardcoded providers and no reasoning. Every new content type, provider, or edge case meant editing the orchestration code itself. The cost of iteration was the cost of the whole production.

12+ min

render time

for 90s of video

0

parallelism

sequential pipeline

·

retries

restart from scene 1

17K

lines of code

orchestrator.py · single file

02 · THE SYSTEM

Twenty-six agents, seven guilds, one Showrunner.

A single Showrunner — gemini-2.5-pro — reasons over the brief and dynamically dispatches 26+ specialized ADK agents across seven guilds via the AgentTool pattern, reading and writing a shared session-state contract. No fixed script: the Showrunner chooses research depth, visual sources, voices, and providers per production.

EDITORIAL · 5 ✓

Editorial

showrunner
deep_researcher
visual_scout
script_writer
dataviz_agent

AUDIO · 3 ✓

Audio

tts_agent
elevenlabs_agent
music_agent

IMAGE · 6 ✓

Image

imagen_agent
prompt_engineer_agent
diagram_agent
stock_agent
web_gallery_agent
kinetic_type_agent

VIDEO · 3 ✓

Video

veo_agent
video_composer_agent
youtube_clip_agent

3D · 1 ✓

3D

infographic_agent
blender_scene_agent
geo_agent

POST-PROD · 2 ✓

Post-Prod

assembly_agent
colorist_agent
captions_agent

PUBLISH · 6 ✓

Publish

youtube_agent
instagram_agent
tiktok_agent
twitter_agent
linkedin_agent
facebook_agent

"A slate is a contract: who, what, when, take, scene.
Our agents are slates."

03 · ARCHITECTURE

TTS first. Then parallel.

Audio drives durations, not the other way around. Narration is synthesized first — every visual agent then targets the exact scene duration, instead of the brittle "generate video, then trim audio" approach. Ten parallel visual-generation branches fan out once durations lock, reconciled by a quality gate before assembly.

PORTE 1 · MANIFEST

PORTE 2 · FREE-TEXT BRIEF

↓

showrunner

↓

deep_researcher

visual_scout

script_writer

dataviz_agent

↓

tts_agent

→

🔒 EXACT DURATIONS LOCKED

∥ ∥ ∥

AUDIO POST

elevenlabs_agent · music_agent — sfx + score, parallel

VISUAL FAN-OUT

10 branches — Veo, Imagen, diagrams, dataviz, kinetic type, 3D, geo, stock, galleries, YouTube b-roll

STYLE

colorist_agent (LUT) + quality_gate

↓

assembly_agent · FFmpeg (Cloud Run) → final.mp4 → GCS

04 · FINDINGS & LEARNINGS

From one script to 26 reasoning agents.

Five hard-won lessons from rebuilding StudioF V2 — a 17,000-line procedural pipeline — as a 26-agent reasoning system on Vertex AI Agent Engine.

26+

ADK agents

across 7 guilds

10

parallel branches

visual generation, fan-out + back

4

Vertex AI regions

quota distribution + retry/backoff

0

custom cost code

OTel → Cloud Trace → Billing

01

AgentTool scales further than expected.

A single gemini-2.5-pro Showrunner reliably dispatches 26+ heterogeneous agents — different models, tool types, MCP servers, even long-running async Veo jobs — as long as the shared session-state contract between them is explicit and documented.

02

TTS-first sync model.

Generating narration audio before the visual fan-out lets every visual agent target the exact scene duration — instead of the brittle "generate video, then trim audio" approach most pipelines use.

03

Session state is per-tool-call, not incremental.

VertexAiSessionService.append_event() only persists tool_context.state deltas when a tool function returns. A 10-15 minute, 8-phase orchestration tool that's interrupted mid-flight loses state from already-completed phases — even though our Firestore event log shows them done. A checkpoint/resume mechanism is on the roadmap.

04

Parallelism needs quota planning.

Running 10 parallel visual-generation branches against Vertex AI required distributing calls across 4 regions with consistent retry/backoff — without it, 429 ResourceExhausted errors dominated.

05 · HOW IT WORKS tap any step to expand

From a single prompt to a finished cut.

01 Submit a manifest, or a partial request.

Two doors. Porte 1 takes a complete ManifestConfig — topic, language, ratio, platform, duration, tone, research mode. Porte 2 takes an input video and a list of tasks: subtitles, color grade, translate, geo overlay.

{
  "topic": "La chute de Boeing",
  "language": "fr-FR",
  "aspect_ratio": "16:9",
  "platform": "YouTube",
  "target_duration_sec": 90,
  "tone": "Journalistique"
}

02 The Showrunner reasons, then dispatches across 7 guilds.

A single gemini-2.5-pro Showrunner reasons over the brief — research depth, visual sources, voice, providers — then dispatches AgentTool calls to 26+ specialized agents across 7 guilds, each reading its slice of one shared session-state contract.

showrunner →
├─ Editorial → research + script (5 agents)
├─ Audio → TTS + SFX + music (3 agents)
├─ Image/Video → 9 visual agents
├─ 3D → infographics
├─ Post-Prod → color + assembly (2 agents)
└─ Publish → 6 social platforms

03 TTS first. Exact scene durations lock before visuals.

The narration is synthesized first, before any visual is generated. Per-scene durations are extracted from the audio itself. Now every downstream agent knows the exact duration of its plate. No drift.

This is the sync lock that unblocks parallelism. Without it, ten branches couldn't run in parallel without timing collisions.

04 Ten branches fan out. A quality gate reconciles them.

Once durations lock, ten visual-generation branches run in parallel — Veo, Imagen, diagrams, dataviz, kinetic type, 3D infographics, geo-maps, stock, web galleries, YouTube b-roll — each scene routed to the provider that fits it best.

A quality gate checks asset coverage before assembly_agent runs a multi-threaded FFmpeg render on Cloud Run — color graded, assembled, and uploaded to Cloud Storage as a single MP4.

06 · ACCOMPLISHMENTS click a card to expand

From zero to a live agent platform.

01 · AGENTS & GUILDS

Zero to twenty-six, in weeks.

26+

Twenty-six production ADK agents across seven guilds — including MCP-based ElevenLabs and Splice integrations for audio — fully orchestrated by one Showrunner and deployed on Vertex AI Agent Engine.

guilds

7

agents

26+

mcp integrations

2

build time

~3 weeks

02 · A2A ENDPOINT

A live agent, not a demo.

LIVE

A public Cloud Run endpoint exposes Slate as a JSON-RPC 2.0 agent with a published agent card — reachable and composable by other agentic systems via tasks/send and tasks/sendSubscribe.

protocol

JSON-RPC 2.0

methods

2

agent card

✓

host

Cloud Run

03 · VISUAL FAN-OUT

Ten branches out, one cut back.

10×

Veo 3, Imagen 3, hand-drawn diagrams, generative dataviz, kinetic type, 3D infographics, geo-maps, stock footage, web galleries and YouTube b-roll all run in parallel — reconciled by a quality gate before a single color-graded MP4 is assembled.

branches

10

models

Veo3 / Imagen3 / Lyria

quality gate

✓

output

1 MP4

04 · OBSERVABILITY

Full tracing, zero extra code.

0 LOC

Every agent call, token count and pipeline phase emits OTel GenAI semantic-convention spans into Cloud Trace, plus a real-time Firestore event feed for live progress — with zero custom cost-tracking code.

spans

OTel GenAI

trace

Cloud Trace

events

Firestore

cost code

0

07 · WHERE WE STAND switch tab to compare

Picked a side? Pick all of them.

Capability	Runway	SLATE
End-to-end journalistic production	✗ frame-by-frame tool	✓ full pipeline
Grounded research + fact check	—	✓ deep_researcher (Google Search grounding)
Multi-agent orchestration	—	✓ 26+ agents · 7 guilds
Self-hosted on your GCP	SaaS only	✓ Agent Engine deploy
Per-token cost visibility	opaque billing	✓ OTel + Cloud Billing

Capability	Captions	SLATE
Vertical reels from text	✓ great	✓ Porte 1 · vertical preset
Long-form 90s+ journalistic	✗ short-form only	✓ up to 180s+
Sourced b-roll, archive footage	stock templates	✓ NewsClipper + Archive
Strangler-fig migration path	SaaS lock-in	✓ wraps V2 monolith
Open architecture · MCP	—	✓ MCP (ElevenLabs, Splice + 4 planned)

Capability	Descript	SLATE
Editor-first workflow	✓ excellent	producer-first
Autonomous production from prompt	—	✓ Porte 1
Parallel agent execution	linear timeline	✓ 3 pools · 15w
Custom voices + Lyria score	limited library	✓ Chirp 3 + Lyria
Pluggable models per agent	opaque	✓ manifest.models

Capability	In-house monolith	SLATE
Time to ship a new feature	weeks · cross-cutting	✓ days · new agent
Parallelism	0	✓ 10 parallel branches
Retries	restart from scene 1	✓ per-agent retry
Observability	artisanal cost tracker	✓ Cloud Trace OTel
Testability	end-to-end only	✓ adk eval per agent

08 · FAQ click to expand

Questions, before access?

How is this different from Runway or Veo standalone?

Runway and Veo are generation models. SLATE is an end-to-end production pipeline — research, fact-check, script, narrate, source b-roll, generate, grade, render, package. We use Veo (and Imagen, Lyria, Chirp) inside, but the orchestration, the typed contracts, and the parallelism are what take it from a single brief to a published video, unattended.

Can I self-host SLATE on my own GCP project?

Yes. SLATE deploys to your Vertex AI Agent Engine. You bring the GCP project, we bring the agents. Multi-region quota rotation is baked in (four regions by default). Two MCP integrations ship today (ElevenLabs, Splice); four more — FFmpeg render, Remotion overlay, geo-render, web capture — are on the roadmap to run on your Cloud Run.

What is the "Strangler Fig" migration about?

We're replacing a 17,000-line procedural pipeline (StudioF V2) — fixed function calls, hardcoded providers, no reasoning — with 26 agents that reason about the brief, without touching the original code. V2 keeps running in production. A shadow-mode comparator that runs both systems on the same reference manifests is on the roadmap before V2 is retired.

Is SLATE open source?

The contracts and the graph definitions are open (Apache 2.0). The Showrunner and the visual scout heuristics are source-available for early-access partners under a commercial license. We're building the project in public on github.com/uraaura0o/slate.

What languages do you support today?

fr-FR and en-US are production-grade. es-ES, ar-MA, de-DE, pt-BR are in beta — voice quality matches, but our editorial style guide for those locales is still being written. Submit a request and we'll prioritise your locale.

How are costs tracked?

Every LLM call emits OTel spans with gen_ai.usage.input_tokens and gen_ai.usage.output_tokens attributes. They roll up into Cloud Trace and Cloud Billing natively. No custom cost tracker — you get the same dashboards Google ships.

What's next for Slate?

Four directions: Porte 2 à la carte — narrow requests ("redo the music for scene 4", "give me 3 alternate thumbnails") routed to only the relevant agents, no full production required. A Trajectory Optimizer agent that solves provider/agent assignment as a constrained optimization problem instead of per-scene LLM heuristics. Promoting our FFmpeg, Remotion, geo-render and web-capture services to first-class internal MCP servers. And filling the remaining guild agents (voice cloning, dubbing, captions, upscaling) plus a checkpoint/resume mechanism for long-running productions.

09 · PRICING private beta · early access

Pay for what rolls.

Three tiers. All include the full agent stack, multi-region quota rotation, Cloud Trace, and Model Armor. Differences are seat count, support, and per-production allowance.

Starter

$0 · hackathon

For solo creators and contributors. Hosted SLATE on shared infra, fair-use quota.

5 productions / month
Porte 1 + Porte 2 access
Community Discord
Public manifest library

START FREE

Studio

TBD · business model under study

For newsrooms and production studios shipping daily. Dedicated capacity on your GCP.

200 productions / month
Self-hosted on your GCP
Priority roadmap input
Email + Slack support · 24h
Custom LUTs + voices
Glossary + brand book editor

REQUEST ACCESS →

Enterprise

Custom

Broadcasters, agencies, regulated industries. Air-gapped deploy, on-call, custom guilds.

Unlimited productions
Air-gapped + SSO · SAML
Dedicated solutions architect
SOC2 + GDPR · DPA on file
Custom agent guilds
SLA · 99.95% uptime

TALK TO SALES

05 · BUILT ON

Standing on good shoulders.

SLATE is opinionated about almost nothing infrastructure. We use Google's stack end-to-end and Anthropic's MCP as the tool protocol.

Google ADK

GEAP 2026

Vertex AI

Gemini 2.5

Imagen 3.1

Veo 3

Chirp 3 HD

Lyria

MCP

Cloud Run

Cloud Trace · OTel

Twenty-six agents.
Seven guilds.
One slate.

V2 follows a script. It can't reason.

Twenty-six agents, seven guilds, one Showrunner.

Editorial

Audio

Image

Video

3D

Post-Prod

Publish

TTS first. Then parallel.

From one script to 26 reasoning agents.

AgentTool scales further than expected.

TTS-first sync model.

Session state is per-tool-call, not incremental.

Parallelism needs quota planning.

From a single prompt to a finished cut.

From zero to a live agent platform.

Picked a side? Pick all of them.

Questions, before access?

Pay for what rolls.

Standing on good shoulders.

One slate.

Twenty-six agents. Seven guilds. One slate.

V2 follows a script. It can't reason.

Twenty-six agents, seven guilds, one Showrunner.

Editorial

Audio

Image

Video

3D

Post-Prod

Publish

TTS first. Then parallel.

From one script to 26 reasoning agents.

AgentTool scales further than expected.

TTS-first sync model.

Session state is per-tool-call, not incremental.

Parallelism needs quota planning.

From a single prompt to a finished cut.

From zero to a live agent platform.

Picked a side? Pick all of them.

Questions, before access?

Pay for what rolls.

Standing on good shoulders.

One slate.

Twenty-six agents.
Seven guilds.
One slate.