SLATE turns a single brief — a JSON manifest or a free-text prompt — into a finished, published video. One Showrunner (Gemini 2.5 Pro) reasons about the brief and dynamically dispatches 26+ specialized ADK agents across 7 guilds — research, script, voice, music, visuals, color and publish — unattended.
StudioF V2 was a procedural pipeline — a fixed sequence of function calls with hardcoded providers and no reasoning. Every new content type, provider, or edge case meant editing the orchestration code itself. The cost of iteration was the cost of the whole production.
A single Showrunner — gemini-2.5-pro — reasons over the brief and dynamically dispatches 26+ specialized ADK agents across seven guilds via the AgentTool pattern, reading and writing a shared session-state contract. No fixed script: the Showrunner chooses research depth, visual sources, voices, and providers per production.
Audio drives durations, not the other way around. Narration is synthesized first — every visual agent then targets the exact scene duration, instead of the brittle "generate video, then trim audio" approach. Ten parallel visual-generation branches fan out once durations lock, reconciled by a quality gate before assembly.
Five hard-won lessons from rebuilding StudioF V2 — a 17,000-line procedural pipeline — as a 26-agent reasoning system on Vertex AI Agent Engine.
A single gemini-2.5-pro Showrunner reliably dispatches 26+ heterogeneous agents — different models, tool types, MCP servers, even long-running async Veo jobs — as long as the shared session-state contract between them is explicit and documented.
Generating narration audio before the visual fan-out lets every visual agent target the exact scene duration — instead of the brittle "generate video, then trim audio" approach most pipelines use.
VertexAiSessionService.append_event() only persists tool_context.state deltas when a tool function returns. A 10-15 minute, 8-phase orchestration tool that's interrupted mid-flight loses state from already-completed phases — even though our Firestore event log shows them done. A checkpoint/resume mechanism is on the roadmap.
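One possible shape for that checkpoint/resume mechanism, as a minimal sketch. It assumes each phase is a plain callable returning a state delta, and that a durable store (an in-memory dict here, standing in for something like Firestore) records each delta as soon as the phase finishes. `run_phases`, `store`, and the phase names are hypothetical; this is not an ADK API or SLATE's actual code.

```python
from typing import Callable, Dict, List, Tuple

# A phase is a (name, function) pair; the function returns a state delta.
Phase = Tuple[str, Callable[[dict], dict]]


def run_phases(phases: List[Phase], store: Dict[str, dict], state: dict) -> dict:
    """Run phases in order, skipping any already recorded in `store`.

    `store` stands in for a durable checkpoint store (hypothetical). Each
    completed phase writes its delta there immediately, instead of waiting
    for the enclosing tool function to return.
    """
    for name, fn in phases:
        if name in store:
            # Phase finished in an earlier attempt: replay its saved delta.
            state.update(store[name])
            continue
        delta = fn(state)
        store[name] = delta  # checkpoint before moving on
        state.update(delta)
    return state
```

With this shape, an interrupted run can be restarted with the same `store`, and only the phases that never checkpointed are executed again.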
Running 10 parallel visual-generation branches against Vertex AI required distributing calls across 4 regions with consistent retry/backoff — without it, 429 ResourceExhausted errors dominated.
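A rough sketch of region rotation with retry and backoff, under stated assumptions: `call_with_rotation` and `QuotaExhausted` are illustrative names, the latter standing in for whatever exception the client raises on a 429. The backoff constants are placeholders, not SLATE's tuned values.

```python
import itertools
import random
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class QuotaExhausted(Exception):
    """Stand-in for a 429 ResourceExhausted error (hypothetical type)."""


def call_with_rotation(
    call: Callable[[str], T],
    regions: Sequence[str],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `call(region)` round-robin across regions, backing off on quota errors."""
    cycle = itertools.cycle(regions)
    for attempt in range(max_attempts):
        region = next(cycle)
        try:
            return call(region)
        except QuotaExhausted:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the quota error
            # Exponential backoff with full jitter, capped at 30 seconds.
            delay = min(30.0, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

Each retry moves to the next region before sleeping, so a single exhausted region does not stall the branch. The `sleep` parameter is injectable so the policy can be tested without waiting.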
Two doors ("portes") into the system. Porte 1 takes a complete ManifestConfig: topic, language, aspect ratio, platform, duration, tone, research mode. Porte 2 takes an input video and a list of tasks: subtitles, color grade, translation, geo overlay.
{
  "topic": "La chute de Boeing",
  "language": "fr-FR",
  "aspect_ratio": "16:9",
  "platform": "YouTube",
  "target_duration_sec": 90,
  "tone": "Journalistique"
}
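A brief like this one could be validated before the Showrunner ever sees it. The sketch below is illustrative only: the field names come from the sample manifest, while `parse_manifest`, the `ALLOWED_RATIOS` set, and the validation rules are assumptions. The research-mode field is left out because the sample does not show its key.

```python
import json
from dataclasses import dataclass

# Assumed set of supported ratios; not taken from SLATE's real config.
ALLOWED_RATIOS = {"16:9", "9:16", "1:1"}


@dataclass(frozen=True)
class ManifestConfig:
    topic: str
    language: str
    aspect_ratio: str
    platform: str
    target_duration_sec: int
    tone: str


def parse_manifest(raw: str) -> ManifestConfig:
    """Parse a JSON brief and reject obviously invalid values early."""
    # Unknown or missing keys raise TypeError from the dataclass constructor.
    cfg = ManifestConfig(**json.loads(raw))
    if cfg.aspect_ratio not in ALLOWED_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {cfg.aspect_ratio}")
    if cfg.target_duration_sec <= 0:
        raise ValueError("target_duration_sec must be positive")
    return cfg
```

Failing fast here is cheap compared with discovering a bad ratio after narration and visuals have been generated.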
A single gemini-2.5-pro Showrunner reasons over the brief — research depth, visual sources, voice, providers — then dispatches AgentTool calls to 26+ specialized agents across 7 guilds, each reading its slice of one shared session-state contract.
showrunner →
├─ Editorial → research + script (5 agents)
├─ Audio → TTS + SFX + music (3 agents)
├─ Image/Video → 9 visual agents
├─ 3D → infographics
├─ Post-Prod → color + assembly (2 agents)
└─ Publish → 6 social platforms
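One way to keep the shared session-state contract explicit is to have every agent declare which state keys it reads and writes, then check a planned dispatch order against those declarations. This is a plain-Python sketch, not ADK code: `AgentContract`, `check_dispatch_order`, and the state key names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class AgentContract:
    name: str
    reads: FrozenSet[str]   # state keys the agent expects to find
    writes: FrozenSet[str]  # state keys the agent promises to produce


def check_dispatch_order(
    agents: Iterable[AgentContract], initial: Iterable[str]
) -> List[str]:
    """Return problems: state keys read before any earlier agent wrote them."""
    available = set(initial)
    problems = []
    for agent in agents:
        for key in sorted(agent.reads - available):
            problems.append(f"{agent.name} reads '{key}' before it is written")
        available |= agent.writes
    return problems
```

An empty result means every agent in the sequence finds its slice of state already populated. A check like this could run in tests whenever an agent is added to a guild.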
The narration is synthesized first, before any visual is generated. Per-scene durations are extracted from the audio itself. Now every downstream agent knows the exact duration of its plate. No drift.
This is the sync lock that unblocks parallelism: with every scene duration fixed up front, ten branches can run in parallel without timing collisions.
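Extracting durations from the audio itself can be as simple as reading each clip's header. The sketch below assumes one PCM WAV clip per scene, which may not match how SLATE stores narration; `wav_duration_sec` and `lock_scene_durations` are illustrative names.

```python
import io
import wave
from typing import Dict


def wav_duration_sec(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV clip, computed as frame count / sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def lock_scene_durations(narration: Dict[str, bytes]) -> Dict[str, float]:
    """Map each scene id to the duration its visual must match."""
    return {scene_id: wav_duration_sec(data) for scene_id, data in narration.items()}
```

The resulting mapping is the only timing input a visual agent needs, which is what lets the branches run independently.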
Once durations lock, ten visual-generation branches run in parallel — Veo, Imagen, diagrams, dataviz, kinetic type, 3D infographics, geo-maps, stock, web galleries, YouTube b-roll — each scene routed to the provider that fits it best.
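The fan-out can be pictured with plain `asyncio`. This sketch assumes each provider branch is an async callable taking a scene id and its locked duration, and that a `route` function picks the branch per scene. All names are hypothetical, and the semaphore cap of ten is an assumption about how the parallelism is bounded.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# A branch generates one asset for (scene_id, duration) and returns its URI.
Branch = Callable[[str, float], Awaitable[str]]


async def fan_out(
    scenes: Dict[str, float],
    route: Callable[[str], Branch],
    max_parallel: int = 10,
) -> Dict[str, str]:
    """Generate one asset per scene, running at most `max_parallel` at a time."""
    gate = asyncio.Semaphore(max_parallel)

    async def one(scene_id: str, duration: float) -> str:
        async with gate:
            return await route(scene_id)(scene_id, duration)

    ids: List[str] = list(scenes)
    # gather preserves input order, so results line up with `ids`.
    assets = await asyncio.gather(*(one(i, scenes[i]) for i in ids))
    return dict(zip(ids, assets))
```

Because every branch receives its duration as an argument, no branch needs to wait on or inspect another branch's output.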
A quality gate checks asset coverage before assembly_agent runs a multi-threaded FFmpeg render on Cloud Run — color graded, assembled, and uploaded to Cloud Storage as a single MP4.
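A coverage check of this kind might look like the sketch below. The source only says the gate checks asset coverage; the duration comparison, the tolerance value, and the name `coverage_gaps` are assumptions added for illustration.

```python
from typing import Dict, List


def coverage_gaps(
    scene_durations: Dict[str, float],
    asset_durations: Dict[str, float],
    tolerance_sec: float = 0.05,
) -> List[str]:
    """List scenes with no asset, or an asset whose length misses the target."""
    gaps = []
    for scene_id, target in scene_durations.items():
        actual = asset_durations.get(scene_id)
        if actual is None:
            gaps.append(f"{scene_id}: missing asset")
        elif abs(actual - target) > tolerance_sec:
            gaps.append(f"{scene_id}: {actual:.2f}s asset for {target:.2f}s scene")
    return gaps
```

An empty list would let assembly proceed; anything else names the scenes that need regeneration before the render starts.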
| Capability | Runway | SLATE |
|---|---|---|
| End-to-end journalistic production | ✗ frame-by-frame tool | ✓ full pipeline |
| Grounded research + fact check | — | ✓ deep_researcher (Google Search grounding) |
| Multi-agent orchestration | — | ✓ 26+ agents · 7 guilds |
| Self-hosted on your GCP | SaaS only | ✓ Agent Engine deploy |
| Per-token cost visibility | opaque billing | ✓ OTel + Cloud Billing |
| Capability | Captions | SLATE |
|---|---|---|
| Vertical reels from text | ✓ great | ✓ Porte 1 · vertical preset |
| Long-form 90s+ journalistic | ✗ short-form only | ✓ up to 180s+ |
| Sourced b-roll, archive footage | stock templates | ✓ NewsClipper + Archive |
| Strangler-fig migration path | SaaS lock-in | ✓ wraps V2 monolith |
| Open architecture · MCP | — | ✓ MCP (ElevenLabs, Splice + 4 planned) |
| Capability | Descript | SLATE |
|---|---|---|
| Editor-first workflow | ✓ excellent | producer-first |
| Autonomous production from prompt | — | ✓ Porte 1 |
| Parallel agent execution | linear timeline | ✓ 3 pools · 15 workers |
| Custom voices + Lyria score | limited library | ✓ Chirp 3 + Lyria |
| Pluggable models per agent | opaque | ✓ manifest.models |
| Capability | In-house monolith | SLATE |
|---|---|---|
| Time to ship a new feature | weeks · cross-cutting | ✓ days · new agent |
| Parallelism | 0 | ✓ 10 parallel branches |
| Retries | restart from scene 1 | ✓ per-agent retry |
| Observability | artisanal cost tracker | ✓ Cloud Trace OTel |
| Testability | end-to-end only | ✓ adk eval per agent |
Three tiers. All include the full agent stack, multi-region quota rotation, Cloud Trace, and Model Armor. Differences are seat count, support, and per-production allowance.
SLATE is opinionated about almost nothing except infrastructure. We use Google's stack end-to-end and Anthropic's MCP as the tool protocol.
We're inviting a small group of newsrooms + production studios to the private beta. If you make video at scale and want to migrate off a monolith, talk to us.