R&D Engineer • Dependabot for AI
01

Dependabot for AI Models

Every week a new model drops. Your team manually benchmarks it. What if that happened automatically — with PRs when something's better?

52+

New models released per month

2-4 wks

Time to benchmark each one

$0

Tools that auto-PR improvements

02

The Problem

  • A new model ships every week from OpenAI, Anthropic, Google, Meta, Mistral, and open-source labs
  • Teams manually benchmark against their stack — takes days per model
  • By the time you finish testing, three more models have dropped
  • No one knows if they're running the best model for their use case
03

Competitive Landscape

Company What They Do Gap
Portkey AI gateway, routing, 1600+ LLMs No auto-benchmarking against YOUR stack
Unify ($8M) Finds best LLM for the job Router-first, not benchmark-first
Braintrust ($36M, $150M val) Eval-driven development Reactive, not proactive
Us Watch → Auto-benchmark → PR when better —
04

How It Works

1. Connect

Connect your AI stack. Define your eval suite (or we help you build one).

2. Watch

We monitor every model release across all providers. Automatically.

3. PR

When something beats your current setup, you get a PR with benchmarks.

The Dependabot Pattern

Watch → Auto-benchmark → PR when better

Nobody does this for AI models. We do.
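The pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our implementation: `run_eval_suite` is a stand-in that looks scores up in a table, and every model name and score here is invented.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    score: float

def run_eval_suite(model, scores):
    # Stand-in for the customer-defined eval suite: just look the score up.
    return scores[model]

def check_release(new_model, current_model, scores, margin=0.01):
    """Benchmark a new release; return a PR candidate only if it clearly wins."""
    new_score = run_eval_suite(new_model, scores)
    current_score = run_eval_suite(current_model, scores)
    if new_score > current_score + margin:
        return Candidate(new_model, new_score)
    return None  # not better: stay quiet, no PR, no noise

scores = {"current-model": 0.81, "shiny-new-model": 0.86}
pr = check_release("shiny-new-model", "current-model", scores)
# pr.model == "shiny-new-model": open a PR carrying the benchmark delta
```

The `margin` threshold is the key design choice: it keeps marginal wins from generating PR noise, so a PR only appears when the improvement is unambiguous.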

05

ICP & Pricing

🎯 Target Customer

  • Any team with AI in production
  • 3+ AI features deployed
  • Series B+ (or well-funded Series A)
  • Engineering-led decision making

💰 Pricing

  • Starter $2K/mo — up to 5 endpoints
  • Growth $8K/mo — up to 20 endpoints
  • Enterprise $20K+/mo — unlimited
06

Why Now?

  • Model release velocity is accelerating — impossible to keep up manually
  • LMArena raised $150M at a $1.7B valuation — proof that model evaluation is a venture-scale market
  • Braintrust proved enterprises pay for evals ($36M Series A)
  • Nobody has combined continuous monitoring + proactive optimization
R&D Engineer • CI/CD for AI
01

CI/CD for AI

Software engineering solved "did my change break things?" 20 years ago. AI engineering still ships blind.

🔴 AI Today

Push prompt change → Hope it works → Find out in production

🟢 With Us

Push prompt change → Eval runs → PR blocked if quality drops

02

The Insight

The gap isn't that people don't have evals — Braintrust, Humanloop, and DSPy already provide them.

The Real Gap

Evals aren't integrated as blocking gates in deployment pipelines the way unit tests are.

03

What We Build

GitHub Action + CI Integration

  • Automatically runs your eval suite against every PR that touches AI code
  • Prompts, model configs, RAG pipelines โ€” all covered
  • If the eval score drops → the PR is blocked
  • If a new model improves the score → a PR is auto-generated

Think: Braintrust's eval engine + Dependabot's automation + GitHub Actions' CI/CD — fused into one opinionated product.
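A rough sketch of the blocking decision such a CI step would make, assuming an earlier step has already produced eval scores for the base branch and the PR branch. The `gate` function, its tolerance parameter, and the numbers are illustrative:

```python
# Sketch of the eval gate: compare the PR branch's eval score against
# the base branch and return the pass/fail verdict a CI status check
# would report. All values here are illustrative.

def gate(base_score, pr_score, tolerance=0.0):
    """Return (allowed, reason), the way a required status check would."""
    if pr_score + tolerance < base_score:
        return False, f"eval regression: {base_score:.3f} -> {pr_score:.3f}"
    return True, "eval score held or improved"

allowed, reason = gate(base_score=0.84, pr_score=0.79)
# allowed is False: the PR is blocked with the regression message
```

In practice this verdict would be wired up as a required status check, so merging is mechanically impossible while `allowed` is false, which is exactly how unit-test gates work today.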

04

Aemon vs Us

Dimension Aemon Us
Purpose Discover new optimal solutions Protect existing quality + incrementally improve
Posture Offensive R&D Defensive Ops
Buyer R&D Lead / ML Researcher Engineering Manager / Platform Team
Integration Standalone tool Lives in your CI/CD
05

ICP & Pricing

🎯 Target Customer

  • 3+ AI features in production
  • Series B+ companies
  • Engineering-led sale
  • Already using GitHub/GitLab CI

💰 Pricing

$2K – $20K/mo

Based on eval runs & endpoints

R&D Engineer • Private LMArena
01

Private LMArena

LMArena raised $150M at $1.7B valuation on public evals. Enterprises need private evals on their own data.

$1.7B

LMArena valuation (public evals)

???

Private enterprise eval market

02

The Problem with Public Benchmarks

  • Companies have been caught gaming LMArena scores
  • Public benchmarks don't reflect YOUR use cases
  • Generic evals ≠ production performance for YOUR data
  • Enterprises need proprietary intelligence
03

What We Build

Enterprise Model Intelligence Platform

  • Define eval suites from your production data
  • Continuously benchmark every new model release
  • Test every prompt variation, RAG config automatically
  • Output: Private leaderboard + recommended actions

Hugging Face's YourBench is the open-source precursor — but it's a DIY tool requiring significant ML expertise. We productize it.

04

Aemon vs Us

Aemon Private LMArena
Evolves novel algorithms Evaluates existing models/configs
Research Intelligence
05

ICP & Pricing

🎯 Target Customer

  • 10+ AI features in production
  • $50K+/mo on AI infrastructure
  • VP of Engineering or Head of AI
  • Fintech, ad-tech, e-commerce, healthtech

💰 Pricing

$10K – $100K/mo

Enterprise contracts

R&D Engineer • AI Model FinOps
01

AI Model FinOps

Companies spend $85K+/mo on AI infrastructure. Nobody knows if they're overpaying for quality they don't need.

$85K

Avg monthly AI spend

36%

YoY growth

0

Visibility into cost-quality tradeoff

02

The Gap

Tool What It Does Missing
Portkey Routing, fallbacks No cost-quality optimization
Unify Cheapest model that meets threshold Not continuous, not production data
Us Continuously optimize the cost-quality frontier across the entire AI stack —
03

What We Build

FinOps + Quality Optimization Layer

An agent that sits on top of your AI gateway:

  • Continuously profiles every AI call (model, cost, latency, quality)
  • Uses your production data as the eval
  • Generates actionable recommendations:
"Switch endpoint X from GPT-4o to Claude 3.5 Sonnet — saves $8K/mo, quality improves 2%"

"Your RAG pipeline on endpoint Y is underperforming — here's an optimized config"
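The recommendation step can be sketched as a simple comparison over per-endpoint profiles. Everything below is an assumption for illustration: the `recommend` helper, its field names, the quality tolerance, and the dollar figures are invented, not product behavior.

```python
# Sketch of the cost-quality recommendation: suggest a model switch only
# when quality holds (within a small tolerance) and cost drops.

def recommend(endpoint, current, challenger, min_quality_delta=-0.005):
    """Return a switch recommendation string, or None if the challenger loses."""
    quality_delta = challenger["quality"] - current["quality"]
    monthly_savings = current["cost_per_mo"] - challenger["cost_per_mo"]
    if quality_delta >= min_quality_delta and monthly_savings > 0:
        return (f"Switch {endpoint} from {current['model']} to "
                f"{challenger['model']} - saves ${monthly_savings:,.0f}/mo, "
                f"quality {quality_delta:+.1%}")
    return None  # challenger is worse on quality or cost: no recommendation

rec = recommend(
    "endpoint-x",
    {"model": "model-a", "quality": 0.80, "cost_per_mo": 20_000},
    {"model": "model-b", "quality": 0.82, "cost_per_mo": 12_000},
)
# rec describes an $8,000/mo saving with +2.0% quality
```

The `min_quality_delta` tolerance encodes the policy question at the heart of FinOps for AI: how much quality, if any, a customer is willing to trade for cost.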
04

ICP & Pricing

🎯 Target Customer

  • $20K+/mo on LLM APIs
  • CFO / VP Eng sale
  • Any industry with AI in production

💰 Pricing

$2K – $15K/mo

Pays for itself from savings

⚡ Easiest ROI story of all these ideas

R&D Engineer • Eval-as-a-Service
01

Eval-as-a-Service

Building good evals is harder than building the AI features themselves. We build the oracle.

02

The Insight

The Bottleneck Isn't Optimization

Braintrust's thesis: "If your eval is right, every decision becomes simple."

DSPy's framework depends on having good metrics to optimize against.

The bottleneck in the entire AI development loop is knowing what "good" looks like.

03

What We Build

Eval Generation Agent

  • Takes your production AI traces
  • Analyzes failure modes
  • Interviews domain experts (async, Slack-based)
  • Generates calibrated eval suites:

✓ Datasets

✓ Scoring rubrics

✓ Automated judges

Output plugs into Braintrust, DSPy, or your own CI/CD.
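As a toy illustration of what a generated suite might bundle together, here is one possible shape: a dataset, a rubric, and an automated judge in a single artifact. The schema, field names, and the keyword-based judge are assumptions for the sketch, not the product's actual format.

```python
# Illustrative shape of a generated eval suite: dataset + rubric +
# automated judge, bundled so downstream tools can consume it.

def keyword_judge(output, required_terms):
    """Toy automated judge: score = fraction of required terms present."""
    hits = sum(term.lower() in output.lower() for term in required_terms)
    return hits / len(required_terms)

eval_suite = {
    "name": "support-reply-quality",
    "dataset": [
        {"input": "Customer asks for a refund on a duplicate charge.",
         "required_terms": ["refund", "apologize"]},
    ],
    "rubric": "Reply must acknowledge the issue and state the next step.",
    "judge": keyword_judge,
}

case = eval_suite["dataset"][0]
score = eval_suite["judge"]("We apologize - your refund is on the way.",
                            case["required_terms"])
# score == 1.0 for this example reply
```

A real suite would swap the keyword judge for a calibrated LLM judge or rubric scorer, but the packaging idea is the same: one portable artifact that any optimizer or CI pipeline can call.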

04

Aemon vs Us

Aemon Eval-as-a-Service
Assumes you have a good eval function Creates the eval function
Optimizer Oracle
Depends on eval quality Is the prerequisite to everything else

If you own the eval layer, you become the foundation every optimization tool depends on.

05

ICP & Pricing

🎯 Target Customer

  • Same as Braintrust's customers
  • AI product teams at Series B+
  • Earlier in the journey — before they've figured out evals

💰 Pricing

$5K – $30K/mo

Per eval suite built + maintenance

01
LEAD PITCH: Fat Startup

AI-Powered Outcomes.
Not Tools. Not Reports.

We operate fleets of AI agents that deliver results. Customers get outcomes. We get playbooks. Playbooks become platform.

Fat Startup $4K MRR 5 Customers Playbooks Compounding

A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.

— Andrew Lee, a16z Speedrun Partner

02

The Shift

Spinning up AI agents is now trivial. Managing them is the new bottleneck.

What's Easy Now

🤖

One-click agent deployment

OpenClaw, dockerized instances, cloud GPUs

🔀

Capable models

GPT-5, Claude 4, open-source alternatives

💰

Economics work

$0.01-0.10 per task, not $50/hr

What's Still Hard

⚠️

People become pseudo-IT

Babysitting agents instead of running business

⚠️

Debugging eats time

Every hour on agent issues ≠ hour on actual work

⚠️

No one wants to manage agents

They want outcomes, not infrastructure

The Insight

Founders are too busy to become AI ops engineers. We absorb that complexity so they can focus on their actual business.

03

How We Got Here

We started in sales. Then customers kept asking for more.

📧
Started: Sales
SDR automation
→
🎬
Then: Video
ML training data gen
→
🔬
Then: Research
University lab assets
→
💡
Pattern
We manage, they get results

The Variety We've Delivered

🏗️

SDR for construction companies

Lead gen + qualification

🎬

Video generation for ML training

Synthetic data pipelines

🔬

Research assets for universities

Literature review + synthesis

🚀

BDR for startups

Outbound + meeting booking

The Common Thread

Every customer had the same problem:

"I tried spinning up agents myself. Then I spent all my time debugging them instead of running my business."

— Pattern across customers

They didn't want to manage AI. They wanted outcomes.

04

The Market Reality

95%
AI projects fail before production
MIT Project NANDA
70%
AI SDR users churn in 3 months
Industry data
$47K
Lost from one agent runaway
TechStartups
171%
ROI when deployment succeeds
MIT NANDA

Why Tools Aren't Enough

Companies don't want to become AI operations experts. They want someone to absorb the complexity and just deliver results.

05

The Model: Managed AI Operations

We operate agent fleets. Customers get outcomes. We encode playbooks.

🎯
Customer Goal
"50 qualified meetings/month"
→
⚡
Our Engineers
Configure agent fleet
→
🤖
Agent Fleet
Research, outreach, qualify
→
✅
Outcome
Meetings on calendar

DIY / SaaS Tools

🛠️

You manage the agents

Become pseudo-IT for AI

🐢

Weeks to figure out

Setup, config, debugging

❓

Hope it works

No guarantee of outcomes

OpenHolly (Us)

✅

We manage the agents

You focus on your business

⚡

Results in days

We've done this before (playbooks)

🎯

Outcomes guaranteed

Pay for results, not effort

06

Current Focus: GTM/Sales

Starting with sales because the outcome is measurable: meetings booked.

Why Sales First

📊

Clear success metric

Meetings booked = revenue

💔

Broken market

70% AI SDR churn = customers looking for alternatives

💰

High willingness to pay

$5-10K/month for what works

✅

We have traction

50% of our revenue is SDR/BDR

What We Deliver

🔍

Research Agent

Deep prospect intelligence

✍️

Outreach Agent

Personalized messaging

📋

Qualification Agent

Score and prioritize leads

📅

Scheduling Agent

Book the meeting

Expansion Path

Sales → Research/Intel → Operations → Content. Each vertical = new playbook, same infrastructure.

07

The Unlock: Playbooks Compound

Every engagement encodes a playbook. Playbooks make the next engagement faster. This is how we build the moat.

🛠️
Year 1: Agency
Do the work, learn playbooks
→
📚
Year 2: Productize
Playbooks become templates
→
🏗️
Year 3: Platform
Others build on our templates

What's In A Playbook

Every engagement becomes encoded knowledge:

📝

Workflow sequences

What steps work for each use case

🎯

Prompt templates

Messaging that actually converts

⚙️

Agent configurations

Which models, tools, and sequences

🚫

Failure patterns

What breaks and how to prevent it

The Compounding Effect

1๏ธโƒฃ

Customer 1: 2 weeks

Figure everything out from scratch

5๏ธโƒฃ

Customer 5: 3 days

Apply existing playbook + customize

๐Ÿ”Ÿ

Customer 10: Hours

Playbook is battle-tested

๐Ÿ—๏ธ

Eventually: Self-serve

Playbooks become product

The Fat Startup Advantage

We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.

08

Technical Insight

We're productizing the research consensus on what actually works.

The Research Convergence

📄

Workflow-First Architecture

Declarative orchestration beats autonomous agents (Microsoft, 2024-25 surveys)

👤

HITL as Training Signal

Human edits train intervention policies (ReHAC, EMNLP 2024)

🎯

Playbooks as Optimization Surface

Prompts + tool-use are parameters to optimize (AVATAR, NeurIPS 2024)

🛡️

Guardrails are Required

Transparency + oversight for multi-agent systems (Nature, 2026)

Our Implementation

✓

Declarative playbooks

Versioned configs, not imperative code

✓

Logged human checkpoints

Every edit = structured training signal

✓

Continuous optimization

Prompts, branching, model routing improve over time

✓

Action-layer guardrails

Can't be prompt-injected, auditable

We log trajectories, human edits, and outcomes, then update prompts, branching logic, and model routing so the same business objective is achieved more reliably over time. The playbook is the learned policy space.

— Our technical thesis
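A minimal sketch of that idea: a playbook as a versioned, declarative config, with logged human edits folded back in as new versions. The field names and prompt text are illustrative assumptions, not our actual schema.

```python
# Sketch of a playbook as a versioned, declarative policy. A logged
# human edit bumps the version and overwrites the relevant field, so
# the next run starts from the corrected configuration.

playbook = {
    "version": 1,
    "prompt": "Draft a short intro email for {prospect}.",
    "model_routing": {"research": "large-model", "outreach": "small-model"},
    "checkpoints": ["before_send"],  # where a human must approve
}

def apply_human_edit(pb, field, new_value):
    """Fold a logged human correction back into the playbook config."""
    updated = dict(pb)
    updated[field] = new_value
    updated["version"] = pb["version"] + 1  # every edit is a new version
    return updated

playbook = apply_human_edit(
    playbook, "prompt",
    "Draft a two-line intro email for {prospect}, no buzzwords.")
# playbook["version"] == 2; the edit is now part of the policy
```

Because every change is a version bump on declarative data rather than a code change, the edit history doubles as the training signal described above.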

09

The Compound Library

The internal system that makes agent workflows repeatable and efficient.

🔧
Verified Tools
Tested integrations
+
💬
Working Prompts
By use case + vertical
+
🧠
Model Routing
Which model where
+
🚫
Failure Patterns
What breaks + fixes
↓
📦
New Client Workflow
Compose from proven components

Without This System

🔄

Reinvent every time

Which tools? Which prompts? Which models?

🐢

Slow iteration

Learn the same lessons repeatedly

📉

Linear scaling

More clients = more eng hours

With The Compound Library

⚡

Compose from proven

Verified, tested, reusable primitives

📈

Each engagement adds

Learnings feed back into system

🚀

Sublinear scaling

More clients = richer library = faster

The Compounding Effect

Workflow #1 takes a week. Workflow #10 takes a day. Workflow #100 takes hours. The library IS the moat.

10

Why Us

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Unfair Advantages

๐Ÿ•

We're running on OpenClaw

Dog-fooding our own infrastructure daily

๐Ÿ“Š

We've built observability

ClawView for agent monitoring

๐Ÿ›ก๏ธ

We've built guardrails

Agent Seatbelt for safety

๐Ÿ’ต

Revenue already

$4K MRR, +$2K this week

11

Traction

$4K
MRR
5
Customers
50%
SDR/BDR
+$2K
Added This Week

What This Proves

Companies will pay for AI-powered outcomes when someone else manages the complexity. The demand is real. The model works.

12

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent fleet + engineering team

🎯

12-month goal: $1M ARR

Prove the playbooks at scale

📚

Then: Productize

Turn proven playbooks into self-serve templates

Why Now

🚀

OpenClaw + GPT-5 + Claude 4

Agents just became capable enough

💔

AI SDR market burned

70% churn = customers looking for what works

⏰

First-mover on playbooks

Every month we operate = more encoded knowledge

OpenHolly: AI-Powered Outcomes

Customers get results. We get playbooks. Playbooks become platform.

01
V1: Personal AI OS

Your Personal AI OS

An AI that knows your context, anticipates your needs, and takes action on your behalf — not a chatbot you have to prompt.

Pre-Seed $4K MRR 5 Customers Always-On AI

The Vision

Imagine an AI that actually knows you — your work, your preferences, your patterns. It doesn't wait for commands. It proactively handles tasks, flags important things, and learns from every interaction.

02

The $56B Opportunity

Personal AI assistants are about to explode.

$16B
AI Assistant Market 2024
Grand View Research
$56B
Projected by 2034
Market.us (38% CAGR)
75%
Households with AI by 2025
Gartner Forecast
72%
US Teens Use AI Companions
2025 Study

AI personal agents will arrive soon. What we do now with apps — manually, and in piecemeal fashion — will be done automatically. If a flight is cancelled, an AI agent will rebook the flight, reschedule meetings, and order food.

— Goldman Sachs, "What to Expect from AI in 2026"

03

Why Current Assistants Fail

Siri, Alexa, and Google Assistant lost the AI race. Here's why.

โŒ The Problem

๐Ÿง 

No Persistent Memory

Context resets after 2-3 turns. They forget everything.

โธ๏ธ

Reactive, Not Proactive

Wait for commands. Never anticipate needs.

๐Ÿ”’

Siloed Knowledge

Can't connect your email, calendar, work, and life.

๐Ÿค–

Limited Actions

"I can't do that" is their signature phrase.

โœ“ Personal AI OS

๐Ÿง 

128K+ Token Context

Remembers weeks of interactions. Learns your patterns.

โšก

Proactive Intelligence

Anticipates what you need before you ask.

๐Ÿ”—

Connected Context

Sees your whole digital lifeโ€”with your permission.

๐Ÿ› ๏ธ

Real Actions

Browser, shell, files, messagesโ€”actual work gets done.

Microsoft's CEO called AI assistants "dumb as a rock." The truth is, they've stagnated while chatbots evolved.

โ€” Industry Analysis, 2023-2024

04

The Hardware Graveyard

Why dedicated AI devices keep failing — and what we learned.

$699
Humane AI Pin
Flopped 2024 — WIRED "Biggest Flop"
$199
Rabbit R1
"Underwhelming, underpowered" — The Verge
$350M
Rewind/Limitless
Acquired by Meta Dec 2025
$2.7B
Character.AI
Google licensing deal 2024

The Lesson

Hardware failed because it created friction instead of removing it. The winning approach: software that works with your existing devices — phone, laptop, wearables — not another gadget to carry.

Both Rabbit R1 and Humane AI Pin missed a crucial opportunity: integrating with existing user bases. Why create a separate device when you could leverage smartphones and their vast ecosystem?

— Medium Analysis, July 2024

05

Proactive vs. Reactive

The fundamental shift in how AI should work for you.

โธ๏ธ
Reactive AI
You ask โ†’ It responds
โ†’
โšก
Proactive AI
It notices โ†’ It acts
โ†’
๐Ÿง 
Anticipatory AI
It predicts โ†’ You approve

Reactive (Siri/ChatGPT)

"Hey Siri, add milk to my shopping list"

"ChatGPT, summarize this document"

You initiate every interaction. You remember to ask.

Proactive (Personal AI OS)

"You're almost out of milk. Added to cart — confirm?"

"Your flight changed. I rebooked + rescheduled 2 meetings."

AI monitors context. Surfaces what matters. Acts with permission.

Gartner predicts 40% of enterprise apps will embed task-specific AI agents by 2026, evolving assistants into proactive workflow partners.

— Forbes, "Agentic AI Takes Over," Dec 2025

06

Why Now?

Four converging forces make this the moment.

Technology Ready

🧠

GPT-5 / Claude 4

Models finally capable of real reasoning

📏

128K+ Context Windows

Memory across weeks of interaction

🔧

MCP + Tool Use

Agents can control apps natively

💰

Economics Work

$0.01-0.10 per task, not $50/hr

Market Ready

📈

96% Enterprise Expansion

Plan to increase agentic AI budgets

PwC May 2025 Survey
🎯

25% → 50% Adoption

Enterprise GenAI agents 2025 → 2027

Deloitte Forecast
😤

Siri Fatigue

95% frustrated with current assistants

The Manifest Survey
🔐

Privacy Tailwinds

Apple Intelligence proves local AI demand

07

What Users Actually Want

From surveys, Reddit, and academic research.

Desires

🧠

Memory That Persists

"Remember what I told you last week"

⚡

Proactive Help

"Remind me before I forget"

🎯

Deep Personalization

"Know my preferences without asking"

🔐

Privacy Control

"My data stays mine"

Evidence

93% of respondents predict agentic AI will enable more personalized, proactive, and predictive services.

— Cisco 2025 AI Study

An assistant that knows you. The future of personal assistants is when the helper learns from your data, documents, and writing style.

— AI Industry Forecast 2026

08

How It Works

Always-on AI that learns, anticipates, and acts.

๐Ÿ‘๏ธ
Observes Context
Email, calendar, browsing, work
โ†’
๐Ÿง 
Learns Patterns
Preferences, routines, priorities
โ†’
๐Ÿ’ก
Surfaces Insights
Proactive suggestions
โ†’
โœ…
Takes Action
With human approval

Current Focus: SDR/BDR

🔍

Research Agent

Deep prospect intelligence

✍️

Outreach Agent

Personalized messaging

📅

Scheduling Agent

Meeting coordination

Platform Vision

📧

Email Intelligence

Triage, draft, follow-up

📊

Research & Analysis

Deep work on autopilot

🔧

Ops & Admin

The tasks you hate, automated

09

The Unique Wedge

What makes this different from Siri/Alexa/Google Assistant?

Big Tech Assistants

๐Ÿข

Built for mass market

Generic. Lowest common denominator.

๐Ÿ“Š

Data goes to them

Your context trains their models.

๐Ÿ”’

Walled garden

Only works in their ecosystem.

โธ๏ธ

Stagnant development

Lost the AI race years ago.

Personal AI OS

๐ŸŽฏ

Built for power users

Deep personalization for serious work.

๐Ÿ”

Your data stays yours

Local-first. You control what's shared.

๐Ÿ”“

Cross-platform

Works with your existing tools.

๐Ÿš€

Cutting-edge models

GPT-5, Claude 4, always the best.

The Positioning

We're not competing with Siri for "set a timer." We're building the second brain for knowledge workers — people who will pay for AI that actually makes them more effective.

10

Traction & Team

$4K
MRR
5
Customers
50%
SDR/BDR
+$2K
This Week

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

11

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent infrastructure + team

🎯

12-month goal: $1M ARR

Prove the Personal AI OS at scale

📚

Then: Consumer launch

Personal AI for everyone

Why This Team

🐕

We use it daily

Dogfooding OpenClaw constantly

📊

Built observability

ClawView for agent monitoring

🛡️

Built safety

Agent Seatbelt for guardrails

💵

Already have revenue

Proving demand before pitching

OpenHolly: Your Personal AI OS

An AI that knows you, anticipates your needs, and takes action — not just another chatbot waiting for prompts.

The Thesis in One Line

The shift from reactive AI to proactive AI is a $56B market. We're building the operating system for it.

01
V2: Outcome-Based Pricing

Pay Per Meeting,
Not Per Seat

The SaaS pricing model is breaking. AI does the work now — so why pay for human logins? We deliver outcomes and charge when they happen.

$4K MRR 0% Churn Outcome-Aligned 5 Customers

AI is driving a shift toward outcome-based pricing. Per-seat is no longer the atomic unit of software. If AI can handle a sizable proportion of customer support, companies will need far fewer human agents, and therefore fewer software seats.

— a16z Enterprise Newsletter, December 2024

02

The Pricing Revolution

SaaS pricing is undergoing its biggest shift since the cloud. AI is killing the per-seat model.

61%
SaaS using usage-based pricing (2022)
OpenView
30%
Enterprise SaaS with outcome-based by 2025
Gartner
43%
Enterprise buyers prefer outcome/risk-share pricing
Industry Data
2-3x
Higher traction for outcome-priced AI products
BetterCloud 2025

Seat-based pricing may not fit when AI is doing the work. If an agent replaces a human task, customers will expect to pay based on outcomes, not log-ons.

— Bain Technology Report 2025

03

Why Seats Are Dying

The logic of per-seat pricing breaks when AI replaces the humans who need seats.

The Broken Math

📉

AI replaces 10 analysts with 1 agent

Per-seat pricing undervalues the automation

💸

$5-10K/month regardless of results

70% churn when outcomes don't follow

❓

Soft ROI = death at renewal

2025 pilots hitting 2026 renewals — "are we really getting value?"

The New Model

🎯

Pay for work completed

Not for access to tools

📊

ROI in their sleep

Customers calculate value instantly: $X per meeting = clear math

🤝

Aligned incentives

We only win when you win

The Bessemer Thesis

AI-native companies are abandoning seat-based SaaS pricing in favor of usage-, output-, and outcome-based models that directly align revenue with measurable results.

— Bessemer Venture Partners, "The AI Pricing and Monetization Playbook" (Feb 2026)

04

Who's Already Winning

The market leaders are proving outcome-based AI pricing works at scale.

Intercom Fin

Customer Support AI

$0.99 per resolution

65% resolution rate. Aligns every team around one outcome: resolved tickets. Now deployed at 99% of conversations.

Zendesk AI Agents

Customer Support AI

Outcome-based pricing

"First in CX industry to offer outcome-based pricing for AI agents" — August 2024 announcement.

EvenUp

Legal AI

Per demand package

AI + legal experts generate personal injury demand letters. Per output pricing, not hourly.

Decagon

Enterprise AI Support

Per-conversation + per-resolution

Hybrid model. Usage (conversations) + outcome (resolutions). Featured in a16z podcast.

Leena AI

Employee Support AI

ROI-based (tickets closed)

Shifted from consumption → outcomes. Customers gained clearer ROI, business accelerated.

Scale AI

Data Labeling → Platform

$13.8B valuation

Started as labeling services. Became infrastructure. Services → outcomes → platform.

The Pattern

Every major AI-native company is moving toward outcome-based pricing. This isn't experimentation — it's convergence.

05

Why Enterprises Love It

43% of enterprise buyers consider outcome-based pricing a significant factor in purchase decisions.

Buyer Psychology

🧮

Instant ROI Calculation

"$X per meeting booked" = CFO-ready math. No spreadsheet gymnastics.

🛡️

Zero Implementation Risk

If it doesn't work, you don't pay. Risk transferred to vendor.

📈

Scales With Value

More meetings = more spend = more value captured. Natural expansion.

🔄

No Renewal Anxiety

You're paying for results. Why churn from something that works?

What Buyers Say

"Why should we pay $X per user if we could pay $Y per outcome? Aligning price with realized value improves the ROI calculus."

— Enterprise buyer sentiment (Industry research)

"The fundamental shift is to stop charging for access and start charging for work done."

— Bain Technology Report 2025

Deloitte 2026 Prediction

"Outcome- or value-based pricing is based on the real business results that SaaS applications with AI agents produce. There will be a gradual move toward a future powered by integrated, autonomous multi-agent systems."

06

Our Model: Pay Per Meeting

We operate AI agent fleets that book qualified sales meetings. You pay only when meetings happen.

🎯
Define Outcome
"50 qualified meetings/month"
→
🤖
Agent Fleet Works
Research, outreach, qualify, book
→
📅
Meeting Booked
Verified on calendar
→
💰
You Pay
Only for outcomes

❌ Traditional AI SDR

$5-10K
/month regardless of results
70%
Churn in 3 months
???
ROI unclear, hard to justify

✓ OpenHolly Outcome Model

$250-500
Per qualified meeting booked
0%
Risk if agents don't perform
∞
ROI: only pay when it works
07

Unit Economics That Work

Outcome-based pricing isn't charity — it's better economics for everyone.

Our Economics

💵

$250-500 per meeting

Customer pays on outcome

🤖

$30-80 cost to deliver

AI compute + tooling + human oversight

📈

3-7x margin

Healthy unit economics, scales with volume

🔄

Playbooks compound

Each meeting → better templates → lower cost

Customer Economics

✓

Meeting = $5K-50K deal potential

$250-500 per meeting is a no-brainer

✓

Zero upfront commitment

Start small, scale with proof

✓

Budget predictability

Cost tracks linearly with value

✓

Easy internal approval

CFO loves outcome-based spend

The Intercom Lesson

"Intercom's $0.99 per resolution aligns every team around one outcome: resolved tickets. If Fin resolves a ticket in three messages or thirty, the customer pays the same. The risk is real — but the reward is equally real: customers know exactly what they're getting, and they can calculate ROI in their sleep."

— Bessemer, Feb 2026

08

Managing the Risks

Outcome-based pricing has real risks. Here's how we mitigate them.

The Risks

⚠️

Cost variability

Some meetings cost more than others

⚠️

Revenue unpredictability

Customer usage varies month to month

⚠️

Attribution disputes

"Did your AI really book this?"

⚠️

Abuse potential

Customers gaming the system

Our Mitigations

✓

Minimum commitments

Base retainer + outcome fees = floor

✓

Playbook compounding

Cost per outcome drops with scale

✓

Clear outcome definitions

Contractually defined: what counts

✓

Full audit trail

Every action logged, no disputes

Industry Standard Emerging

"Agreements around basic definitions for things like 'an agent,' 'a task,' 'a process,' 'an interaction,' and 'an outcome' should be clearly defined, communicated, and agreed upon contractually." — Deloitte TMT Predictions 2026

09

Traction

$4K
MRR
5
Customers
0%
Churn
+$2K
Added This Week

Why Zero Churn

🎯

Aligned incentives

They pay for results → they get results → no reason to leave

📈

Clear value

Every invoice shows exactly what they got

🔄

Natural expansion

"It's working — give me more"

Customer Mix

๐Ÿ—๏ธ

50% SDR/BDR

Our wedge: sales meetings

๐ŸŽฌ

30% Video/ML

Synthetic data pipelines

๐Ÿ”ฌ

20% Research

University lab assets

When you only pay for results, there's no reason to churn. Aligned incentives = sticky customers. This is why Intercom's outcome-based Fin has 99% deployment.

10

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Unfair Advantages

๐Ÿ•

Dog-fooding daily

Running on OpenClaw infrastructure

๐Ÿ“š

Playbooks compounding

Every engagement โ†’ better templates

Why Outcome-Based Wins

๐Ÿ’ฐ

We absorb the risk

Customers love it โ†’ lower CAC, zero churn

๐ŸŽฏ

We're incentivized to deliver

Better AI = more margin for us

11

The Thesis

You post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents?

— Macy Mills, a16z Speedrun Partner

Why Now

📈

Market timing

61% → 30%+ outcome-based adoption wave

💔

AI SDR burnout

70% churn = customers looking for what works

🏢

Enterprise demand

43% prefer outcome-based pricing

Comparable Outcomes

🚀

Scale AI: $13.8B

Services → outcomes → platform

📊

Pilot: $1.2B

Bookkeeping outcomes, not seats

💬

Intercom Fin

$0.99/resolution, 99% deployment

OpenHolly: Pay Per Outcome

AI agents that deliver results. You only pay when they do. The future of how work gets priced.

📚 Sources

a16z Enterprise Newsletter (Dec 2024) • Bessemer "AI Pricing Playbook" (Feb 2026) • Bain Technology Report 2025 • Deloitte TMT Predictions 2026 • OpenView SaaS Benchmarks • Gartner • EY "SaaS Transformation with GenAI" (Nov 2025) • BetterCloud "AI and SaaS Industry 2026" • Intercom Fin pricing page • Zendesk AI Agents announcement (Aug 2024)

01
V3: Anti-AI-SDR

The $500M AI SDR Market
Is Imploding. We're the Fix.

50-70% churn rates. LinkedIn bans. Domain blacklists. The "autonomous AI SDR" thesis failed. Human-in-the-loop is winning.

50-70%
AI SDR Churn Rate
Common Room, Feb 2025
$7.5K
Spent for 1 Demo
Reddit r/SaaS, Dec 2025
0
Sales from AI SDR Leads
Theory Ventures CRO
80%+
Human-in-Loop Success
MarketBetter G2: 4.97/5
02

The AI SDR Disaster: Real Data

"AI SDRs don't work — biggest bubble in tech." — LinkedIn comment with 400+ likes

💀 What's Actually Happening

"Their AI continuously hallucinated, getting things wrong about what my company does, the industry we are in, what products we sell. 1 positive reply, 1 demo, thousands of prospects touched, $7.5K down the drain."

— r/SaaS, Dec 2025

"A CRO from a publicly traded company disclosed that while an AI SDR helped generate a substantial volume of leads over a nine-month period, it did not lead to actual sales."

— Tomasz Tunguz, Theory Ventures

"Reports emerged of Artisan accounts, including those of team members and founders, facing restrictions or bans for suspected spam and automation violations."

— Quasa.io, Jan 2026

๐Ÿ“Š The Numbers Don't Lie

๐Ÿ“‰

50-70% Annual Churn

2x the churn of human SDRs (a role notorious for turnover) โ€” Common Room

๐Ÿšซ

LinkedIn Bans Spreading

Platform ramped up AI detection, restricting automation-heavy accounts

๐Ÿ“ง

Domain Blacklisting

Gmail filtering harshened. Sender reputations destroyed in weeks.

โš–๏ธ

Legal Exposure

GDPR fines up to 4% revenue. TCPA: $500-1,500 per message.

๐Ÿ’”

Brand Damage

"Permanent brand damage from being publicly associated with spam" โ€” NUACOM

03

Even VCs Are Calling It

TechCrunch: "AI sales rep startups are booming. So why are VCs wary?"

"When one studies any of these startups individually, it's like 'wow, that's stunning product market fit.' When all 10 of them have stunning product market fit, it's hard to answer 'How is that going to play out?'"

— Shardul Shah, Partner, Index Ventures (hasn't invested)

"Without access to differentiated data, AI SDR startups risk being overtaken by incumbents like Salesforce, HubSpot, and ZoomInfo."

— Chris Farmer, CEO, SignalFire

"Investors are not surprised by the rapid adoption of AI SDRs; they are just doubting that adoption is sticky."

— TechCrunch, Dec 2024

The Jasper Cautionary Tale

$1.5B → 30% Layoffs

Jasper, the AI copywriting unicorn, ran into speed bumps and had to lay off 30% of staff after ChatGPT launched. AI SDRs face the same commoditization risk.

Why Adoption Isn't Sticky

1

Garbage In, Garbage Out

Built on commoditized LinkedIn data = undifferentiated output

2

Ops Is an Afterthought

Black boxes that create more work, not less

3

Feature, Not Product

Incumbents (Salesforce, HubSpot) can bundle this for free

04

The Fundamental Flaw: Autonomous ≠ Better

"The AI SDR is dead, long live the AI SDR: How the future is Human-in-the-Loop"

❌ Why Autonomous Fails

🤖

No Emotional Intelligence

Can't read tone, context, or cultural nuance essential in enterprise sales

🎯

No Real Consent

Scraped data without consent → GDPR/CCPA violations

⚖️

No Accountability

When AI misleads, your company bears the liability

🔄

Volume Over Value

"More volume on a bad message is not a strategy. It is self-sabotage."

👻

Fake Personalization

"Commenting on someone's hoodie feels forced because it's a hollow observation"

✓ What Actually Works

"Teams that use AI to support human insight consistently outperform teams trying to replace humans entirely. It's not even close."

— Matthew Metros, The AI SDR is Dead

🔍

AI Does Research (90%)

Data mining, signal detection, prospect prioritization

👤

Humans Do Relationships (10%)

Judgment, trust, closing

✅

Human-in-Loop = Higher Ratings

MarketBetter (human oversight): 4.97/5 G2 rating

📈

Better Outcomes

"Human-in-the-loop platforms consistently outperform fully autonomous ones"

05

OpenHolly: The Anti-AI-SDR

We're not building another AI SDR. We're building what should have been built from the start.

โŒ 11x / Artisan / AiSDR

๐Ÿค–

Replace human judgment

"Autonomous AI employee"

๐Ÿ“ง

Optimize for volume

"6,000 contacts/month"

๐Ÿ’ฐ

Per-seat pricing

$5-10K/mo regardless of results

๐Ÿ“ฆ

You manage the tool

Become pseudo-IT for AI

๐ŸŽฐ

Hope it works

No outcome guarantees

โœ“ OpenHolly

๐Ÿ‘ค

Augment human judgment

AI research + human checkpoints

๐ŸŽฏ

Optimize for quality

Right message, right person, right time

๐Ÿ’ต

Outcome-aligned pricing

Pay for meetings, not seats

๐Ÿ› ๏ธ

We manage the agents

You focus on your business

โœ…

Results guaranteed

Outcomes or you don't pay

06

How OpenHolly Works

AI handles the research. Humans make the decisions. You get meetings.

🎯
Your Goal
"50 qualified meetings/mo"
→
🔍
AI Research
Signals, intent, fit scoring
→
👤
Human Checkpoint
Review & approve outreach
→
✍️
AI Execution
Send, follow-up, schedule
→
📅
Meeting Booked
Qualified, on calendar

What AI Handles (90%)

🔍

Deep Prospect Research

Intent signals, company news, technographics, pain points

📊

Lead Scoring & Prioritization

Who to contact and why, right now

✍️

Draft Generation

Personalized outreach based on real signals

📧

Multi-channel Execution

Email, LinkedIn (safely), follow-ups

What Humans Handle (10%)

✅

Approval Gates

Review before sending to high-value prospects

💬

Live Conversations

When a prospect engages, humans take over

🎯

Strategy & ICP

Define who you want to reach and why

🧠

Judgment Calls

Edge cases, sensitive prospects, brand protection
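A minimal sketch of the approval gate in this split. The $10K threshold and field names are illustrative assumptions, not OpenHolly's actual logic:

```python
# Hypothetical human-checkpoint gate: high-value prospects wait for a
# reviewer, everything else auto-sends. Threshold is an assumption.

def route_draft(draft: str, prospect_value: float, threshold: float = 10_000) -> dict:
    """Decide whether an AI-written draft needs human approval."""
    if prospect_value >= threshold:
        return {"status": "pending_approval", "draft": draft}
    return {"status": "sent", "draft": draft}

def approve(item: dict) -> dict:
    """A human reviewer releases a held draft."""
    assert item["status"] == "pending_approval"
    return {**item, "status": "sent"}
```

The point of the gate is that the expensive mistakes (wrong message to a flagship account) are exactly the ones a human sees first.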

07

The Market Opportunity: Fix AI SDR

Their 50-70% churn is our customer acquisition channel.

$500M+
Raised by AI SDR startups
11x, Artisan, AiSDR, etc.
50-70%
Will churn this year
Common Room data
$250M+
Churned customers/year
Market opportunity
Human-in-Loop
What they'll switch to
The thesis

The Churned Customer Profile

💔

Burned by AI SDR tools

Spent $5-10K/mo, got spam complaints

📧

Domain reputation damaged

Need to rebuild sender trust

😤

Still need meetings

The problem didn't go away

🎯

Now understand quality > volume

Educated by failure

Why They'll Choose Us

✅

Outcome-based pricing

Only pay for meetings that happen

🛡️

Brand protection

Human oversight prevents embarrassments

📊

Proven playbooks

We've learned what works across verticals

🤝

We absorb the complexity

They don't manage agents, they get results

08

Traction: The Thesis Is Working

$4K
MRR
5
Customers
0%
Churn
+$2K
Added This Week

Why Zero Churn

Aligned Incentives

When customers only pay for results, there's no reason to churn. If we don't deliver meetings, they don't pay. Simple.

vs. AI SDR Churn

AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 50-70% churn.

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure

Yasir

Co-Founder

yapthis.com · Agentic architecture · Production agent systems

09

The Ask

What We Need

💰

$[X] Pre-Seed

Scale human oversight operations + agent infrastructure

🎯

12-month goal: $1M ARR

Prove the anti-AI-SDR thesis at scale

📚

Then: Productize

Turn proven playbooks into self-serve platform

Why Now

💔

AI SDR market imploding

50-70% churn = massive displaced customer base

📈

Human-in-loop proven

Highest G2 ratings go to human-oversight tools

⏰

First-mover on "fix"

Position as the safe alternative before market consolidates

OpenHolly: The Anti-AI-SDR

AI SDRs promised automation. They delivered spam, bans, and brand damage. We deliver meetings — with human judgment where it matters. Their 50-70% churn is our customer acquisition channel.

📚 Sources

Common Room "The AI SDR is dead" (Feb 2025) · TechCrunch "AI sales rep startups are booming. So why are VCs wary?" (Dec 2024) · Reddit r/SaaS AI SDR complaints · Quasa.io Artisan LinkedIn bans (Jan 2026) · Pipeline Group "Hidden Dangers of AI SDRs" · Theory Ventures SaaStr Talk · MarketBetter G2 Reviews

01
V5: Agent Seatbelt

The Safety Layer
Before AI Gets the Keys

Browser-layer guardrails that block irreversible AI actions before they happen.

$47K
Lost in one AI runaway
84%
Have zero safety boundaries
3am
When agents go rogue
100%
Preventable with guardrails
02

The "$39K Gone in a Blink" Problem

AI agents fail not from bad models, but from bad guardrails. 84% of companies deploying agents have zero safety boundaries defined.

— GenDigital Agent Trust Hub Research, 2026

What Goes Wrong

💸

Runaway API costs

$47K overnight cloud bills

📧

Wrong recipients

AI SDR emails competitors

🗑️

Irreversible actions

Deleted production data

🔓

Credential leaks

Pricing sent to wrong channel

What We Block

✓

Site-specific rules

Block LinkedIn "Follow" for AI SDRs

✓

Action classification

Read vs. Write vs. Irreversible

✓

Human approval gates

Require confirmation for risky ops

✓

Rate limiting

Prevent runaway loops

03

How It Works

Chrome extension that intercepts agent browser actions

🤖
Agent Action
→
🛡️
Seatbelt Intercept
→
⚖️
Risk Classification
→
✅
Allow / Block / Human

Why Browser Layer

Framework-agnostic. Works with any AI agent (OpenClaw, LangChain, AutoGen, custom). Install once, protect everything.
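The intercept-then-classify loop above can be sketched as a small rule engine. Verb names, rule sets, and the return values here are illustrative assumptions, not the extension's real rule format:

```python
# Sketch of risk classification for an intercepted browser action:
# read-only passes, irreversible always needs a human, and site-specific
# rules (like the LinkedIn "Follow" block for AI SDRs) deny outright.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    site: str   # e.g. "linkedin.com"
    verb: str   # e.g. "click_follow", "read_profile"

READ_ONLY = {"read_profile", "load_page"}            # safe to allow
IRREVERSIBLE = {"delete_repo", "send_payment"}       # always gate on a human
SITE_BLOCKLIST = {("linkedin.com", "click_follow")}  # site-specific rule

def decide(action: Action) -> str:
    """Return 'allow', 'block', or 'human' for an intercepted action."""
    if action.verb in READ_ONLY:
        return "allow"
    if action.verb in IRREVERSIBLE:
        return "human"
    if (action.site, action.verb) in SITE_BLOCKLIST:
        return "block"
    return "allow"
```

Rate limiting would layer on top of this (count decisions per agent per minute); the classification itself stays a pure function, which keeps it easy to test.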

04

Market & Competitive Position

Why Now

📈

OpenClaw: 9K → 60K stars

Autonomous agents exploding

⚠️

CyberArk security concerns

Enterprises worried about agent security

📜

EU AI Act

Regulatory tailwinds for safety

Competition

🟡

GenDigital Agent Trust Hub

Just launched; validates the market

🟢

Our Angle

Browser layer = framework-agnostic

🟢

MVP Achievable

Chrome extension ships fast

Agent Seatbelt

The seatbelt you install before giving AI the keys.

🔗 Supports These Pitches

Fat Startup • AWS of AI Work • Control Plane

Part of the human oversight layer that makes agent work reliable.

01
V6: ClawView

Datadog for
Autonomous Agents

When your AI employee sends the wrong email at 3am, you'll know exactly why.

The Problem

Companies are deploying autonomous AI agents that run 24/7. When something goes wrong—and it will—they have no idea why. Current tools are built for request-response, not proactive agents.

02

Current Tools Miss Autonomous Agents

LangSmith / Langfuse / Arize

โŒ

Request-response patterns

User sends message, LLM responds

โŒ

Chain tracing

LangChain-specific, not agent-native

โŒ

No proactive agent support

Built for chatbots, not employees

ClawView

โœ“

Autonomous operation

24/7 agents taking proactive actions

โœ“

Decision tracing

Why did it make that choice?

โœ“

Multi-channel + tools

Shell, browser, files, messages

03

The "Oh Shit" Demo

🤖
Agent receives task
→
🧠
Makes decisions
→
💥
Something goes wrong
→
🔍
ClawView shows why

Without ClawView

"The agent sent the wrong email. Logs show it ran. No idea why."

With ClawView

"Step 3: Agent assumed X because of context Y. Here's how to prevent this class of error."

ClawView: See What Your Agents Actually Do

Every decision. Every action. Every assumption. Full causal tracing.

🔗 Supports These Pitches

Fat Startup • AWS of AI Work • Control Plane

Observability layer — see what agents are doing before they go wrong.

⚠️ Why This is a Feature, Not a Company

Langfuse, LangSmith, Arize are well-funded. But none are built for autonomous agents. ClawView is our internal observability layer, not a separate product pitch.

01
V7: AgentGov

Governance for
AI Employees

Audit trails. Approval workflows. Compliance automation. The control layer enterprises need.

84%
No safety boundaries
0
Audit trails today
EU AI Act
Compliance required
2026
Enforcement begins
02

The Governance Gap

AI agents fail not from bad models, but from bad guardrails. The unlock isn't better agents—it's better safety rails.

— Industry consensus, 2026

What's Missing

โŒ

No audit trails

What did the agent do at 3am?

โŒ

No approval workflows

High-stakes actions go unsupervised

โŒ

No compliance framework

EU AI Act enforcement coming

โŒ

No agent-on-agent supervision

Humans can't supervise at machine speed

AgentGov Provides

โœ“

Immutable audit trails

Every action, every decision, timestamped

โœ“

Approval workflows

Human gates for high-stakes actions

โœ“

Compliance automation

EU AI Act ready, audit reports generated

โœ“

AI supervision layer

Validator agents checking worker agents
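One standard way to make an audit trail tamper-evident is hash chaining: each entry commits to the previous entry's hash, so editing any past record breaks every later hash. A sketch under that assumption, not AgentGov's actual design:

```python
# Append-only, hash-chained audit log: every entry stores the previous
# entry's hash, so verify() detects any edit or reordering after the fact.

import hashlib
import json
import time

class AuditLog:
    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, agent: str, action: str) -> None:
        entry = {
            "agent": agent,
            "action": action,
            "ts": time.time(),
            "prev": self.entries[-1]["hash"] if self.entries else "genesis",
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute every hash; any edited entry fails the chain check."""
        prev = "genesis"
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
```

For compliance reporting, the same chain doubles as proof that the exported report matches what actually ran.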

03

From "Human in Loop" to "Human on Loop"

👤
Human IN Loop
Approve every action
→
👁️
Human ON Loop
Exception handling
→
🏛️
Human ABOVE Loop
Strategic oversight

McKinsey Insight

"Organizations are moving from human in the loop to human on the loop—above the loop for strategic oversight." AgentGov enables this transition safely.

AgentGov: Govern AI at Scale

Audit trails. Approval workflows. Compliance automation. Trust at machine speed.

🔗 Supports These Pitches

Fat Startup • AWS of AI Work • Control Plane

Governance + compliance layer — enables enterprise trust.

🔬 Key Research

Gravitee 2026: Only 14.4% have full security approval for agents. 88% reported incidents.
EU AI Act: Enforcement begins 2026, mandates audit trails.
Zenity: $38M Series B validates market (but they're low-code focused, not agent-native).

01
V8: AI Employee OS

The Full Stack for
AI Employees

10 layers an AI employee needs to fulfill an entire job description. We're building the unified platform.

The Thesis

An AI employee's value lies in performing EVERYTHING in a job descriptionโ€”not just one workflow. This requires a complete infrastructure stack.

02

The 10-Layer Stack

1. Memory & Personality
2. Skills & Capabilities
3. Tools & Integrations
4. Identity & Access
5. Objectives & Goals
6. Task Management
7. Work Artifacts & KB
8. Supervision & Oversight ⭐
9. Communication (A2A) ⭐
10. QA & Compliance ⭐

What's Missing (⭐)

Layers 8-10 are the critical gaps. Everyone's building capabilities. Nobody's building supervision, agent-to-agent communication, and compliance.

03

The Integration Problem

Current landscape is fragmented

Today: Point Solutions

📦

Memory: Mem0, Zep, LangMem

📦

Tools: MCP servers

📦

Identity: Okta, 1Password

📦

Tasks: LangGraph, CrewAI

📦

Compliance: Guardrails AI, Trail

Tomorrow: AI Employee OS

A unified platform that manages the full AI employee lifecycle.

✓

Integrated stack

All 10 layers, one platform

✓

Turnkey deployment

Job description → Working AI employee

✓

Enterprise governance

Built-in compliance, audit, oversight

AI Employee OS

The unified platform for deploying, managing, and governing AI employees.

🔗 Framework For These Pitches

Fat Startup • AWS of AI Work • Control Plane

The 10-layer framework is how we think about what AI employees need.

⚠️ Why This is a Framework, Not a Pitch

Building all 10 layers is massive. We focus on Layers 8-10 (supervision, communication, compliance) because that's the critical gap. The framework informs strategy, not the pitch itself.

01
V9: AgentDocs

Stack Overflow
for AI Agents

Verified working code. Real benchmarks. Pay-per-snippet micropayments. Documentation that actually works.

200x
Slower (Whisper vs Groq)
Garry Tan, YC Feb 2026
โˆž
Hallucinated APIs
$2.2M
30-day x402 volume
x402scan.com
0
Verified snippet services

Claude Code chose Whisper V1 โ€” near-deprecated โ€” over Groq (200x faster, 10x cheaper) because OpenAI's docs are cleaner. Agents pick tools by doc quality, not performance.

โ€” Garry Tan, YC Partner, Feb 2026

Even the best dev tools don't let you sign up via API. This is a big miss in the claude code age โ€” claude can't sign up on its own.

โ€” Jared Friedman, YC Partner, Feb 2026

02

The Hallucination Tax

Despite our best efforts, they will always hallucinate. That will never go away.

— Amr Awadallah, Vectara CEO, 2026

❌ The Problem

❌

Best-documented ≠ Best solution

Agents pick whatever has most examples

❌

Documentation gets stale

APIs change, snippets break

❌

No verification

Agent can't know if code actually runs

❌

No benchmarks

No cost/perf data to guide decisions

✓ AgentDocs

✓

Agent-swarm verified

Code tested continuously, timestamped

✓

Use-case organized

"Transcribe video" → 10 services compared

✓

Real benchmarks

Cost, latency, quality scores

✓

x402 micropayments

$0.05 per verified snippet

03

How It Works

🤖
Agent needs code
"Send email via API"
→
🔍
Query AgentDocs
Structured API
→
💳
HTTP 402
Pay $0.05 via x402
→
✅
Verified snippet
Tested 2 hours ago

Kill the API Key

No signup. No rate limits. No accounts. Agent pays per-request, gets verified code. Native to how agents want to consume services.

Launch Order (by x402 + Pain Score)

1

🎙️ Transcription — NOW

Groq, Deepgram, Whisper. Zero x402 servers. Garry Tan moment.

2

🎬 Video Gen — Dogfood

Kling, Runway, Wan. Parameter chaos unsolved.

3

🧠 LLM Routing

Model selection based on task + budget

4

📧 Agent Identity

Email + phone + wallet in one API

What Agents Get

✓

Working code snippet

Verified against real APIs

✓

Normalized output schema

Same format across providers

✓

Cost + latency benchmarks

Real numbers, updated hourly

✓

Routing recommendation

"For fast+cheap → use Groq"

04

Market & Competition

Closest Competitor: Context7

🟡

Up-to-date docs

✓ They have this

🟡

Version-specific

✓ They have this

❌

Verified working

No continuous testing

❌

Benchmarks

No cost/perf data

❌

Micropayments

Free only, no agent-native billing

Why Now

💰

x402 is production-ready

$43M+ processed, 35M+ txns

🤖

Agent adoption exploding

OpenClaw: 9K → 60K stars

📈

$50B market by 2030

AI agent infrastructure

🎯

Clear wedge

Verification is table stakes soon

The x402 Thesis

25,000+ developers building on x402. Google, Cloudflare, Stripe adopting. Machine-to-machine payments are the rails for the agent economy.

05

x402 Market Opportunity

Real-time data from x402scan.com shows a booming agent economy — with a clear gap for developer tooling.

$2.21M
30-Day x402 Volume
x402scan.com, Feb 2026
4.2M
Transactions (30 days)
~140K/day average
8,559
Active Buyer Agents
Coinbase facilitator alone
0
Verified Snippet Services
Gap in the market

All 14 Facilitators

Facilitator 30d Txns 30d Vol What They Do
Dexter 1.65M $79.5K Agent economy platform
Coinbase 722K $288.5K Official CDP facilitator
Virtuals Protocol 412K $1.34M AI agent tokenization
PayAI 1.31M $43.3K Micropayments
RelAI 66K $84K Agent payments (Solana)
Meridian 19K $315K High-value transactions
Thirdweb ~10K ~$2K Web3 dev platform
OpenX402 6.6K $38.6K Open-source facilitator
Polymer 6.4K $770 Proof generation
AnySpend ~3K ~$5K Multi-asset spending

+ Corbits, OpenFacilitator, CustomPay, AgentPay (emerging)

Source: x402scan.com, Feb 27 2026

Market Gap Analysis

🔍

What Exists

Data APIs, AI services, crypto tools, social data

❌

What's Missing

Verified code snippets, curated docs, developer knowledge

💡

AgentDocs Opportunity

Be the Stack Overflow layer on x402 rails

Why We Can Win

Top services (StableEnrich, LowPaymentFee) aggregate APIs — they don't verify code quality.
AgentDocs: Premium pricing ($0.05-0.10) justified by verification + benchmarks.
Target: 1,000+ requests/day = $2,100+/month revenue from agent micropayments alone.

06

Revenue Model

AgentDocs: Documentation That Works

Verified snippets. Real benchmarks. Agent-native payments. Stack Overflow, but for machines.

🔗 Supports These Pitches

Fat Startup • AWS of AI Work

Better documentation → better agent outputs → more reliable outcomes.

📝 Current Progress

Live: agentdocs-api.holly-3f6.workers.dev
Snippets: 15 use cases, 21 verified snippets
Status: Dogfooding internally, expanding library

01
PORTAL

Autonomous Service
Signup for Agents

AI agents can write code, deploy apps, and manage infrastructure. But they can't sign up for a Stripe account. We fix that.

Even the best developer tools mostly still don't let you sign up for an account via API. This is a big miss in the claude code age because it means that claude can't sign up on its own. Putting all your account management functions in your API should be table stakes now.

— Jared Friedman, YC Partner, Feb 27 2026

181
Replies to Jared's tweet
1,336
Likes in 12 hours
0
Solutions today
$0.50-2
Per signup (x402)
02

The Problem: Last Mile of Agent Autonomy

✓ What Agents CAN Do

✓

Write entire codebases

✓

Deploy to staging

✓

Run tests, fix bugs

✓

Manage infrastructure

✗ What Agents CAN'T Do

✗

Sign up for Stripe

✗

Create a Vercel account

✗

Get an API key from Resend

✗

Click "Verify Email"

Hit this exact wall last week. Claude Code can scaffold an entire project, write tests, deploy to staging, but needs me to manually sign up for a third party service and paste in an API key. The last mile of developer tooling is still stuck in 2019.

— @advikjain_, replying to Jared

03

Community Validation

What developers said in response to Jared's tweet

"This is a real friction point for agentic workflows. The auth layer is always manual. Companies that figure out API-first account provisioning will eat the ones stuck in dashboard-only onboarding."

— @thebasedcapital

"I've watched AI tools fail at basic integration tasks because they hit the 'create account manually' wall. We're debating whether Claude can replace junior devs but it can't even sign up for Stripe."

— @OneManSaas

"Signup is just the tip. Billing, permissions, onboarding — everything assumes a human in a UI. Devtools that go full API-first for the entire lifecycle get a massive edge when agents pick their own stack."

— @wildpinesai (tagging @paulg)

"Bigger issue than just signup. Most SaaS still treats APIs as a feature for power users, not the primary interface. When your biggest customer is an agent, the whole product surface needs to be API-first."

— @twitter user

The Skeptics (and why they're wrong)

"Won't this enable bot spam?" — Valid concern, but x402 payments solve this. Agents pay real money per signup. Spam bots won't pay $1 per account.
"Companies don't want bot signups" — They want PAYING customers. Agent-initiated signups that convert to revenue are valuable.

04

How Portal Works

🤖
Agent Request
"I need Vercel access"
→
💳
x402 Payment
$1.00 USDC
→
🚪
Portal Queue
Job ID + poll URL
→
🖥️
Worker Fleet
Browser automation
→
🔑
Credentials
API key + password

API Flow

POST /signup
{ "service": "vercel" }

→ 201 Created
{
  "job_id": "portal_abc123",
  "poll_url": "https://...",
  "estimated_seconds": 30
}

What Agent Receives

GET /credentials/portal_abc123

{
  "api_key": "vercel_xxx",
  "email": "agent-abc@portal...",
  "password": "encrypted...",
  "account_url": "https://..."
}
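The submit-then-poll flow above, simulated in-process. The JSON shapes follow this slide; the job store, id scheme, and worker behavior are assumptions for illustration:

```python
# Toy simulation of the Portal job queue: POST enqueues, a worker
# completes the signup, and the agent polls until credentials appear.

import itertools

JOBS: dict[str, dict] = {}
_ids = itertools.count(1)

def post_signup(service: str) -> dict:
    """POST /signup: enqueue a job and return the id to poll."""
    job_id = f"portal_{next(_ids)}"
    JOBS[job_id] = {"status": "queued", "service": service}
    return {"job_id": job_id, "poll_url": f"/credentials/{job_id}"}

def worker_complete(job_id: str) -> None:
    """What a browser-automation worker would record on success."""
    service = JOBS[job_id]["service"]
    JOBS[job_id] = {"status": "done",
                    "credentials": {"api_key": f"{service}_xxx"}}

def poll(job_id: str) -> dict:
    """GET /credentials/{job_id}: current job state."""
    return JOBS[job_id]
```

Returning a job id immediately (rather than holding the HTTP request open for the 20-60s of browser automation) is what lets the API respond in under 100ms.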
05

Email Modes

๐Ÿ  Portal-Managed

We provision agent-{id}@portal.viewholly.com

  • We handle email verification automatically
  • No email infrastructure needed from agent
  • Simplest path โ€” just call the API
{ "email_mode": "portal_managed" }

๐Ÿ“ง Agent-Provided

Agent brings their own email (AgentMail, etc.)

  • Agent controls the identity
  • Integrates with existing email service
  • Agent must forward verification emails
{ "email_mode": "agent_provided",
  "agent_email": "bot@agentmail.com" }
06

x402 Market Opportunity

Agent payments are live. Portal fits perfectly.

$2.2M
30-day x402 volume
x402scan.com
4.2M
Transactions (30 days)
513
Active merchants
0
Signup services

What Exists on x402

✓

Data APIs

StableEnrich, httpay

✓

AI Services

Virtuals ACP ($163K/day)

✓

Social Data

StableSocial, TweetX402

✓

Email for Agents

StableEmail (314 txns)

What's Missing

0

Account Signup Services

Nobody solving this

0

API Key Provisioning

Wide open

0

Identity + Onboarding

Jared's exact point

07

Services & Pricing

Service Complexity Price Est. Time Status
Resend Simple $0.50 20s MVP
Railway Simple $0.50 25s MVP
Vercel Email verify $1.00 30s Week 2
Supabase Email verify $1.00 35s Week 2
Cloudflare Email verify $1.00 30s Week 2
Stripe 2FA / Complex $2.00 60s Phase 2

Revenue Model

1,000 signups/day ร— $1 avg = $30K/month
Infrastructure cost: ~$500/month (workers + CF)
Gross margin: 98%

08

Architecture

┌─────────────────────────────────────────────────────────────┐
│                            AGENT                            │
│               (Claude Code, OpenClaw, any AI)               │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ POST /signup (x402 $1)
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   PORTAL API (CF Workers)                   │
│              Hono + @x402/hono + D1 job queue               │
│                   Returns job_id in <100ms                  │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ Workers poll
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              WORKER FLEET (OpenClaw Instances)              │
│    ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐           │
│    │Worker 1│  │Worker 2│  │Worker 3│  │Worker 4│  (4+)     │
│    │Browser │  │Browser │  │Browser │  │Browser │           │
│    └────────┘  └────────┘  └────────┘  └────────┘           │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ Encrypted credentials
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    CREDENTIAL VAULT (KV)                    │
│          One-time retrieval • 5-min TTL • Encrypted         │
└─────────────────────────────────────────────────────────────┘
09

Security Model

๐Ÿ” Credential Handling

  • Passwords encrypted at rest
  • One-time retrieval (deleted after GET)
  • 5-minute TTL auto-delete
  • Full audit logging

๐Ÿ›ก๏ธ Why x402 Prevents Abuse

  • $0.50-2 per signup = spam is expensive
  • Wallet-based identity for accountability
  • Rate limiting per wallet
  • Abuse = burned wallet reputation

"I can only imagine allowing full automation when there's a direct path to monetisation. Maybe when we have a more reliable API for charging agents for specific actions automatically."

โ€” @Everlier, replying to Jared

x402 IS that reliable API. Portal is the first service to use it for signup.

10

Progress & Roadmap

✅ Done (Today)

  • Architecture design
  • x402 API server (Hono)
  • D1 job queue schema
  • 5 service playbooks
  • Credential vault design
  • GitHub repo ready

🔧 Week 1

  • Deploy to CF Workers
  • First worker (local OpenClaw)
  • Resend + Railway working
  • Email domain setup
  • One-time credential retrieval

🚀 Week 2-3

  • Worker fleet (4+ instances)
  • Vercel, Supabase, Cloudflare
  • WebSocket subscriptions
  • Webhook callbacks
  • x402 payment integration

🚪 Portal: The Missing Link

Agents can do everything except onboard to services. Portal fixes the last mile of agent autonomy.

Repo: github.com/moltyfromclaw/portal

01 / 12
AWS OF AI WORK

The Infrastructure Layer
for AI Agent Work

$30-40B poured into AI agents. 95% fail to deliver. We're building the missing infrastructure that makes them actually work.

$50B+
AI Agent Market by 2030
MarketsandMarkets, Grand View Research
95%
Enterprise AI Pilots Fail
MIT NANDA Study, 2025
$4K
MRR (Live)
171%
ROI When It Works
MIT NANDA
02 / 12

The $30B Problem

Companies are pouring billions into AI agents. Almost none deliver measurable returns.

95%
AI pilots deliver zero measurable return
MIT NANDA Study
80%
AI projects fail (2x normal IT)
RAND Corporation
46%
PoCs scrapped before production
WorkOS Research
70-80%
AI SDR churn within months
11x, Artisan data

Companies are pouring $30–40 billion into generative AI, yet an MIT study finds that 95% of enterprise pilots deliver zero measurable return.

— MIT NANDA: The GenAI Divide, 2025

03 / 12

Why AI Agents Fail

The pattern is consistent. It's not the modelsโ€”it's the infrastructure.

โŒ What Breaks

1

No workflow templates

Teams reinvent every agent from scratch. Same failures, different companies.

2

No human oversight

Agents run unsupervised. High-stakes errors go uncaught. Trust collapses.

3

No failure patterns

Each company learns the same lessons. No accumulated knowledge.

4

No orchestration

Multi-agent systems collapse. Stanford CooperBench: 25% success rate.

โœ“ What's Missing: Infrastructure

โœ“

Battle-tested workflow templates

Proven prompts, integrations, and sequences. Encoded from real deployments.

โœ“

Human-in-the-loop routing

Smart escalation. Approval queues. Humans handle edge cases.

โœ“

Failure pattern library

What breaks and how to prevent it. Compound learning across clients.

โœ“

Agent orchestration layer

Coordinate multi-agent work. Handle failures gracefully.

The Unlock

The 5% that succeed have infrastructure. Templates. Oversight. Failure patterns. We're building that infrastructure as a service.

04 / 12

The Playbook: Services → Platform

The most valuable infrastructure companies started by doing the work themselves.

Scale AI

Data Labeling → AI Infrastructure

Started labeling images for self-driving cars (2016). Now the "Data Foundry" powering OpenAI, Meta, Google. 50% gross margins from tech-enabled services.

$29B
Valuation (Meta investment, 2025)
Sacra, TechCrunch

Pilot

Bookkeeping Services → Financial Infra

"AWS for SMB accounting." Started doing bookkeeping. Now processes $3B+ in transactions. Jeff Bezos led funding.

$1.2B
Valuation (2021)
CNBC, TechCrunch

Stripe

Payments API → Financial Infrastructure

Started with simple payment processing (2010). Expanded to Connect, Radar, Atlas. Infrastructure that grows as customers grow.

$107B
Valuation (2024)
Wikipedia, Sacra

The Pattern

Do the work → Encode the patterns → Become the platform. Services fund the R&D. Each engagement builds the moat. Competitors starting later start from zero.

05 / 12

Scale AI: The Detailed Parallel

Their journey is our playbook. Same model, different layer.

Scale AI's Model

1

Services Entry

Started labeling images for AV companies. Revenue from day one.

2

Tech Layer

Built pre-labeling ML that made each human 10x more efficient.

3

Data Flywheel

Each correction improved their models. More data = better automation.

4

Platform Expansion

Nucleus, Validate, Launchโ€”from labeling to full ML lifecycle.

Our Model

1

Services Entry

Operating AI agent workflows for clients. Revenue from day one.

2

Tech Layer

Workflow templates + orchestration that make agents reliable.

3

Playbook Flywheel

Each engagement encodes learnings. More workflows = better templates.

4

Platform Expansion

Guardrails, Observability, Governanceโ€”full agent lifecycle.

Scale AI is not a traditional BPO company. It is a Data Foundry. Their technology layer is their moatโ€”human workforce augmented by proprietary software that compounds in value.

— Takafumi Endo, "Scale AI: Deconstructing the Foundry"

06 / 12

The Workflow Template Moat

Each engagement encodes a playbook. Playbooks become the platform.

🔧
Verified Prompts
By use case + vertical
+
🔗
Integration Patterns
What connects to what
+
🚫
Failure Patterns
What breaks + fixes
+
👤
Human Routing
When to escalate
↓
📦
Workflow Template Library
Deploy new client in hours, not weeks

Compounding Effect

1๏ธโƒฃ

Customer 1: 2 weeks

Figure everything out from scratch

5๏ธโƒฃ

Customer 5: 3 days

Apply existing playbook + customize

๐Ÿ”Ÿ

Customer 10: Hours

Playbook is battle-tested

๐Ÿ“ฆ

Customer 50+: Self-serve

Playbooks become product

What's In A Template

๐Ÿ“

Prompt sequences

What actually works for each use case

โš™๏ธ

Model routing

Which models for which tasks (cost/quality)

๐Ÿ”—

Tool configurations

Integrations, APIs, credentials patterns

๐Ÿ›ก๏ธ

Guardrail rules

What to block, what to escalate

07 / 12

Why Infrastructure Wins

Application companies fight for customers. Infrastructure companies power the ecosystem.

โŒ Application Layer

๐Ÿ“Š

Compete on features

Race to the bottom. Easy to copy.

๐Ÿ”„

Linear growth

Each customer = new acquisition cost

๐Ÿ’ฐ

2-5x revenue multiples

Commodity software pricing

๐Ÿƒ

Low switching costs

Customers can leave anytime

✓ Infrastructure Layer

๐Ÿ—๏ธ

Compete on reliability

Mission-critical. Hard to replicate.

๐Ÿ“ˆ

Compound growth

Templates improve → more value → more customers

๐Ÿ’Ž

10-25x revenue multiples

Scale AI: 18x. Stripe: higher.

๐Ÿ”’

High switching costs

Workflows built on your templates

Network effects are the underlying principle behind the success of companies like AWS, Stripe, and Salesforce. Higher network density means the product value increases.

— NFX: The Network Effects Manual

08 / 12

Market Size: $50-70B by 2030

AI agents are the fastest-growing category in enterprise software. We're building the infrastructure layer.

$7.8B
AI Agents Market (2025)
MarketsandMarkets
$52.6B
AI Agents Market (2030)
MarketsandMarkets
46.3%
CAGR Growth Rate
2025-2030 forecast
$183B
Bullish Forecast (2033)
Grand View Research

Our TAM Slice: Infrastructure

If AI Agents are $50B, infrastructure is 20-30% of stack value:

$10-15B
Agent Infrastructure TAM by 2030

Why We Win This Slice

๐ŸŽฏ

First-mover on playbooks

Every month = more encoded knowledge

๐Ÿ’ฐ

Revenue while building

Services fund the platform

๐Ÿง 

Real deployment data

Failure patterns competitors don't have

09 / 12

The Infrastructure Stack

Four layers that make AI agents reliable. We're building all four.

1
Workflow Templates
Verified prompts, sequences, integrations
2
Agent Orchestration
Multi-agent coordination, task routing
3
Human Oversight
Approval queues, escalation, feedback loops
4
Guardrails + Observability
Safety rails, monitoring, audit trails

Current Products

๐Ÿ›ก๏ธ

Agent Seatbelt

Browser-layer guardrails that block irreversible actions

๐Ÿ“Š

ClawView

Observability for autonomous agents. See what they do.

๐Ÿ›๏ธ

AgentGov

Governance, compliance, audit trails

๐Ÿ“š

AgentDocs

Verified code snippets for agent tool use

10 / 12

Current Traction

$4K
MRR
5
Paying Clients
3
Workflow Types
SDR, Video Gen, Research
+$2K
Added This Week

What We've Delivered

๐Ÿ—๏ธ

SDR for construction companies

Lead gen + qualification workflows

๐ŸŽฌ

Video generation for ML training

Synthetic data pipeline workflows

๐Ÿ”ฌ

Research for universities

Literature review + synthesis workflows

๐Ÿš€

BDR for startups

Outbound + meeting booking workflows

What This Proves

Fat Startup Thesis

We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.

"A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done."

— Andrew Lee, a16z Speedrun

11 / 12

The Path Forward

🛠️
Year 1: Services
$1M ARR · 50+ playbooks
→
📚
Year 2: Productize
Self-serve templates
→
🏗️
Year 3: Platform
Others build on us

12-Month Milestones

๐Ÿ’ฐ

$1M ARR

Prove unit economics at scale

๐Ÿ“š

50+ Workflow Templates

Across 5+ verticals

๐Ÿ”ง

Infrastructure Products Live

Guardrails, Observability, Governance

๐Ÿ“ฆ

First Self-Serve Templates

Deploy without our team

Why Now

๐Ÿš€

Models just got capable enough

GPT-5, Claude 4—agents can work

๐Ÿ’”

AI SDR market burned

70-80% churn = customers seeking alternatives

โฐ

Infrastructure window open

No dominant player yet. First-mover wins.

๐Ÿ“œ

Regulatory tailwinds

EU AI Act mandates oversight, audit trails

12 / 12

The Ask

The AWS of AI Work

Infrastructure that makes AI agents reliable. Workflow templates. Orchestration. Human oversight.

Every company deploying agents will need this. We're building it.

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed

Yasir

Co-Founder

yapthis.com · Shipped production agents

Key Sources

MIT NANDA Study: 95% AI failure rate, 171% ROI when successful

MarketsandMarkets: $7.8B → $52.6B AI agents market (2025-2030)

Scale AI (Sacra): $1.5B ARR, $29B valuation, 50% gross margins

Pilot (CNBC/TechCrunch): $1.2B valuation, Bezos-backed

11x/Artisan: 70-80% churn within months (Broadn research)

RAND Corporation: 80% AI project failure rate

01 / 12
MARKETPLACE THESIS

The Uber for AI Work

Post an outcome. AI agents compete. Pay only for results. We're building the outcome marketplace for the AI economy.

$4K
MRR Today
70%
Of tech value created by network effects
NFX Research
$13.8B
Scale AI valuation
Services → Platform
$60M+
GitCoin distributed
Bounty model works
02 / 12

The a16z Speedrun Thesis

This is the exact model a16z partners are calling for in 2026.

Say you need 50 qualified sales meetings. Instead of buying another AI tool, you post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents going after real business outcomes?

— Macy Mills, a16z Speedrun, "14 Big Ideas for 2026"

I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start.

— Kenan Saleh, a16z Speedrun, "14 Big Ideas for 2026"

A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.

— Andrew Lee, a16z Speedrun Partner

03 / 12

The Market Shift: Tools → Outcomes

The freelance marketplace is $1.5T. It's about to be disrupted by AI agents.

โŒ Legacy Marketplaces

๐Ÿ“

Upwork: $1.67B market cap

Pay humans by the hour. Hope they deliver.

๐Ÿ“

Fiverr: ~$1B market cap

Fixed-price gigs. Still human-dependent.

๐Ÿข

Slow, expensive, variable

Wait days. Pay premium. Quality varies.

✓ AI Agent Marketplace (Us)

๐ŸŽฏ

Pay per outcome, not effort

$X per meeting, $Y per video, $Z per lead.

โšก

Hours, not days

AI agents work 24/7. Instant scale.

๐Ÿ“ˆ

Network effects compound

More agents = better matching = better outcomes.

The Paradigm Shift

As we move to a future based on outcome-based pricing that perfectly aligns incentives between vendors and users, we'll first move away from time-based billing. — a16z Big Ideas 2026

04 / 12

How It Works

Bounties + Escrow + AI Agents = Outcome Marketplace

🎯
Post Bounty
"50 meetings @ $500 each"
→
💰
Escrow Funds
Payment locked
→
🤖
Agents Compete
Best performers win
→
✅
Verify & Release
QA passes → pay out
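The escrow flow above can be sketched as a small state machine. This is a minimal illustration, not our implementation; the `Bounty` class, field names, and dollar figures are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch of the bounty lifecycle: funds are escrowed at
# posting and released only per verified outcome.

@dataclass
class Bounty:
    outcome: str          # e.g. "qualified meeting booked"
    price_per_unit: int   # dollars escrowed per delivered outcome
    target_units: int
    escrow: int = 0
    delivered: int = 0
    paid_out: int = 0

    def post(self) -> None:
        # Buyer locks the full amount up front: zero payment risk for agents.
        self.escrow = self.price_per_unit * self.target_units

    def submit(self, agent: str, passes_qa: bool) -> int:
        # An agent submits one outcome; QA (human or automated) gates payout.
        if not passes_qa or self.delivered >= self.target_units:
            return 0
        self.delivered += 1
        self.escrow -= self.price_per_unit
        self.paid_out += self.price_per_unit
        return self.price_per_unit

bounty = Bounty(outcome="qualified meeting booked", price_per_unit=500, target_units=50)
bounty.post()                               # $25,000 locked in escrow
bounty.submit("agent-a", passes_qa=True)    # verified: $500 released
bounty.submit("agent-b", passes_qa=False)   # failed QA: escrow untouched
```

The key property is that payout is gated on verification, not on effort: a failed submission leaves the escrow untouched.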

For Buyers

๐Ÿ“

Define the outcome

"Book qualified meeting" or "Generate product video"

๐Ÿ’ต

Set your price

Pay what the outcome is worth to you

๐Ÿ”’

Zero risk

Funds held in escrow. Pay only on delivery.

For Agents (Supply Side)

๐ŸŽฐ

Pick bounties that fit

Match capabilities to opportunities

๐Ÿ“Š

Build reputation

Success rate → more bounties → more revenue

๐Ÿ’ฐ

Get paid instantly

Verified outcome → automatic payout

05 / 12

The Bounty Model Works

Proven in bug bounties, open source, and ML competitions. Now it's time for AI work.

$60M+
GitCoin distributed
Open source bounties
$100M+
Bug bounties/year
HackerOne + Bugcrowd
$1B+
Kaggle prize pool
ML competitions
10M+
Replit users
Bounties marketplace

Precedent: Replit Bounties

Imagine a tool where you describe your problem and get a solution built for you. Today we're introducing Bounties, a marketplace where you work with top creators and bring your software ideas to life.

— Replit, on launching Bounties

Replit proved bounties work for code. We're proving it works for any AI-deliverable outcome.

Precedent: GitCoin

Over the past 5 years we've supported the funding of public goods. Started with bounties for open source, evolved to quadratic funding.

— GitCoin: $60M+ distributed

GitCoin proved bounties + crypto payments = massive coordination. We're applying this to AI agent work.

06 / 12

Network Effects: The Moat

70% of tech value comes from network effects. Here's how we build them.

Network effects have been responsible for 70% of all the value created in technology since 1994. Founders who deeply understand how they work will be better positioned to build category-defining companies.

— NFX, "The Network Effects Bible"

Two-Sided Marketplace NFX

👤

More buyers → More bounties

Attracts more agents to the platform

🤖

More agents → Better matching

Faster delivery, higher quality outcomes

📈

Better outcomes → More buyers

Word of mouth, lower prices, faster delivery

Data Network Effects

๐Ÿ“Š

Every bounty = training data

What works, what fails, edge cases

๐Ÿง 

Smarter matching over time

Route bounties to best-fit agents

๐Ÿ”’

Proprietary playbook library

Compound knowledge competitors can't replicate

Metcalfe's Law

Value of a network grows in proportion to N² (nodes squared). With agents AND buyers, we get cross-side network effects that compound faster than single-sided platforms.
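A back-of-envelope comparison of the two growth curves. This is illustrative only: Metcalfe's law is a heuristic, not a valuation model.

```python
# Back-of-envelope: why cross-side growth compounds. Not a valuation model.

def metcalfe(n: int) -> int:
    return n * n               # single-sided: every node can reach every node

def cross_side(buyers: int, agents: int) -> int:
    return buyers * agents     # two-sided: every buyer can match every agent

# Doubling BOTH sides quadruples the number of possible matches.
assert cross_side(200, 200) == 4 * cross_side(100, 100)
```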

07 / 12

Trust Layer: How Agents Build Reputation

The missing infrastructure for AI agent marketplaces.

Agent Identity & Track Record

๐Ÿ†”

Verifiable agent identity

Who built it, what it can do, audit trail

๐Ÿ“ˆ

Per-function reputation

Track record based on actual outcomes, not reviews

๐Ÿ†

Specialization scores

"This agent is 94% on sales meetings, 78% on video"

Trust Mechanics

๐Ÿ”’

Escrow with time-locks

Funds released only on verified delivery

โš–๏ธ

Dispute resolution

Human or AI arbitration for edge cases

๐Ÿ“‰

Sliding refund scale

Partial credit for partial delivery

🆕
New Agent
Low trust, small bounties
→
📊
Track Record
Outcomes verified
→
⭐
Trusted Agent
High-value bounties
→
🏅
Elite Status
Premium rates, priority
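The reputation ladder above can be sketched as a simple scoring rule. The thresholds, tier names, and `success_rate`/`tier` helpers are hypothetical placeholders for this sketch, not platform rules.

```python
# Illustrative per-function reputation: success rate over verified outcomes,
# with tier cutoffs gating bounty size. All names and numbers are assumptions.

def success_rate(wins: int, attempts: int) -> float:
    return wins / attempts if attempts else 0.0

def tier(attempts: int, rate: float) -> str:
    if attempts < 10:
        return "new"          # low trust, small bounties only
    if rate >= 0.9 and attempts >= 100:
        return "elite"        # premium rates, priority routing
    if rate >= 0.75:
        return "trusted"      # eligible for high-value bounties
    return "track-record"     # building history on mid-size bounties

# e.g. an agent that is "94% on sales meetings, 78% on video":
agent = {"sales_meetings": (94, 100), "video": (39, 50)}
tiers = {fn: tier(n, success_rate(w, n)) for fn, (w, n) in agent.items()}
# sales_meetings -> elite, video -> trusted
```

Scoring on verified outcomes rather than reviews is what makes the ladder hard to game: reputation only moves when escrow is released.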
08 / 12

Path: Managed → Open Marketplace

Like Uber: start premium, then open the platform.

🛠️
Phase 1: Now
We run the agents
→
🤝
Phase 2: Partners
Vetted agent builders
→
🌐
Phase 3: Open
Any agent can join

Phase 1: Managed (Now)

✅

We operate all agents

Quality control, learn playbooks

✅

$4K MRR validates demand

Customers paying for outcomes

✅

Build trust infrastructure

Escrow, verification, reputation

Phase 2-3: Marketplace

๐Ÿ”œ

Invite partner agents

Vetted builders, revenue share

๐Ÿ”œ

Open to all agents

Anyone can compete for bounties

๐Ÿ”œ

Platform take rate: 15-20%

Like Uber, Airbnb, marketplace standard

The Uber Playbook

Uber started with black cars (premium, managed) before opening to UberX (open marketplace). We start with our agents, prove economics, then open to all. Services fund the platform build.

09 / 12

Comparable Companies & Valuations

Services → Platform is a proven path to massive outcomes.

$13.8B
Scale AI
Data labeling services → platform
$1.67B
Upwork
Freelance marketplace (ripe for disruption)
$1.2B
Pilot
Bookkeeping: humans + AI
$50B+
Palantir
Services → Platform → Public

Scale AI: Our North Star

1๏ธโƒฃ

Started as services

Data labeling for ML companies

2๏ธโƒฃ

Built the platform

Tools, workflows, quality systems

3๏ธโƒฃ

$2B+ revenue (2025)

Services funded the infrastructure

4๏ธโƒฃ

$13.8B valuation

Platform economics, not services multiples

Why We're Bigger

๐Ÿ“Š

Scale AI: One vertical

Data labeling for ML

๐ŸŒ

Us: All AI-deliverable work

Sales, content, research, ops...

๐Ÿ“ˆ

TAM: $1.5T+ services market

Every white-collar task that can be AI'd

10 / 12

Why Now: The Perfect Storm

GPT-5
Agents now capable
x402
Machine payments ready
a16z Big Ideas 2026
70%
AI SDR churn
Tools failing, outcomes wanted
$1B+
AI coding revenue (2025)
a16z: Agent apps thriving

Technology Inflection

๐Ÿง 

Models capable enough

GPT-5, Claude 4 can do real work

๐Ÿ’ณ

x402 machine payments

Agents can transact autonomously

๐Ÿ”ง

Infrastructure exists

OpenClaw, MCP, agent frameworks

Market Readiness

๐Ÿ’”

AI tools disappointing

70% churn = buyers want outcomes

๐Ÿ’ฐ

Budget exists

Companies spending on AI, getting nothing

๐Ÿƒ

First mover advantage

No AI-native outcome marketplace yet

Emerging primitives like x402 make payment settlement programmable and reactive. Smart contracts can settle a dollar payment globally in seconds. In 2026, this becomes the rails for agent commerce.

— a16z Big Ideas 2026, Part 3

11 / 12

Team & Traction

$4K
MRR
5
Customers
3-7x
Margin Multiple
0%
Churn (outcome-aligned)

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

What Traction Proves

Companies pay for outcomes. 0% churn because incentives align. This is the business model for AI work.

12 / 12

The Ask

What We Need

๐Ÿ’ฐ

$[X] Pre-Seed

Scale agent capacity, build marketplace infra

๐ŸŽฏ

12-month goal: $1M ARR

Prove economics before opening marketplace

๐ŸŒ

24-month: Open marketplace

Partner agents, then fully open

Why Us

๐Ÿ•

Dog-fooding OpenClaw

We run agents daily, know what breaks

๐Ÿ“Š

Built the infrastructure

ClawView, guardrails, workflows

๐Ÿ’ต

Revenue already

$4K MRR proves the model

OpenHolly: The Uber for AI Work

Post an outcome. AI agents compete. Pay for results. The marketplace that makes AI actually deliver.

🔧 Infrastructure We're Building

๐Ÿ›ก๏ธ Guardrails โ€ข ๐Ÿ“Š ClawView โ€ข ๐Ÿ›๏ธ AgentGov

Trust layer that makes marketplace outcomes reliable.

📚 Sources

a16z: "14 Big Ideas for 2026" (Macy Mills, Andrew Lee, Kenan Saleh) โ€ข "Big Ideas 2026 Part 1-3" โ€ข NFX: "The Network Effects Bible" (70% of tech value) โ€ข Market Data: Scale AI ($13.8B), Upwork ($1.67B), GitCoin ($60M+ distributed) โ€ข Replit: Bounties marketplace launch

1 / 12
CONTROL PLANE THESIS

The Control Plane for
AI Agents

Everyone's building autonomous agents. We're building the layer that makes them actually work: purpose-built infrastructure for human oversight at scale.

95%
AI pilots fail to deliver ROI
MIT Research, 2025
17x
Error amplification in "bag of agents"
DeepMind, Dec 2025
71%
Accuracy improvement with HITL
Microsoft Magentic-UI
$4.5K
MRR proving the thesis
OpenHolly, Feb 2026
2 / 12

The Inconvenient Truth: Autonomy Fails

The research is clear—and the industry is learning the hard way.

Multi-Agent Systems Break Down

"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."

— Berkeley/DeepMind, "Why Do Multi-Agent LLM Systems Fail?", 2025

๐Ÿ“Š

75% failure rate

ChatDev on ProgramDev benchmark

๐Ÿ“Š

~50% average task completion

Across autonomous agent frameworks

๐Ÿ“Š

17x error amplification

In uncoordinated "bag of agents"

Enterprise AI Projects Crater

"42% of companies abandoned most of their AI initiatives in 2024, up from 17% the previous year. The average organization scrapped 46% of AI proof-of-concepts."

โ€” S&P Global Research, 2024

๐Ÿ“Š

95% of AI pilots fail

MIT Research on enterprise deployments

๐Ÿ“Š

80%+ never reach production

RAND Corporation AI project study

๐Ÿ“Š

2x failure rate vs traditional IT

AI projects vs standard software

Why This Matters

The industry is betting billions on fully autonomous agents. The research says they don't work. Someone needs to build the layer that makes them work.

3 / 12

Microsoft's Answer: Human-in-the-Loop

The largest AI research org in the world just validated our thesis.

"We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems."

— Microsoft Research, Magentic-UI (July 2025)

Magentic-UI Results

71%
Accuracy improvement with human-in-loop
30.3% → 51.9% on GAIA benchmark
๐Ÿ“Š

Only 10% of tasks needed human help

Lightweight intervention, massive improvement

๐Ÿ“Š

1.1 avg clarifications per help request

Minimal interaction overhead
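For clarity, the 71% headline is the relative improvement on GAIA, not a percentage-point jump:

```python
# 71% is the relative gain on the GAIA benchmark: (51.9 - 30.3) / 30.3.
gain = (51.9 - 30.3) / 30.3
print(f"{gain:.0%}")  # 71%
```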

Key Interaction Mechanisms

๐Ÿค

Co-planning

Human + agent collaborate on plan before execution

๐Ÿ”„

Co-tasking

Seamless handoff between human and agent control

๐Ÿ›ก๏ธ

Action guards

Human approval for high-stakes actions

๐Ÿง 

Memory

Learn from past interactions to improve

Microsoft's Conclusion

"Even as tomorrow's agents become more capable and reliable, we believe that human involvement will remain essential for preserving human agency, resolving unforeseen ambiguities, and guiding agents in adapting to an ever-changing world."

4 / 12

Anthropic's Findings: The Oversight Paradox

Real-world data from millions of Claude Code sessions reveals how humans actually oversee agents.

As Users Gain Experience...

๐Ÿ“ˆ

Auto-approve increases: 20% โ†’ 40%+

Experienced users let Claude run autonomously

๐Ÿ“ˆ

BUT interrupt rate ALSO increases: 5% → 9%

They intervene more often, not less

๐Ÿ’ก

The shift: Step-by-step โ†’ Exception-based

From approving everything to watching for problems

Agent-Initiated Stops Matter

๐Ÿค–

Claude asks for clarification 2x more

On complex tasks vs simple ones

๐Ÿค–

More often than humans interrupt

On the most difficult tasks

๐Ÿ’ก

Models know when they're uncertain

They can (and should) ask for help

"Effective oversight doesn't require approving every action but being in a position to intervene when it matters... our central conclusion is that effective oversight of agents will require new forms of post-deployment monitoring infrastructure and new human-AI interaction paradigms."

— Anthropic Research, "Measuring AI Agent Autonomy in Practice" (Feb 2026)

The Deployment Overhang

Anthropic found that "the autonomy models are capable of handling exceeds what they exercise in practice." The bottleneck isn't model capability—it's the oversight infrastructure.

5 / 12

Air Traffic Control for AI Agents

The analogy everyone is converging on—and what it means for product design.

"Think of agents within your multi-agent system as the airplanes. The agents have their own autonomy to act. But air traffic control provides guardrails, coordination, and human oversight for the whole system."

— Jason Bryant, AI in Pharma (Jan 2026)

Why Air Traffic Control Works

โœˆ๏ธ

Planes are autonomous

Pilots make real-time decisions

๐Ÿ—ผ

Controllers handle coordination

Routing, conflicts, emergencies

๐Ÿ‘ค

Humans handle edge cases

Cases standard procedures can't cover

๐Ÿ”„

System improves over time

Incidents become new procedures

Why This Analogy Matters

๐Ÿ“Š

Scaling ratio: 1 controller : many planes

Not 1:1 human-to-agent

๐Ÿ›ก๏ธ

Controllers can't replace pilots

Nor vice versaโ€”complementary roles

โš ๏ธ

No full automation possible

Edge cases require human judgment

๐Ÿ’ฐ

Multi-billion dollar industry

ATC isn't going away

The Thesis

As AI agents proliferate, every company will need an "air traffic control" system for their agent fleet. That's the control plane we're building.

6 / 12

Why Current Interfaces Fail

Existing tools weren't designed for the human-agent oversight problem.

โŒ Chat Interfaces

Conversational, not workflow-oriented. Can't manage 100 agents. No approval queues. No batch operations. You'd need a chat window per agent.

โŒ Code/GitHub

Great for developers. Useless for ops teams. Can't approve actions in real-time. No visual understanding of agent state or intent.

โŒ Slack/Email Alerts

Ad hoc approvals. No context. Alert fatigue. Doesn't learn from decisions. Can't see what agent plans to do next.

โŒ Observability Dashboards

Read-only visibility. No intervention capability. See problems after they happen. Can't modify agent plans mid-execution.

"Only 14.4% of enterprises have full security approval for AI agents. 88% reported agent-related incidents. The interface problem is also a governance problem."

โ€” Gravitee State of AI Agents Report, 2026

The Gap

There's no purpose-built interface for humans to oversee AI agents at scale. Not dashboards. Not chat. Not alerts. A new category needs to exist.

7 / 12

What a Control Plane Actually Needs

Distilled from Microsoft, Anthropic research, and our own deployments.

Pre-Execution

๐Ÿ“‹

Plan Review

See what agent intends to do before it acts. Edit plans. Add constraints.

๐ŸŽฏ

Scope Boundaries

Define allowed domains, tools, actions. Agent can't exceed boundaries.

๐Ÿ”—

Workflow Templates

Start from proven patterns. Don't reinvent for every task.

During Execution

๐Ÿ‘๏ธ

Real-Time Visibility

See agent actions as they happen. Browser view. Code execution. API calls.

โธ๏ธ

Interrupt & Resume

Pause any agent instantly. Take control. Hand back.

๐Ÿ›ก๏ธ

Action Guards

Automatic pause for high-stakes actions. Configurable thresholds.

Approval Layer

๐Ÿ“ฅ

Unified Queue

All pending approvals across all agents in one view.

๐ŸŽ›๏ธ

Batch Operations

Approve/reject patterns across many agents at once.

๐Ÿ”€

Smart Routing

Route different decisions to different humans by expertise.

Learning Layer

๐Ÿง 

Decision Memory

Human approvals become future patterns. Rejections become rules.

๐Ÿ“ˆ

Threshold Tuning

Auto-adjust when to ask humans based on outcomes.

๐Ÿ“š

Playbook Evolution

Workflows improve with every human intervention.
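A minimal sketch of how the approval layer and learning layer fit together, assuming a toy `ControlPlane` with a hard-coded high-stakes list; none of these names reflect a real product API.

```python
# Toy sketch: action guards pause high-stakes actions for a human, and
# rejections become standing rules (the "decision memory" idea above).
# HIGH_STAKES, ControlPlane, and all behavior are illustrative assumptions.

HIGH_STAKES = {"send_payment", "delete_records", "send_email_blast"}

class ControlPlane:
    def __init__(self):
        self.queue = []       # unified approval queue across all agents
        self.blocked = set()  # rejections remembered as standing rules

    def request(self, agent: str, action: str, detail: str) -> str:
        if action in self.blocked:
            return "rejected"        # learned rule: no human needed
        if action in HIGH_STAKES:
            self.queue.append((agent, action, detail))
            return "pending"         # action guard: pause for approval
        return "approved"            # low-stakes actions proceed

    def decide(self, approve: bool) -> str:
        agent, action, detail = self.queue.pop(0)
        if not approve:
            self.blocked.add(action)  # decision memory: rejection becomes a rule
        return "approved" if approve else "rejected"

cp = ControlPlane()
cp.request("sdr-1", "draft_email", "intro to prospect")   # approved instantly
cp.request("sdr-1", "send_payment", "$500 refund")        # queued for a human
cp.decide(approve=False)                                  # human rejects once...
cp.request("sdr-2", "send_payment", "$200 refund")        # ...now auto-rejected
```

The point of the sketch: each human decision removes a future interruption, which is what lets the human-to-agent ratio improve over time.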

8 / 12

The "Control Plane" Category

Every complex system has a control plane. AI agents need one too.

๐Ÿ•
Datadog
$50B+ market cap

Control plane for infrastructure. See what's happening. Alert when things break. Intervene.

โ˜ธ๏ธ
Kubernetes
Industry standard

Control plane for containers. Orchestrate workloads. Handle failures. Scale automatically.

๐Ÿ”
Okta
$15B+ market cap

Control plane for identity. Who can access what. Audit trails. Compliance.

๐ŸŽ›๏ธ
???
AI Agent Control Plane

What agents are doing. Approvals & intervention. Learning & guardrails. This category doesn't exist yet.

"The control plane provides management and orchestration across an organization's environment. It's akin to air traffic control for applications."

— Vectra AI definition

The Opportunity

Infrastructure got Datadog. Containers got Kubernetes. Identity got Okta. AI agents need their control plane. We're building it.

9 / 12

Why Human-in-the-Loop Scales

The VC objection—and why it's wrong.

The Objection

"If humans are in the loop, doesn't that kill unit economics? Isn't the whole point to remove humans?"

The Response: Look at the Data

Scale AI
$13.8B valuation

Human labelers + AI. Humans as oversight.

Pilot
$1.2B valuation

Human bookkeepers + AI. Humans as QA.

Palantir
$50B+ market cap

Human analysts + AI. Humans as strategists.

The Key Distinction

"Humans as OVERSIGHT, not labor. AI does the work, humans QA. The ratio improves over time."

The Scaling Math

1๏ธโƒฃ

Year 1: 10:1 ratio

1 human oversees 10 agents. Heavy QA.

2๏ธโƒฃ

Year 2: 100:1 ratio

System learns. Fewer interventions needed.

3๏ธโƒฃ

Year 3+: 1000:1 ratio

Humans handle edge cases only. Still critical.
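The ratio math in concrete terms. The overseer cost below is an assumed, fully loaded annual figure for illustration only.

```python
# Concrete version of the ratio math above. ANNUAL_HUMAN_COST is an
# assumed figure, not a real number from our operations.

ANNUAL_HUMAN_COST = 120_000

for year, ratio in [(1, 10), (2, 100), (3, 1000)]:
    per_agent = ANNUAL_HUMAN_COST / ratio
    print(f"Year {year}: {ratio}:1 -> ${per_agent:,.0f} oversight cost per agent")
# Same human, 100x lower cost per agent by year 3 -- that's the scaling claim.
```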

The Avi Medical Case Study

81% automation rate. 93% cost savings. Humans handle complex cases. HITL doesn't kill unit economics—it enables them.

10 / 12

The Contrarian Bet

Everyone's zigging toward full autonomy. We're zagging toward control.

What Everyone Else is Building

๐Ÿค–

Fully autonomous agents

Demo well. Break in production.

๐Ÿค–

More agent capabilities

Better models. More tools. Same failure modes.

๐Ÿค–

"Just add more agents"

17x error amplification, per DeepMind.

๐Ÿค–

Removing humans entirely

The dream that keeps failing.

What We're Building

๐ŸŽ›๏ธ

The oversight layer

Makes ANY agent more reliable.

๐ŸŽ›๏ธ

Human-agent collaboration

Complementary strengths. Better outcomes.

๐ŸŽ›๏ธ

Coordination infrastructure

Turns bag-of-agents into functional team.

๐ŸŽ›๏ธ

Humans in the right places

Exception handling. Strategic oversight.

"I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start."

— Kenan Saleh, a16z Speedrun Partner

Our Position

We're not betting against agent capabilities improving. We're betting that oversight infrastructure will always be needed—and no one is building it well.

11 / 12

Why Us, Why Now

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Why Now

๐Ÿ“ˆ

Agent adoption is exploding

OpenAI Operator, Anthropic Claude Code, 1000+ agent startups

๐Ÿ’”

Failure rates are becoming visible

95% pilot failure is now common knowledge

๐Ÿ“„

Research is converging

Microsoft, Anthropic, DeepMind all pointing to HITL

๐Ÿ›๏ธ

Regulation is coming

EU AI Act mandates audit trails & oversight

What We've Built

✅

$4.5K MRR

Proving the thesis with real customers

✅

OpenClaw infrastructure

Dogfooding our own control plane daily

✅

Guardrails, ClawView, AgentGov

Components of the full control plane

12 / 12

The Ask

The Human-Agent Control Plane

Purpose-built infrastructure for human oversight of AI agents at scale. Plan review. Action guards. Approval queues. Learning loops. The missing layer that makes agents actually work.

What We Need

๐Ÿ’ฐ

$[X] Pre-Seed

Build the full control plane product

๐ŸŽฏ

12-month goal: $1M ARR

Prove control plane scales across customers

๐Ÿ“š

Then: Category definition

Be "Datadog for AI agents"

The Opportunity

๐Ÿ“ˆ

New category creation

No one owns "AI agent control plane" yet

๐Ÿ“ˆ

Research-backed thesis

Microsoft, Anthropic, DeepMind alignment

๐Ÿ“ˆ

Every agent deployment needs this

Horizontal opportunity across industries

🔧 Infrastructure We're Building

๐Ÿ›ก๏ธ Guardrails โ€ข ๐Ÿ“Š ClawView โ€ข ๐Ÿ›๏ธ AgentGov โ€ข ๐Ÿค– Employee OS

The Control Plane integrates all infrastructure layers into one human-facing interface.

🔬 Research Foundation

MIT: 95% of AI pilots fail · DeepMind: 17x error amplification in multi-agent · Microsoft Magentic-UI: 71% accuracy improvement with HITL · Anthropic: "New oversight infrastructure needed" · Berkeley: "Why Do Multi-Agent LLM Systems Fail?" · S&P Global: 42% of AI initiatives abandoned

1 / 12
VIBE CODING OUTCOMES

Vibe Code Your Business

"Vibe coding" revolutionized app developmentโ€”describe what you want, AI builds it. Now apply this to business outcomes. Describe the result, AI + humans deliver it.

Feb 2025
Karpathy coins "vibe coding"
X/Twitter
2026
"Vibe productivity" emerges
Beyond just coding
71%
Accuracy boost with HITL
Microsoft Magentic-UI
$4K
MRR proving the thesis
2 / 12

The Vibe Coding Revolution

What started as a meme became a paradigm shift. Now it's evolving beyond code.

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."

— Andrej Karpathy, Feb 2025 (coined the term)

Origins & Evolution

2023

"The hottest new programming language is English"

Karpathy's early prediction about LLM capabilities

2025

Vibe coding goes mainstream

Cursor, Replit, Claude Code—describe → build

2026

Beyond coding: "Vibe Productivity"

Research, writing, reporting, file operations, "glue work"

Where It's Going

"What changed in early 2026 is that vibe coding is no longer confined to software development; it is spreading into research, writing, reporting, spreadsheet wrangling, file operations, and 'glue work' that usually fragments attention."

— Ken Huang, "The Vibe Shift" (Jan 2026)

The Pattern

Vibe coding showed that natural language → complex software works. Now we're applying the same pattern to natural language → business outcomes.

3 / 12

From Apps to Outcomes

The next evolution: describe what you want to achieve, not what you want built.

💻
Vibe Coding
"Build me an app that..."
→
✨
Vibe Outcomes
"Get me 50 sales meetings"
→
🎯
Result
Meetings on your calendar

โŒ Current Reality: Use Tools

1

Subscribe to AI SDR tool

$5-10K/month

2

Configure the tool

Import lists, write sequences, set rules

3

Monitor the tool

Fix errors, adjust settings, babysit

4

Hope for outcomes

70% churn in 3 months when it doesn't work

✓ Vibe Outcomes: Describe Results

1

Describe what you want

"50 qualified sales meetings with Series A fintech founders"

2

AI agents execute

Research, outreach, qualification, scheduling

3

Humans QA

Review, approve, handle edge cases

4

Pay for outcomes

$X per meeting delivered

The Thesis

Vibe coding proved that intent → artifact works for software. Vibe outcomes proves it works for business results. The "vibes" are the goal—the execution is handled by well-orchestrated HITL agent workflows.

4 / 12

How It Works

Describe outcome → Agents execute → Humans QA → Outcome delivered

💬
Natural Language
"I need..."
→
📋
Workflow Generation
Map to playbook
→
🤖
Agent Execution
Multi-agent work
→
👤
Human QA
Review & approve
→
✅
Outcome
Delivered

Example: "50 Sales Meetings"

1

Input

"Book 50 qualified meetings with Series A fintech founders in Q1"

2

Research Agent

Identifies prospects, signals, contact info

3

Outreach Agent

Drafts personalized messages

4

Human Review

Approves messaging before send

5

Scheduling Agent

Books the meeting when prospect replies

Example: "Process These Invoices"

1

Input

"Process this month's invoices and flag anomalies"

2

Extraction Agent

Pulls data from PDFs, emails, systems

3

Matching Agent

Matches to POs, identifies discrepancies

4

Human Review

Approves exceptions, flags fraud

5

Output

Processed invoices, exception report
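The invoice example can be sketched end to end. Agent names mirror the steps above; the PO table and amounts are invented for the example.

```python
# Toy end-to-end version of the invoice flow: agents do extraction and
# matching, humans only see the exceptions. All data here is made up.

def extract(raw):                      # Extraction Agent: PDFs/emails -> records
    return [{"id": i, "amount": amt} for i, amt in raw]

def match(records, purchase_orders):   # Matching Agent: compare against POs
    for r in records:
        r["anomaly"] = purchase_orders.get(r["id"]) != r["amount"]
    return records

def exceptions_for_review(records):    # Human QA sees only the flagged items
    return [r for r in records if r["anomaly"]]

purchase_orders = {1: 100, 2: 250}
records = match(extract([(1, 100), (2, 300)]), purchase_orders)
flagged = exceptions_for_review(records)   # invoice 2: $300 billed vs $250 PO
```

This is the HITL division of labor in miniature: the agents touch every record, the human only touches the exception report.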

5 / 12

Why Vibe Outcomes Need Human-in-the-Loop

Pure AI can't deliver reliable business outcomes. The research is clear.

95%
AI pilots fail to deliver ROI
MIT NANDA Study
30%
Lower success when agents collaborate
CooperBench, 2026
38.9%
Cite accuracy as #1 AI challenge
Industry analysts, 2025
71%
Accuracy improvement with HITL
Microsoft Magentic-UI

Why Pure AI Fails

"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."

— Berkeley/DeepMind, 2025

⚠️

Hallucinations occur even with high confidence

AI can be confidently wrong about business-critical decisions

⚠️

Edge cases are infinite

Business has nuance AI can't anticipate

⚠️

Stakes are high

Brand damage, legal liability, lost deals

Why HITL Fixes It

"Hybrid AI workflows, which combine automation with human oversight, are not a fallback; they're the modern standard for reliability, trust, and scalability in 2026."

โ€” Parseur, Dec 2025

โœ“

Human as QA layer, not labor

AI does 90% of work, humans verify critical decisions

โœ“

Trust calibration over time

System learns when to ask, when to proceed

โœ“

Only 10% of tasks need human help

Microsoft found lightweight intervention = massive improvement

6 / 12

The Interaction Layer

This is the UX for the AI-native agency, control plane, and marketplace pitches.

Why Current Interfaces Fail

โŒ Chat Interfaces

Conversational, not outcome-oriented. Can't manage complex multi-step workflows. No approval queues.

โŒ Dashboards

Read-only visibility. No intervention. See problems after they happen. Can't modify plans mid-execution.

โŒ Slack/Email Alerts

Ad hoc. No context. Alert fatigue. Can't see what agent plans to do next.

The Vibe Outcomes Interface

๐Ÿ’ฌ

Natural language input

"I need X" โ†’ system figures out how

๐Ÿ“‹

Progress visibility

See what's happening toward your goal

๐ŸŽ›๏ธ

Approval queues

Review decisions that matter

โธ๏ธ

Interrupt & adjust

Course-correct mid-execution

๐Ÿ“Š

Outcome tracking

Clear metrics: delivered vs requested

๐Ÿ”— This Powers Our Other Pitches

โšก Fat Startup: Vibe outcomes is how customers interact with us
๐Ÿš— Uber for AI Work: Natural language bounty posting
๐ŸŽ›๏ธ Control Plane: The human oversight layer
โ˜๏ธ AWS of AI Work: Workflow templates activated by intent

7 / 12

Market Opportunity

The shift from "tools" to "outcomes" is creating massive new markets.

$52.6B
AI Agents Market by 2030
MarketsandMarkets
30%+
Enterprise SaaS with outcome-based pricing
Gartner 2025 Projection
61%
CFOs changing how they evaluate AI ROI
Industry Survey, 2025
$1.5T
Global professional services (TAM)
Work that can be "vibe coded"

Who Wants This

๐Ÿข

SMBs frustrated with AI tools

70% AI SDR churn = customers seeking alternatives

๐Ÿข

Enterprises with AI fatigue

95% pilot failure = demand for what works

๐Ÿข

Founders too busy to manage AI

Want outcomes, not another tool to learn

The Pricing Shift

"Per-seat is no longer the atomic unit of software. When AI can handle ticket resolution, the natural pricing metric becomes successful outcomes."

โ€” a16z Enterprise Newsletter, Dec 2024

๐Ÿ’ฐ

Outcome-aligned pricing

$X per meeting, $Y per processed invoice, $Z per video

8 / 12

Competitive Landscape

Who else is thinking about natural language โ†’ outcomes?

Tools (Not Outcomes)

AI SDRs (11x, Artisan, AiSDR)

Sell tools. Charge per seat. You manage agents. 70% churn.

โŒ Not outcome-based

Agent Platforms (LangChain, CrewAI)

Infrastructure for developers. Build your own workflows.

โŒ Not outcomes, just primitives

Automation (Zapier, Make)

Workflow automation. You design the flows.

โŒ Not AI-native, not outcome-based

Closest Parallels

Scale AI ($13.8B)

Services + HITL โ†’ platform. "We need labeled data" โ†’ delivered.

โœ“ Outcome-based, HITL model

Pilot ($1.2B)

"Do my bookkeeping" โ†’ done. Humans + AI.

โœ“ Outcome-based, HITL model

Intercom Fin ($0.99/resolution)

AI support priced per successful outcome.

โœ“ Outcome-based pricing model

Our Differentiation

Horizontal, not vertical. Scale AI = data labeling. Pilot = bookkeeping. We're building the general-purpose vibe outcomes platformโ€”natural language to any deliverable business result.

9 / 12

Current Traction

Proving the thesis with real customers and real outcomes.

$4K
MRR
5
Customers
0%
Churn
3
Outcome Types

Outcomes We've Delivered

๐Ÿ“…

"Get me sales meetings"

SDR/BDR for construction, startups (50% of revenue)

๐ŸŽฌ

"Generate training videos"

ML training data pipelines (30% of revenue)

๐Ÿ“š

"Research these topics"

University lab literature synthesis (20% of revenue)

Why Zero Churn

"When you only pay for outcomes, there's no reason to churn. We deliver meetings, they pay. We don't deliver, they don't pay. Aligned incentives = sticky customers."

vs. AI Tool Churn

AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 70% churn.

10 / 12

Why Now: 2026 Is the Year

Technology, market, and cultural convergence make this the moment.

Technology Ready

๐Ÿง 

Models finally capable enough

GPT-5, Claude 4 can execute real business workflows

๐Ÿ”ง

Agent infrastructure exists

OpenClaw, MCP, tool-use protocols

๐Ÿ’ณ

x402 machine payments

Agents can transact autonomously (a16z Big Ideas 2026)

๐Ÿ“Š

HITL research converging

Microsoft, Anthropic, DeepMind all pointing same direction

Market Ready

๐Ÿ’”

AI tool fatigue

70% AI SDR churn. 95% pilot failure. Customers want what works.

๐Ÿ’ฐ

Budget exists

Companies spending billions on AI, getting nothing

๐Ÿ“ˆ

Pricing shift happening

30%+ enterprise SaaS moving to outcome-based

๐ŸŽฏ

"Vibe coding" cultural moment

Natural language โ†’ results is now understood

"2025 was widely labeled 'the year of AI agents.' In reality, it was the year we learned what agents can and cannot do. 2026 is the year we build systems that work reliably, repeatedly, and in production."

โ€” Human-in-the-Loop Newsletter, Dec 2025

11 / 12

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) ยท a16z funded ยท Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because ยท $3M Seed ยท Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com ยท Agentic architecture ยท Shipped production agent systems

What We've Built

๐Ÿ•

Dog-fooding daily

Running OpenClaw infrastructure ourselves

๐Ÿ›ก๏ธ

Agent Seatbelt

Browser-layer guardrails

๐Ÿ“Š

ClawView

Agent observability

๐Ÿ“š

Workflow templates

Playbooks that compound

Why Us

โœ“

We've shipped outcomes

$4K MRR from real deliverables

โœ“

We understand HITL

Built the infrastructure, not just the agents

โœ“

We know the failure modes

Encoded in playbooks from real experience

12 / 12

The Ask

Vibe Code Your Business

Describe the outcome you want. AI agents + human QA deliver it. Pay only for results. The interaction layer for the AI economy.

What We Need

๐Ÿ’ฐ

$[X] Pre-Seed

Scale agent capacity + build the interface

๐ŸŽฏ

12-month goal: $1M ARR

Prove vibe outcomes across multiple verticals

๐Ÿ“ฆ

Then: Self-serve platform

Anyone can describe outcomes and get them

The Opportunity

๐Ÿ“ˆ

New category creation

"Vibe outcomes" platform doesn't exist yet

๐Ÿ“ˆ

Cultural moment

Vibe coding is mainstreamโ€”extend it to business

๐Ÿ“ˆ

$52.6B market by 2030

AI agents + outcome-based pricing converging

๐Ÿ“š Research Foundation

Karpathy: Coined "vibe coding" Feb 2025 ยท RAND: 80% AI project failure ยท Microsoft Magentic-UI: 71% accuracy improvement with HITL ยท CooperBench: 30% lower success in multi-agent without coordination ยท a16z: Outcome-based pricing shift ยท Gartner: 30%+ enterprise SaaS with outcome pricing by 2025 ยท Bessemer: AI Pricing Playbook (Feb 2026)

๐Ÿ”— Related Pitches

โšก Fat Startup โ€ข ๐Ÿ’ฐ Outcome-Based โ€ข ๐Ÿš— Uber for AI Work โ€ข ๐ŸŽ›๏ธ Control Plane

Vibe Coding Outcomes is the UX/interaction layer that powers all of these.

Research โ€ข NYC Target Companies
01

25 NYC Startups: R&D Opportunities

Series A-B companies ($13M-$160M raised) with specific research they could implement but haven't.

25

NYC Tech Startups

$850M+

Combined Funding

75+

Research Opportunities

02

๐Ÿฆ Fintech / Finance AI

Rogo โ€” $75M (Series B)

Building "Wall Street's first AI analyst" โ€” LLMs for financial reasoning

R&D Opportunities:

Hook: "Your financial reasoning models could be 40% more accurate on tabular data with Chain-of-Table"

Farsight โ€” $16M (Series A)

AI for finance โ€” valuation models, deal analysis, Excel/PPT generation

R&D Opportunities:

  • SpreadsheetLLM โ€” Microsoft's approach to better spreadsheet understanding
  • DocPrompting โ€” Generate accurate documents with citations
  • Table-GPT โ€” Unified table understanding and generation

Hook: "SpreadsheetLLM could cut your Excel generation errors by 30%"

Aiera โ€” $25M (Series B)

GenAI for financial professionals โ€” broker research, earnings calls, filings

R&D Opportunities:

  • LongLoRA โ€” Process 10x longer earnings calls without quality loss
  • RAG-Fusion โ€” Multiple query generation for better retrieval
  • Time-LLM โ€” Repurpose LLMs for time series forecasting

Hook: "LongLoRA could let you process 10x longer earnings calls without quality loss"

Carbon Arc โ€” $56M (Series A)

Marketplace for curated AI-ready datasets (Insights Exchange)

R&D Opportunities:

Hook: "DataComp benchmarking could become your quality certification"

03

๐Ÿฅ HealthTech / BioTech

Ataraxis โ€” $20M (Series A)

AI for cancer precision medicine โ€” analyzes data to identify optimal treatments

R&D Opportunities:

  • CancerGPT โ€” Few-shot learning for drug pair synergy prediction
  • DrugCLIP โ€” Contrastive learning for drug-target interaction
  • Med-PaLM 2 โ€” Google's medical LLM achieving expert-level performance

Hook: "CancerGPT's few-shot approach could expand your drug combination predictions 5x faster"

Inspiren โ€” $35M (Series A)

AI + IoT for senior care โ€” AUGi device for fall detection and patient monitoring

R&D Opportunities:

Hook: "RT-DETR could cut your fall detection latency by 40% while running entirely on-device"

Slingshot AI โ€” $40M (Series A)

AI for mental health โ€” "Ash" chatbot simulates therapist-like conversations

R&D Opportunities:

Hook: "Constitutional AI could reduce harmful responses by 80% while maintaining therapeutic value"

Camber โ€” $30M (Series B)

Healthcare payment automation โ€” streamlines insurance reimbursement

R&D Opportunities:

Hook: "Medical coding LLMs could auto-fill 60% of your claims forms"

04

๐Ÿ› ๏ธ Dev Tools / Infrastructure

Warp โ€” $18M (Series A)

AI-powered payroll platform for multi-state compliance

R&D Opportunities:

NetBox Labs โ€” $35M (Series B)

Open-source network automation platform

R&D Opportunities:

Topline Pro โ€” $27M (Series B)

AI marketing for home service businesses

R&D Opportunities:

05

๐Ÿ’ผ Sales / Marketing AI

Clay โ€” $40M (Series B, $1.25B valuation)

AI for sales personalization โ€” integrates 100+ data sources

R&D Opportunities:

Hook: "Buyer intent prediction could 3x your users' reply rates"

Profound โ€” $35M (Series B) โญ Existing Client

AI search optimization โ€” helps brands appear in AI-generated responses

R&D Opportunities:

ShopMy โ€” $77.5M (Series B)

Influencer commerce platform

R&D Opportunities:

06

โš–๏ธ Compliance / Legal AI

Norm AI โ€” $48M (Series B)

AI for regulatory compliance โ€” automates review of legal documents

R&D Opportunities:

Hebbia โ€” $130M (Series B, $700M valuation)

Document AI โ€” searches large document sets with citations

R&D Opportunities:

Hook: "Self-RAG could improve your citation accuracy by 25%"

07

๐Ÿ”’ Cybersecurity / ๐ŸŒฑ Climate / ๐Ÿ›’ Consumer

Zip Security โ€” $13.5M

SMB cybersecurity

  • LLM threat intelligence
  • Automated SOC analyst
  • LLM phishing detection (+40% accuracy)

Chestnut Carbon โ€” $160M

Reforestation + carbon credits

  • Satellite carbon estimation
  • Biodiversity monitoring (audio/visual)
  • ML credit verification

GDI โ€” $20M+

Silicon anodes for EV batteries

  • Battery degradation prediction
  • Materials discovery with ML
  • CV defect detection (-40% QC cost)

Novig โ€” $18M

P2P sports betting

  • LLM odds modeling
  • Market making algorithms
  • Fraud detection

David โ€” $75M

High-protein nutrition bars

  • AI food formulation
  • Demand forecasting
  • Consumer preference modeling

Cents โ€” $40M

Laundry/dry-cleaning SaaS

  • Demand forecasting
  • Route optimization
  • Image garment classification
08

๐ŸŽฏ Best Targets by Category

๐Ÿ”ฅ Highest Urgency (AI-Native)

  • Rogo โ€” Financial reasoning is hard, need every edge
  • Hebbia โ€” Document AI is competitive, Self-RAG matters
  • Aiera โ€” Long context + time series = big opportunities
  • Slingshot AI โ€” Safety is existential for mental health AI

๐Ÿ’ฐ Big Companies With Resources

  • Clay ($1.25B val) โ€” Can afford to experiment
  • Hebbia ($700M val) โ€” Research-forward culture
  • Chestnut Carbon ($160M) โ€” ML for verification is huge

๐ŸŽฏ Underserved Markets

  • Inspiren โ€” Elder care + CV is niche
  • Cents โ€” Laundry tech has zero AI competition
  • Topline Pro โ€” Home services AI is wide open

โญ Existing Relationship

  • Profound โ€” Already a client, easy expansion

Outreach Template

Subject: Quick R&D idea for [Company] โ€” [specific technique]

Hi [Name],

Congrats on [recent news/funding]. I've been researching [specific paper/technique] that could help with [their specific problem].

Quick version: [1-sentence benefit with number]

I put together a 2-page brief showing how this could work for [Company]. Want me to send it over?

Research โ€ข Positioning Analysis
01

R&D โ‰  The Pain Point

The real market pain is downstream from R&D โ€” it's about shipping AI to production.

80%

AI projects fail to reach production (RAND)

95%

GenAI pilots failing (MIT/Fortune 2025)

The gap isn't finding the right model. It's shipping AI to production.

02

The Skills Gap (Reddit Gold)

From r/MLQuestions โ€” 688 upvotes, Nov 2025

What Candidates Know

  • Transformer architectures, attention mechanisms
  • Papers they've implemented (diffusion, GANs, LLMs)
  • Kaggle competitions, theoretical deep learning

What Companies Need

  • Deploy a model behind an API that doesn't fall over
  • Write data pipelines that process reliably
  • Debug why the model is slow/expensive in production
  • Build evals to know if the model is working

"I'll interview someone who can explain LoRA fine-tuning in detail but has never deployed anything beyond a Jupyter notebook."

โ€” Startup co-founder hiring ML engineers

03

The Observability Gap (Your Opportunity)

From Cleanlab's survey of 95 teams with AI in production

<1/3

Teams satisfied with observability

63%

Plan to improve observability next year

70%

Rebuild AI stack every 3 months

Key Insight

Even among the 5% of companies that reach production, most remain early in maturity. They can't reliably know when their agents are right, wrong, or uncertain.

04

Reframing The Pitch

โŒ OLD: "AI R&D Engineer" โœ… NEW: "Production AI Engineer"
Vibes Research, experimentation Deployment, reliability
Perception Nice-to-have Need-to-have
Target Teams with resources Teams with stuck projects
Job-to-be-done "Find the best model" "Ship to production this month"

The Positioning Gap

Aemon = the optimization engine

You = the shipping engine

05

Target Customers (Not Research Teams)

๐Ÿš€ Series A-C Startups with AI Features

  • Have small ML teams, can't hire fast enough
  • ML engineers cost $200-400k and are hard to find
  • Need someone who can actually deploy, not just research

Pain: "We have 3 AI features in Jira blocked for months"

๐Ÿข Product Companies Adding AI

  • Non-ML companies adding AI features
  • Don't have ML expertise internally

Pain: "We want AI in our product but don't know where to start"

โš™๏ธ Enterprise AI Platform Teams

  • Drowning in stack churn (rebuilding every 3 months)
  • Coordination overhead killing velocity

Pain: "Platform team of 5 supporting 20 feature teams โ€” we're bottlenecked"

๐Ÿ›๏ธ Regulated Industries

  • 42% plan to add oversight features (vs 16% unregulated)
  • Need governance + observability

Pain: "Can't deploy AI without compliance sign-off"

06

Better Pitch Angles

1. "Your AI Projects Are Stuck. We Ship Them."

  • Target: Companies with AI projects "in progress" for months
  • Proof: Show deployment timelines (weeks vs months)
  • Wedge: Audit โ†’ identify stuck projects โ†’ ship one fast

2. "AI Observability + Ops as a Service"

  • Target: Companies with AI in production but no visibility
  • Pain: "We don't know when our AI is wrong"
  • Proof: Catch regressions, reduce incidents

3. "The AI Platform Team You Can't Hire"

  • Target: Scaling startups without MLOps expertise
  • Pain: ML engineers cost $400k and don't want to do ops
  • Proof: Infrastructure setup in days, not months

4. "CI/CD for AI" (existing pitch)

  • Still good, but position as production not research
  • Focus on deployment gates, not model selection
  • "Every AI PR tested against your evals before merge"

Action Items

  • Rewrite pitches with "production" and "ship" language
  • Target stuck projects โ€” companies with AI features in backlog
  • Lead with observability โ€” 63% want better visibility
  • Offer quick wins โ€” "Ship one AI feature in 2 weeks"
  • Avoid research teams โ€” they don't have budget urgency
Research โ€ข AgentDocs Wedges
01

AgentDocs Wedges
& Approaches

Based on Garry Tan's YC video insight: agents pick tools based on doc quality, not actual performance. The Whisper/Groq problem.

Claude Code defaulted to Whisper V1 โ€” a near-deprecated model โ€” because it has better documentation than Groq, even though Groq is 200x faster and 10x cheaper.

โ€” Garry Tan, YC Partner, Feb 2026

The Insight

Agents pick tools based on doc quality, not actual performance โ€” and that's exactly the gap AgentDocs exploits.

02

Wedge Scoring (6 Dimensions)

Wedge                           Mkt  Pain  Comp  Fit  x402  Time  Total
🥇 LLM / Model Routing           5    5    3     5    4     5     27/30
🥈 Video Gen                     5    5    3     5    5     4     27/30
🥉 Audio / Transcription         3    5    2     4    4     5     23/30
Deployment / Hosting             5    4    4     4    3     4     24/30
Agent Identity (email/phone)     4    5    4     3    3     5     24/30
Databases                        5    4    3     3    2     4     21/30
Image Gen                        4    3    5     3    4     3     22/30
03

x402 Market Reality Check

What agents are ACTUALLY spending on today (x402scan.com, Feb 2026)

$101K
24h Volume
513
Active Merchants
692
Crypto/Onchain Servers
0
Transcription Services

What Exists (Validated)

โœ“

Crypto/Onchain

692 servers โ€” dominant vertical

โœ“

AI Servers

486 servers โ€” led by Virtuals ACP ($163K/day)

โœ“

Search/Data APIs

216 servers โ€” StableEnrich, httpay

โœ“

Trading Intelligence

203 servers โ€” alpha signals

What's Missing (Opportunity)

0

Transcription

Zero servers โ€” Garry Tan example!

~1

Video/Image Gen

42 txns โ€” essentially nothing

0

Deployment/Hosting

Nothing

0

Databases

Nothing

04

Re-scored: x402 Demand vs Fit

Wedge x402 Now Holly Fit Verdict
Multi-API aggregation + capability layer โœ“ 3 players, no AgentDocs โœ“ Direct fit Best immediate wedge
Agent-to-agent coordination โœ“ $163K/day (Virtuals) โœ“ Holly as orchestrator Most validated demand
Social data for agents โœ“ StableSocial live โœ“ Fits Wurk agents Niche but real
Transcription (Whisper/Groq) โŒ Zero on x402 โœ“ Strong routing layer 6โ€“12 months early
Video gen โŒ Near zero โœ“ Strong dogfood 12โ€“18 months early

Key Insight

Absence of transcription/video/deployment on x402scan is opportunity signal, not rejection. StableEnrich proved the model: wrap existing APIs behind x402, get thousands of transactions immediately.

05

Recommended Launch Order

๐ŸŽ™๏ธ 1. Transcription โ€” NOW

Zero servers on x402. Garry Tan moment 6 days ago. First-mover window open.

AgentDocs value: Verified schema {input, model: "groq|deepgram|whisper", output}

Groq at $0.02/min โ†’ charge $0.03/min

"Your agent would have chosen Whisper V1. Ours chose Groq."

๐ŸŽฌ 2. Video Gen โ€” Dogfood Now

Parameter chaos problem (Kling uses cfg_guidance, Runway uses guidance_scale). Genuinely unsolved.

AgentDocs value: Agent sends {prompt, style, duration, budget}, Holly resolves params

Already dogfood โ€” Holly generates video

๐Ÿง  3. LLM Routing โ€” Big Vision

Agent says {task: "transcribe", latency: "fast"} โ†’ gets best provider with pricing + ready API call.

The purest AgentDocs wedge

๐Ÿ“ง 4. Agent Identity โ€” HOT

Garry Tan: "Has anybody built Twilio for agents yet?"

Email + phone + wallet in one API call

Jared Friedman (YC): "Even the best dev tools don't let you sign up via API. This is a big miss in the claude code age โ€” claude can't sign up on its own."

06

The Gap: Verified Snippet Services

3
Multi-API aggregators live
StableEnrich, httpay, LowPaymentFee
4.7K
Txns for StableEnrich
0
With capability contracts
0
With AgentDocs semantics

What They Do

Aggregate APIs (Apollo + Firecrawl + Grok + Serper) behind one x402 endpoint.

"Throw money at endpoint, get data back"

What They Don't Do

โŒ

Structured capability contracts

โŒ

Machine-readable reasoning

โŒ

Verification + benchmarks

The Unified Pitch

OpenHolly becomes the first x402-native capability registry for non-crypto agent needs โ€” the "Stack Overflow for agents" that makes every new category agent-accessible from day one.

07

How Agents Discover Agent-First Platforms

For agents, "discovery" = machine-interpretable services, not human landing pages.

1. Protocol-level Discovery (x402 v2)

Services expose structured metadata (endpoints, pricing, chains). Facilitators crawl and index.

2. Facilitator/Registry Indexing

Layer of facilitators that index x402 services, maintain up-to-date pricing/metadata.

3. Agent-Centric Wallets

Coinbase agentic wallets pre-integrated with x402. Discovery APIs built-in.

4. Semantic Capability Registries

"Internet of Agents" research: agents announce capabilities in machine-interpretable form.

The agent doesn't "Google" a platform; it queries its facilitator ("find a market-data API with latency <100ms and price <0.5ยข/request"), receives candidates with structured metadata, picks one, then talks HTTP+402 with it.

โ€” Perplexity Research, Feb 2026

01
WORLD MODELS

Synthetic Data for
Controllable Video Models

Video models can generate stunning visuals but can't follow precise instructions. The bottleneck isn't compute or architecture โ€” it's training data with exact state trajectories.

Data scaling plateaus at 200K-400K samples. The persistent ~15% gap between in-domain and out-of-domain performance isn't solvable with more data โ€” it requires architectural changes AND structured training data.

โ€” VBVR Paper (Wang et al.), Feb 2026

$5B+
Raised in world model space (2025-26)
20M hrs
NVIDIA Cosmos training data
50%
Model accuracy on physics (chance level)
97%
Human accuracy on same tasks
02

The Core Insight: Controllability Before Reasoning

Why Video Models Fail at Reasoning

1

Trained on natural video

Learned "everything moves together" โ€” can't isolate changes

2

No state tracking

Can't represent "object A moved, B stayed" explicitly

3

Errors compound

Step 1 error โ†’ Step 2 error โ†’ reasoning chain breaks

What Data Factory Provides

โœ“

Exact state trajectories

Frame-by-frame ground truth of what changed

โœ“

Parameterized variations

Same action, different contexts โ€” curriculum learning

โœ“

Physics-accurate simulation

Genesis/Isaac Sim backends for real dynamics

The Robot Arm Analogy

You can't teach a robot to cook if it knocks over the salt every time it reaches for the pepper. Same with video models: if they can't execute precise state transitions, chaining multi-step reasoning becomes impossible. Controllability is the prerequisite.

03

The "Data Factory" Architecture

๐Ÿ“‹
Domain Spec
"warehouse picking"
โ†’
โš™๏ธ
Scene Generator
Parameterized templates
โ†’
๐ŸŽฎ
Physics Sim
Genesis / Isaac Sim
โ†’
๐ŸŽฌ
Video + Labels
State trajectories
โ†’
๐Ÿ“ฆ
Training Dataset
LoRA-ready format

๐Ÿญ Vertical Templates

Pre-built scene generators for:

  • Warehouse robotics
  • Surgical verification
  • Manufacturing QA
  • Autonomous driving

โšก Scale Economics

Genesis claims:

430,000x

faster than real-time simulation

๐ŸŽฏ LoRA-Ready Output

Direct fine-tuning:

  • Wan2.1 / Wan2.2 compatible
  • Rank 32 = startup compute
  • ~$5K for domain model
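The "parameterized generator" idea in the pipeline above can be sketched as one template producing many seeded variations, each emitting the scene config plus the exact state trajectory that becomes the training label. All parameter names here are illustrative; a real backend would be Genesis or Isaac Sim, not this toy step function.

```python
import random

def generate_scene(template: str, seed: int, n_steps: int = 5) -> dict:
    """One labeled synthetic clip: deterministic per (template, seed)."""
    rng = random.Random(seed)                        # reproducible variation
    obj = {"x": rng.uniform(0, 1), "y": rng.uniform(0, 1)}
    trajectory = []
    for t in range(n_steps):
        obj = {"x": obj["x"] + 0.1, "y": obj["y"]}   # stand-in for a physics-sim step
        trajectory.append({"t": t, "moved": "obj_a", "state": dict(obj)})
    return {"template": template, "seed": seed, "trajectory": trajectory}

# Same template, different contexts -> a curriculum of exactly-labeled clips
dataset = [generate_scene("warehouse_picking", seed=s) for s in range(100)]
```

The property that matters is determinism: rerunning any seed reproduces the exact trajectory, so every frame ships with ground truth of what changed and what stayed fixed.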
04

Why This Wins

๐Ÿ“Š The Moat

The "data factory" โ€” parameterized generators + distributed workers โ€” is the real competitive advantage. No productized version exists for vertical industries.

Network effect: Each vertical adds templates โ†’ attracts more customers โ†’ funds more verticals

๐Ÿ’ฐ Business Model

  • Per-video pricing: $0.01-0.10 per synthetic clip
  • Dataset packages: $5K-50K per domain
  • Enterprise: Dedicated capacity + custom templates

Gross margins >80% (compute is cheap vs. real data collection)

Timing

Genesis open-sourced Dec 2024. NVIDIA Cosmos launched Jan 2025. ฯ€0 open-sourced Feb 2025. The infrastructure just became available โ€” but nobody has built the vertical data factory layer yet.

05

Competitive Landscape

Company Focus Gap
NVIDIA Cosmos Foundation models Not vertical-specific data
Genesis AI Physics engine No data pipeline layer
Physical Intelligence Robot foundation model Consumes data, doesn't sell it
Scale AI Data labeling Labels real data, doesn't generate
Data Factory (Us) Synthetic video data Full vertical pipeline โœ“

The dirty secret of robotics AI is that real-world data collection costs $100-1000/hour when you include robot time, human supervision, and failure recovery. Synthetic data at $0.01/clip changes the economics completely.

โ€” Industry estimate

01
WORLD MODELS

Benchmarking Infrastructure
for World Models

VLM-as-a-judge is expensive and non-reproducible. IntPhys shows models at chance level. Everyone's flying blind on what their world models actually understand.

Most models perform at chance levels (50%), in stark contrast to human performance, which achieves near-perfect accuracy (97%+). Current video understanding benchmarks do not capture intuitive physics.

โ€” IntPhys 2 (Meta FAIR), Jun 2025

50%
Best model accuracy (chance level)
97%
Human accuracy (same test)
47 pts
Gap to close
$5B+
Raised without rigorous eval
02

The Evaluation Crisis

Current State: Flying Blind

1

VLM-as-a-judge

Expensive ($0.10-1.00/sample), non-reproducible, biased

2

Demo-driven claims

Cherry-picked videos, no systematic testing

3

Benchmarks don't transfer

Academic benchmarks โ‰  production reliability

What's Needed

โœ“

Deterministic scoring

Rule-based, reproducible, cheap to run

โœ“

Human-correlated metrics

VBVR-Bench achieves ฯ > 0.9 with human judgment

โœ“

Domain-specific suites

Robotics, driving, medical โ€” each needs own benchmarks

The VBVR Breakthrough

VBVR-Bench demonstrates that rule-based evaluation can match human judgment (ฯ > 0.9 correlation). But it's research code, not a product. Domain-specific versions don't exist.
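A rule-based scorer in that spirit can be surprisingly small: compare a model's predicted state trajectory against simulator ground truth with explicit checks for object permanence and per-object state accuracy. This is a hedged toy sketch of the approach, not VBVR-Bench's actual metric.

```python
def score(pred: list[dict], truth: list[dict], tol: float = 0.05) -> dict:
    """Deterministic trajectory scoring. Each frame maps object name -> position."""
    # Permanence: the predicted frame contains exactly the ground-truth objects
    permanence = sum(set(p) == set(t) for p, t in zip(pred, truth))
    # State accuracy: every object's position within tolerance (frames with
    # matching object sets only)
    accuracy = sum(
        all(abs(p[k] - t[k]) <= tol for k in t)
        for p, t in zip(pred, truth) if set(p) == set(t)
    )
    n = len(truth)
    return {"permanence": permanence / n, "state_accuracy": accuracy / n}

truth = [{"obj_a": 0.1,  "obj_b": 0.5}, {"obj_a": 0.2, "obj_b": 0.5}]
good  = [{"obj_a": 0.11, "obj_b": 0.5}, {"obj_a": 0.2, "obj_b": 0.5}]
bad   = [{"obj_a": 0.1},                {"obj_a": 0.9, "obj_b": 0.5}]  # obj_b vanishes
```

Unlike a VLM judge, this costs nothing per sample and returns the same score every run; the hard work is designing rule suites per domain, which is the product.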

03

Product: Eval-as-a-Service

๐ŸŽฌ
Upload Video
Model output
โ†’
๐Ÿ”ฌ
Benchmark Suite
Physics / Control / Reasoning
โ†’
๐Ÿ“Š
Detailed Report
Scores + failure modes
โ†’
๐Ÿ“ˆ
Leaderboard
Public or private

๐Ÿงช Benchmark Suites

  • Physics: Object permanence, gravity, collisions
  • Control: Instruction following, state isolation
  • Reasoning: Multi-step causal chains
  • Domain: Robotics, driving, medical

๐Ÿ’ฐ Pricing

  • API: $0.01/video (basic)
  • Full suite: $0.10/video
  • Enterprise: Unlimited + custom
  • Leaderboard: Free tier for visibility

๐ŸŽฏ Target Customers

  • Runway, World Labs, DeepMind
  • Physical Intelligence, Wayve
  • Robot startups building on ฯ€0
  • Enterprise adopting video AI
04

Why This Works

๐Ÿ“Š Market Dynamics

$5B+ has been raised in world models with no standardized evaluation. Every company is building their own benchmarks internally. That's waste.

Comparable: ML evaluation market ~$500M (2024), growing 25%+ YoY

๐Ÿ”„ Network Effects

  • Leaderboard: Models compete โ†’ drives adoption
  • Benchmark contributions: Companies add domain tests
  • Data flywheel: More evals โ†’ better calibration

The gap between demo videos and production reliability is massive. Objects disappear, physics drifts, game logic is brittle over longer sessions. We need systematic evaluation, not cherry-picked demos.

โ€” GradientFlow Analysis, 2026

05

Roadmap

Q2

Launch Core Benchmarks

IntPhys-style physics, VBVR-style controllability, basic API

Q3

Domain Expansion

Robotics suite (ฯ€0 compatible), driving suite (Wayve/Comma style)

Q4

Public Leaderboard

Like HuggingFace but for world models. Attract submissions, build community

2027

Enterprise + Certification

"World Model certified for X domain" โ€” becomes industry standard

01
WORLD MODELS

Training Platform for
Robot Foundation Models

Training video/world models costs 10-100x more than LLMs. The infrastructure layer is missing. We build the "AWS for embodied AI."

Embodied AI training requires tight integration of simulation, rendering, and ML. Current cloud offerings are designed for LLMs. The infrastructure gap is massive.

โ€” Industry observation

$1B
World Labs raise (Feb 2026)
$600M
Physical Intelligence raise
$315M
Runway raise (Feb 2026)
10-100x
Video vs LLM compute cost
02

The Infrastructure Gap

What LLM Infra Provides

โœ“

GPU clusters

H100s, A100s, optimized networking

โœ“

Training frameworks

PyTorch, JAX, distributed training

โœ“

Data pipelines

Text ingestion, tokenization, streaming

What Embodied AI Needs (Missing)

โœ—

Integrated simulation

Physics engine + renderer + ML in one loop

โœ—

Video data pipelines

Frame extraction, state annotation, streaming

โœ—

Sim-to-real transfer

Domain randomization, reality gap tools

Why This Matters Now

ฯ€0 just open-sourced. Genesis just launched. Cosmos is available. The building blocks exist but nobody has assembled them into a platform. Every robotics startup is duct-taping their own stack.

03

Platform Architecture

๐ŸŽฎ Simulation Layer

Managed Genesis/Isaac Sim instances

  • One-click deployment
  • Auto-scaling workers
  • Pre-built environments
  • 430,000x faster than real-time

๐Ÿ“Š Data Layer

Video + state trajectory storage

  • Frame-level annotations
  • Streaming to training
  • Version control for datasets
  • Curriculum management

๐Ÿง  Training Layer

Optimized for video models

  • Pre-configured for ฯ€0, Cosmos
  • LoRA fine-tuning pipelines
  • Distributed video training
  • Eval integration built-in
๐ŸŽฏ
Define Task
โ†’
๐ŸŽฎ
Simulate
โ†’
๐Ÿ“ฆ
Generate Data
โ†’
๐Ÿง 
Train Model
โ†’
๐Ÿ“
Evaluate
โ†’
๐Ÿค–
Deploy
04

Market Opportunity

๐Ÿ“ˆ TAM Analysis

World model companies (funded): $5B+ raised
Robot startups (π0 ecosystem): 100+ companies
AV companies (simulation needs): 50+ companies
Enterprise robotics adoption: growing 30%+ YoY

Conservative estimate: $2B addressable market for embodied AI infrastructure by 2028

๐Ÿ’ฐ Business Model

  • Compute: GPU-hours (sim + training)
  • Storage: Video dataset hosting
  • Platform: Monthly SaaS for orchestration
  • Enterprise: Dedicated clusters + support

Target: 40-60% gross margins (better than pure GPU cloud)

05

Competitive Position

Player Sim Data Train Eval Deploy
CoreWeave/Lambda โœ— โœ— โœ“ โœ— โœ—
NVIDIA Omniverse โœ“ ~ โœ— โœ— โœ—
Genesis (OSS) โœ“ โœ— โœ— โœ— โœ—
Weights & Biases โœ— ~ ~ โœ“ โœ—
Us (Full Stack) โœ“ โœ“ โœ“ โœ“ โœ“

The Integration Thesis

Embodied AI requires tight coupling between simulation, data, and training. Point solutions create friction. An integrated platform captures the full workflow โ€” and the full margin.