R&D Engineer • Dependabot for AI
01

Dependabot for AI Models

Every week a new model drops. Your team manually benchmarks it. What if that happened automatically — with PRs when something's better?

52+

New models released per month

2-4 wks

Time to benchmark each one

$0

Tools that auto-PR improvements

02

The Problem

  • New model every week from OpenAI, Anthropic, Google, Meta, Mistral, open-source
  • Teams manually benchmark against their stack — takes days per model
  • By the time you finish testing, three more models have dropped
  • No one knows if they're running the best model for their use case
03

Competitive Landscape

Company | What They Do | Gap
Portkey | AI gateway, routing, 1600+ LLMs | No auto-benchmarking against YOUR stack
Unify ($8M) | Finds best LLM for the job | Router-first, not benchmark-first
Braintrust ($36M, $150M val) | Eval-driven development | Reactive, not proactive
Us | Watch → Auto-benchmark → PR when better | —
04

How It Works

1. Connect

Connect your AI stack. Define your eval suite (or we help you build one).

2. Watch

We monitor every model release across all providers. Automatically.

3. PR

When something beats your current setup, you get a PR with benchmarks.

The Dependabot Pattern

Watch → Auto-benchmark → PR when better

Nobody does this for AI models. We do.
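The pattern above can be sketched in a few lines of Python. Every name here (the models, the scores, `run_eval_suite`) is an illustrative stand-in, not a real API:

```python
# Minimal sketch of watch -> auto-benchmark -> PR: benchmark each new
# release against the customer's eval suite and propose the winners.

def watch_and_propose(current_score, new_releases, run_eval_suite):
    """Return PR proposals for every release that beats the current setup."""
    proposals = []
    for model in new_releases:
        score = run_eval_suite(model)      # the customer-defined evals
        if score > current_score:          # strictly better on YOUR stack
            proposals.append({"model": model, "score": score,
                              "delta": score - current_score})
    # biggest improvement first: that one becomes the PR
    return sorted(proposals, key=lambda p: p["delta"], reverse=True)

# Stubbed eval suite standing in for real benchmark runs:
stub_scores = {"model-a": 0.78, "model-b": 0.91, "model-c": 0.85}
prs = watch_and_propose(0.82, list(stub_scores), stub_scores.get)
# prs proposes model-b, then model-c; model-a never beats the baseline
```

The point of the sketch: the "PR when better" decision is a strict comparison against your own eval suite, never against a public leaderboard.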

05

ICP & Pricing

🎯 Target Customer

  • Any team with AI in production
  • 3+ AI features deployed
  • Series B+ (or well-funded Series A)
  • Engineering-led decision making

💰 Pricing

  • Starter $2K/mo — up to 5 endpoints
  • Growth $8K/mo — up to 20 endpoints
  • Enterprise $20K+/mo — unlimited
06

Why Now?

  • Model release velocity is accelerating — impossible to keep up manually
  • LMArena proved model evaluation is big business ($150M raised at a $1.7B valuation)
  • Braintrust proved enterprises pay for evals ($36M Series A)
  • Nobody has combined continuous monitoring + proactive optimization
R&D Engineer • CI/CD for AI
01

CI/CD for AI

Software engineering solved "did my change break things?" 20 years ago. AI engineering still ships blind.

🔴 AI Today

Push prompt change → Hope it works → Find out in production

🟢 With Us

Push prompt change → Eval runs → PR blocked if quality drops

02

The Insight

The gap isn't that people lack evals — Braintrust, Humanloop, and DSPy already provide those.

The Real Gap

Evals aren't integrated as blocking gates in deployment pipelines the way unit tests are.

03

What We Build

GitHub Action + CI Integration

  • Automatically runs your eval suite against every PR that touches AI code
  • Prompts, model configs, RAG pipelines — all covered
  • If eval score drops → PR is blocked
  • If new model improves score → PR is auto-generated

Think: Braintrust's eval engine + Dependabot's automation + GitHub Actions' CI/CD — fused into one opinionated product.
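As a sketch, the blocking gate reduces to a small script a CI step could run on every PR that touches AI code. The baseline score and the hardcoded PR scores below are illustrative stand-ins for a real eval run:

```python
# Sketch of an eval gate run in CI: compare the PR branch's eval score
# to the main branch baseline and translate that into an exit code.

BASELINE_SCORE = 0.85   # eval score of the config on the main branch

def gate(pr_score: float, baseline: float = BASELINE_SCORE) -> int:
    """Return a CI exit code: 0 lets the PR merge, 1 blocks it."""
    if pr_score < baseline:
        print(f"BLOCKED: eval score {pr_score:.3f} < baseline {baseline:.3f}")
        return 1
    print(f"PASSED: eval score {pr_score:.3f} >= baseline {baseline:.3f}")
    return 0

improved = gate(0.87)    # quality held or improved: exit 0, PR merges
regressed = gate(0.80)   # quality dropped: exit 1, PR blocked
```

This is exactly the unit-test contract applied to evals: a nonzero exit code fails the check, so the PR cannot merge until the score recovers.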

04

Aemon vs Us

Dimension | Aemon | Us
Purpose | Discover new optimal solutions | Protect existing quality + incrementally improve
Posture | Offensive R&D | Defensive Ops
Buyer | R&D Lead / ML Researcher | Engineering Manager / Platform Team
Integration | Standalone tool | Lives in your CI/CD
05

ICP & Pricing

🎯 Target Customer

  • 3+ AI features in production
  • Series B+ companies
  • Engineering-led sale
  • Already using GitHub/GitLab CI

💰 Pricing

$2K – $20K/mo

Based on eval runs & endpoints

R&D Engineer • Private LMArena
01

Private LMArena

LMArena raised $150M at $1.7B valuation on public evals. Enterprises need private evals on their own data.

$1.7B

LMArena valuation (public evals)

???

Private enterprise eval market

02

The Problem with Public Benchmarks

  • Companies have been caught gaming LMArena scores
  • Public benchmarks don't reflect YOUR use cases
  • Generic evals ≠ production performance for YOUR data
  • Enterprises need proprietary intelligence
03

What We Build

Enterprise Model Intelligence Platform

  • Define eval suites from your production data
  • Continuously benchmark every new model release
  • Test every prompt variation, RAG config automatically
  • Output: Private leaderboard + recommended actions

Hugging Face's YourBench is the open-source precursor — but it's a DIY tool requiring significant ML expertise. We productize it.
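The output layer described above can be sketched simply: rank models by their scores on a customer's private evals and turn the top entry into a recommended action. The scores and model names here are made up for illustration:

```python
# Sketch of a private leaderboard plus a recommended action.
# Scores come from running the customer's own eval suite, never
# from a public benchmark.

def leaderboard(scores):
    """Rank models best-first by private eval score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def recommendation(scores, current_model):
    """Suggest a switch when the leaderboard winner isn't in production."""
    best, best_score = leaderboard(scores)[0]
    if best != current_model:
        gain = best_score - scores[current_model]
        return f"switch {current_model} -> {best} (+{gain:.2f})"
    return "keep current model"

private = {"gpt-4o": 0.83, "claude-3.5-sonnet": 0.88, "llama-3-70b": 0.79}
rec = recommendation(private, "gpt-4o")
# rec recommends switching to the leaderboard winner
```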

04

Aemon vs Us

Aemon | Private LMArena
Evolves novel algorithms | Evaluates existing models/configs
Research | Intelligence
05

ICP & Pricing

🎯 Target Customer

  • 10+ AI features in production
  • $50K+/mo on AI infrastructure
  • VP of Engineering or Head of AI
  • Fintech, ad-tech, e-commerce, healthtech

💰 Pricing

$10K – $100K/mo

Enterprise contracts

R&D Engineer • AI Model FinOps
01

AI Model FinOps

Companies spend $85K+/mo on AI infrastructure. Nobody knows if they're overpaying for quality they don't need.

$85K

Avg monthly AI spend

36%

YoY growth

0

Visibility into cost-quality tradeoff

02

The Gap

Tool | What It Does | Missing
Portkey | Routing, fallbacks | No cost-quality optimization
Unify | Cheapest model that meets threshold | Not continuous; not based on production data
Us | Continuously optimizes the cost-quality frontier across your entire AI stack | —
03

What We Build

FinOps + Quality Optimization Layer

An agent that sits on top of your AI gateway:

  • Continuously profiles every AI call (model, cost, latency, quality)
  • Uses your production data as the eval
  • Generates actionable recommendations:
"Switch endpoint X from GPT-4o to Claude 3.5 Sonnet — saves $8K/mo, quality improves 2%"

"Your RAG pipeline on endpoint Y is underperforming — here's an optimized config"
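A recommendation like the first one above is a simple pass over profiled endpoints: propose the swap that saves the most money without losing quality. The figures and model names below are illustrative, not real benchmarks:

```python
# Sketch of a cost-quality recommendation pass over one profiled endpoint.

def recommend(endpoint, candidates):
    """Suggest a swap when a candidate is cheaper and no worse on quality."""
    best = None
    for c in candidates:
        saves = endpoint["monthly_cost"] - c["monthly_cost"]
        quality_delta = c["quality"] - endpoint["quality"]
        if saves > 0 and quality_delta >= 0:     # strictly cheaper, no worse
            if best is None or saves > best["saves"]:
                best = {"model": c["model"], "saves": saves,
                        "quality_delta": quality_delta}
    return best

current = {"model": "gpt-4o", "monthly_cost": 12000, "quality": 0.90}
options = [
    {"model": "claude-3.5-sonnet", "monthly_cost": 4000, "quality": 0.92},
    {"model": "small-oss-model",   "monthly_cost": 500,  "quality": 0.81},
]
rec = recommend(current, options)
# rec picks claude-3.5-sonnet: $8,000/mo saved, quality up ~0.02.
# small-oss-model is cheaper still but rejected for the quality drop.
```

The quality floor is the key design choice: pure cost minimization would pick the cheapest model and silently degrade the product.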
04

ICP & Pricing

🎯 Target Customer

  • $20K+/mo on LLM APIs
  • CFO / VP Eng sale
  • Any industry with AI in production

💰 Pricing

$2K – $15K/mo

Pays for itself from savings

⚡ Easiest ROI story of all these ideas

R&D Engineer • Eval-as-a-Service
01

Eval-as-a-Service

Building good evals is harder than building the AI features themselves. We build the oracle.

02

The Insight

The Bottleneck Isn't Optimization

Braintrust's thesis: "If your eval is right, every decision becomes simple."

DSPy's framework depends on having good metrics to optimize against.

The bottleneck in the entire AI development loop is knowing what "good" looks like.

03

What We Build

Eval Generation Agent

  • Takes your production AI traces
  • Analyzes failure modes
  • Interviews domain experts (async, Slack-based)
  • Generates calibrated eval suites:

✓ Datasets

✓ Scoring rubrics

✓ Automated judges

Output plugs into Braintrust, DSPy, or your own CI/CD.
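One of those generated artifacts, the automated judge, can be sketched as a rubric of checks scored over a model output. The rubric contents here are invented for illustration; real rubrics would be distilled from production failure modes and expert interviews:

```python
# Sketch of a generated automated judge: a calibrated rubric of
# pass/fail checks whose pass rate becomes the eval score.

RUBRIC = {
    "cites_source":    lambda out: "source:" in out.lower(),
    "no_hedging":      lambda out: "i think" not in out.lower(),
    "under_100_words": lambda out: len(out.split()) <= 100,
}

def judge(output: str) -> float:
    """Fraction of rubric checks passed, in [0, 1]."""
    passed = sum(check(output) for check in RUBRIC.values())
    return passed / len(RUBRIC)

good = judge("Source: Q3 report. Revenue grew 12%.")   # passes all checks
weak = judge("I think revenue probably went up")        # fails two checks
```

A score in [0, 1] per output is the shape downstream tools expect, which is what lets the suite plug into Braintrust, DSPy, or a CI gate without adaptation.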

04

Aemon vs Us

Aemon | Eval-as-a-Service
Assumes you have a good eval function | Creates the eval function
Optimizer | Oracle
Depends on eval quality | Is the prerequisite to everything else

If you own the eval layer, you become the foundation every optimization tool depends on.

05

ICP & Pricing

🎯 Target Customer

  • Same as Braintrust's customers
  • AI product teams at Series B+
  • Earlier in journey — before they've figured out evals

💰 Pricing

$5K – $30K/mo

Per eval suite built + maintenance

01
LEAD PITCH: Fat Startup

AI-Powered Outcomes.
Not Tools. Not Reports.

We operate fleets of AI agents that deliver results. Customers get outcomes. We get playbooks. Playbooks become platform.

Fat Startup $4K MRR 5 Customers Playbooks Compounding

A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.

— Andrew Lee, a16z Speedrun Partner

02

The Shift

Spinning up AI agents is now trivial. Managing them is the new bottleneck.

What's Easy Now

🤖

One-click agent deployment

OpenClaw, dockerized instances, cloud GPUs

🔀

Capable models

GPT-5, Claude 4, open-source alternatives

💰

Economics work

$0.01-0.10 per task, not $50/hr

What's Still Hard

⚠️

People become pseudo-IT

Babysitting agents instead of running business

⚠️

Debugging eats time

Every hour on agent issues ≠ hour on actual work

⚠️

No one wants to manage agents

They want outcomes, not infrastructure

The Insight

Founders are too busy to become AI ops engineers. We absorb that complexity so they can focus on their actual business.

03

How We Got Here

We started in sales. Then customers kept asking for more.

📧
Started: Sales
SDR automation
🎬
Then: Video
ML training data gen
🔬
Then: Research
University lab assets
💡
Pattern
We manage, they get results

The Variety We've Delivered

🏗️

SDR for construction companies

Lead gen + qualification

🎬

Video generation for ML training

Synthetic data pipelines

🔬

Research assets for universities

Literature review + synthesis

🚀

BDR for startups

Outbound + meeting booking

The Common Thread

Every customer had the same problem:

"I tried spinning up agents myself. Then I spent all my time debugging them instead of running my business."

— Pattern across customers

They didn't want to manage AI. They wanted outcomes.

04

The Market Reality

95%
AI projects fail before production
MIT Project NANDA
70%
AI SDR users churn in 3 months
Industry data
$47K
Lost from one agent runaway
TechStartups
171%
ROI when deployment succeeds
MIT NANDA

Why Tools Aren't Enough

Companies don't want to become AI operations experts. They want someone to absorb the complexity and just deliver results.

05

The Model: Managed AI Operations

We operate agent fleets. Customers get outcomes. We encode playbooks.

🎯
Customer Goal
"50 qualified meetings/month"
Our Engineers
Configure agent fleet
🤖
Agent Fleet
Research, outreach, qualify
Outcome
Meetings on calendar

DIY / SaaS Tools

🛠️

You manage the agents

Become pseudo-IT for AI

🐢

Weeks to figure out

Setup, config, debugging

Hope it works

No guarantee of outcomes

OpenHolly (Us)

We manage the agents

You focus on your business

Results in days

We've done this before (playbooks)

🎯

Outcomes guaranteed

Pay for results, not effort

06

Current Focus: GTM/Sales

Starting with sales because the outcome is measurable: meetings booked.

Why Sales First

📊

Clear success metric

Meetings booked = revenue

💔

Broken market

70% AI SDR churn = customers looking for alternatives

💰

High willingness to pay

$5-10K/month for what works

We have traction

50% of our revenue is SDR/BDR

What We Deliver

🔍

Research Agent

Deep prospect intelligence

✍️

Outreach Agent

Personalized messaging

📋

Qualification Agent

Score and prioritize leads

📅

Scheduling Agent

Book the meeting

Expansion Path

Sales → Research/Intel → Operations → Content. Each vertical = new playbook, same infrastructure.

07

The Unlock: Playbooks Compound

Every engagement encodes a playbook. Playbooks make the next engagement faster. This is how we build the moat.

🛠️
Year 1: Agency
Do the work, learn playbooks
📚
Year 2: Productize
Playbooks become templates
🏗️
Year 3: Platform
Others build on our templates

What's In A Playbook

Every engagement becomes encoded knowledge:

📝

Workflow sequences

What steps work for each use case

🎯

Prompt templates

Messaging that actually converts

⚙️

Agent configurations

Which models, tools, and sequences

🚫

Failure patterns

What breaks and how to prevent it

The Compounding Effect

1️⃣

Customer 1: 2 weeks

Figure everything out from scratch

5️⃣

Customer 5: 3 days

Apply existing playbook + customize

🔟

Customer 10: Hours

Playbook is battle-tested

🏗️

Eventually: Self-serve

Playbooks become product

The Fat Startup Advantage

We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.

08

Technical Insight

We're productizing the research consensus on what actually works.

The Research Convergence

📄

Workflow-First Architecture

Declarative orchestration beats autonomous agents (Microsoft, 2024-25 surveys)

👤

HITL as Training Signal

Human edits train intervention policies (ReHAC, EMNLP 2024)

🎯

Playbooks as Optimization Surface

Prompts + tool-use are parameters to optimize (AVATAR, NeurIPS 2024)

🛡️

Guardrails are Required

Transparency + oversight for multi-agent systems (Nature, 2026)

Our Implementation

Declarative playbooks

Versioned configs, not imperative code

Logged human checkpoints

Every edit = structured training signal

Continuous optimization

Prompts, branching, model routing improve over time

Action-layer guardrails

Can't be prompt-injected, auditable

We log trajectories, human edits, and outcomes, then update prompts, branching logic, and model routing so the same business objective is achieved more reliably over time. The playbook is the learned policy space.

— Our technical thesis
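That thesis can be sketched as data structures: a versioned playbook whose human checkpoint edits are logged as structured signals and folded into the next version. The schema below is an illustrative assumption, not our production format:

```python
# Sketch of "the playbook is the learned policy space": declarative,
# versioned steps; every human edit is logged and becomes the next version.
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    prompt: str
    model: str            # model routing is part of the policy

@dataclass
class Playbook:
    version: int
    steps: list
    edit_log: list = field(default_factory=list)

    def record_edit(self, step_name: str, new_prompt: str):
        """One human checkpoint edit = one structured training signal."""
        self.edit_log.append({"step": step_name, "prompt": new_prompt})

    def next_version(self):
        """Fold logged edits into a fresh playbook version."""
        patched = {e["step"]: e["prompt"] for e in self.edit_log}
        steps = [Step(s.name, patched.get(s.name, s.prompt), s.model)
                 for s in self.steps]
        return Playbook(self.version + 1, steps)

pb = Playbook(1, [Step("outreach", "v1 draft prompt", "gpt-5")])
pb.record_edit("outreach", "v2 prompt with tighter personalization")
pb2 = pb.next_version()
# pb2 is version 2 and its outreach step carries the human-edited prompt
```

Because the playbook is data rather than imperative code, each version can be diffed, audited, and reused for the next customer, which is what makes the compounding claim concrete.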

09

The Compound Library

The internal system that makes agent workflows repeatable and efficient.

🔧
Verified Tools
Tested integrations
+
💬
Working Prompts
By use case + vertical
+
🧠
Model Routing
Which model where
+
🚫
Failure Patterns
What breaks + fixes
📦
New Client Workflow
Compose from proven components

Without This System

🔄

Reinvent every time

Which tools? Which prompts? Which models?

🐢

Slow iteration

Learn the same lessons repeatedly

📉

Linear scaling

More clients = more eng hours

With The Compound Library

Compose from proven

Verified, tested, reusable primitives

📈

Each engagement adds

Learnings feed back into system

🚀

Sublinear scaling

More clients = richer library = faster

The Compounding Effect

Workflow #1 takes a week. Workflow #10 takes a day. Workflow #100 takes hours. The library IS the moat.

10

Why Us

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Unfair Advantages

🐕

We're running on OpenClaw

Dog-fooding our own infrastructure daily

📊

We've built observability

ClawView for agent monitoring

🛡️

We've built guardrails

Agent Seatbelt for safety

💵

Revenue already

$4K MRR, +$2K this week

11

Traction

$4K
MRR
5
Customers
50%
SDR/BDR
+$2K
Added This Week

What This Proves

Companies will pay for AI-powered outcomes when someone else manages the complexity. The demand is real. The model works.

12

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent fleet + engineering team

🎯

12-month goal: $1M ARR

Prove the playbooks at scale

📚

Then: Productize

Turn proven playbooks into self-serve templates

Why Now

🚀

OpenClaw + GPT-5 + Claude 4

Agents just became capable enough

💔

AI SDR market burned

70% churn = customers looking for what works

First-mover on playbooks

Every month we operate = more encoded knowledge

OpenHolly: AI-Powered Outcomes

Customers get results. We get playbooks. Playbooks become platform.

01
V1: Personal AI OS

Your Personal AI OS

An AI that knows your context, anticipates your needs, and takes action on your behalf—not a chatbot you have to prompt.

Pre-Seed $4K MRR 5 Customers Always-On AI

The Vision

Imagine an AI that actually knows you—your work, your preferences, your patterns. It doesn't wait for commands. It proactively handles tasks, flags important things, and learns from every interaction.

02

The $56B Opportunity

Personal AI assistants are about to explode.

$16B
AI Assistant Market 2024
Grand View Research
$56B
Projected by 2034
Market.us (38% CAGR)
75%
Households with AI by 2025
Gartner Forecast
72%
US Teens Use AI Companions
2025 Study

AI personal agents will arrive soon. What we do now with apps—manually, and in piecemeal fashion—will be done automatically. If a flight is cancelled, an AI agent will rebook the flight, reschedule meetings, and order food.

— Goldman Sachs, "What to Expect from AI in 2026"

03

Why Current Assistants Fail

Siri, Alexa, and Google Assistant lost the AI race. Here's why.

❌ The Problem

🧠

No Persistent Memory

Context resets after 2-3 turns. They forget everything.

⏸️

Reactive, Not Proactive

Wait for commands. Never anticipate needs.

🔒

Siloed Knowledge

Can't connect your email, calendar, work, and life.

🤖

Limited Actions

"I can't do that" is their signature phrase.

✓ Personal AI OS

🧠

128K+ Token Context

Remembers weeks of interactions. Learns your patterns.

Proactive Intelligence

Anticipates what you need before you ask.

🔗

Connected Context

Sees your whole digital life—with your permission.

🛠️

Real Actions

Browser, shell, files, messages—actual work gets done.

Microsoft's CEO called AI assistants "dumb as a rock." The truth is, they've stagnated while chatbots evolved.

— Industry Analysis, 2023-2024

04

The Hardware Graveyard

Why dedicated AI devices keep failing—and what we learned.

$699
Humane AI Pin
Flopped 2024 — WIRED "Biggest Flop"
$199
Rabbit R1
"Underwhelming, underpowered" — The Verge
$350M
Rewind/Limitless
Acquired by Meta Dec 2025
$2.7B
Character.AI
Google licensing deal 2024

The Lesson

Hardware failed because it created friction instead of removing it. The winning approach: software that works with your existing devices—phone, laptop, wearables—not another gadget to carry.

Both Rabbit R1 and Humane AI Pin missed a crucial opportunity: integrating with existing user bases. Why create a separate device when you could leverage smartphones and their vast ecosystem?

— Medium Analysis, July 2024

05

Proactive vs. Reactive

The fundamental shift in how AI should work for you.

⏸️
Reactive AI
You ask → It responds
Proactive AI
It notices → It acts
🧠
Anticipatory AI
It predicts → You approve

Reactive (Siri/ChatGPT)

"Hey Siri, add milk to my shopping list"

"ChatGPT, summarize this document"

You initiate every interaction. You remember to ask.

Proactive (Personal AI OS)

"You're almost out of milk. Added to cart—confirm?"

"Your flight changed. I rebooked + rescheduled 2 meetings."

AI monitors context. Surfaces what matters. Acts with permission.

Gartner predicts 40% of enterprise apps will embed task-specific AI agents by 2026, evolving assistants into proactive workflow partners.

— Forbes, "Agentic AI Takes Over," Dec 2025

06

Why Now?

Four converging forces make this the moment.

Technology Ready

🧠

GPT-5 / Claude 4

Models finally capable of real reasoning

📝

128K+ Context Windows

Memory across weeks of interaction

🔧

MCP + Tool Use

Agents can control apps natively

💰

Economics Work

$0.01-0.10 per task, not $50/hr

Market Ready

📈

96% Enterprise Expansion

Plan to increase agentic AI budgets

PwC May 2025 Survey
🎯

25% → 50% Adoption

Enterprise GenAI agents 2025 → 2027

Deloitte Forecast
😤

Siri Fatigue

95% frustrated with current assistants

The Manifest Survey
🔐

Privacy Tailwinds

Apple Intelligence proves local AI demand

07

What Users Actually Want

From surveys, Reddit, and academic research.

Desires

🧠

Memory That Persists

"Remember what I told you last week"

Proactive Help

"Remind me before I forget"

🎯

Deep Personalization

"Know my preferences without asking"

🔐

Privacy Control

"My data stays mine"

Evidence

93% of respondents predict agentic AI will enable more personalized, proactive, and predictive services.

— Cisco 2025 AI Study

An assistant that knows you. The future of personal assistants is when the helper learns from your data, documents, and writing style.

— AI Industry Forecast 2026

08

How It Works

Always-on AI that learns, anticipates, and acts.

👁️
Observes Context
Email, calendar, browsing, work
🧠
Learns Patterns
Preferences, routines, priorities
💡
Surfaces Insights
Proactive suggestions
Takes Action
With human approval
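The four-stage loop above can be sketched as a single event handler: learn from every observation, surface only what matters, and act only behind a human approval gate. The event shape and the `approve` callback are illustrative assumptions:

```python
# Sketch of observe -> learn -> surface -> act-with-approval for one event.

def proactive_step(event, patterns, approve):
    """Handle one observed event; never act without human sign-off."""
    kind = event["kind"]
    patterns[kind] = patterns.get(kind, 0) + 1      # learn the pattern
    if event.get("urgent"):                         # surface what matters
        suggestion = f"Handle {kind}: {event['detail']}"
        if approve(suggestion):                     # the human gate
            return f"ACTED: {suggestion}"
        return f"SKIPPED: {suggestion}"
    return None                                     # quiet observation only

patterns = {}
result = proactive_step(
    {"kind": "flight_change",
     "detail": "rebook + move 2 meetings",
     "urgent": True},
    patterns,
    approve=lambda s: True,   # stand-in for a real approval UI
)
# result records the approved action; patterns now counts flight_change
```

The gate is the whole wedge: proactive means the AI initiates, but "with permission" means a human still closes the loop.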

Current Focus: SDR/BDR

🔍

Research Agent

Deep prospect intelligence

✍️

Outreach Agent

Personalized messaging

📅

Scheduling Agent

Meeting coordination

Platform Vision

📧

Email Intelligence

Triage, draft, follow-up

📊

Research & Analysis

Deep work on autopilot

🔧

Ops & Admin

The tasks you hate, automated

09

The Unique Wedge

What makes this different from Siri/Alexa/Google Assistant?

Big Tech Assistants

🏢

Built for mass market

Generic. Lowest common denominator.

📊

Data goes to them

Your context trains their models.

🔒

Walled garden

Only works in their ecosystem.

⏸️

Stagnant development

Lost the AI race years ago.

Personal AI OS

🎯

Built for power users

Deep personalization for serious work.

🔐

Your data stays yours

Local-first. You control what's shared.

🔓

Cross-platform

Works with your existing tools.

🚀

Cutting-edge models

GPT-5, Claude 4, always the best.

The Positioning

We're not competing with Siri for "set a timer." We're building the second brain for knowledge workers—people who will pay for AI that actually makes them more effective.

10

Traction & Team

$4K
MRR
5
Customers
50%
SDR/BDR
+$2K
This Week

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

11

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent infrastructure + team

🎯

12-month goal: $1M ARR

Prove the Personal AI OS at scale

📚

Then: Consumer launch

Personal AI for everyone

Why This Team

🐕

We use it daily

Dogfooding OpenClaw constantly

📊

Built observability

ClawView for agent monitoring

🛡️

Built safety

Agent Seatbelt for guardrails

💵

Already have revenue

Proving demand before pitching

OpenHolly: Your Personal AI OS

An AI that knows you, anticipates your needs, and takes action—not just another chatbot waiting for prompts.

The Thesis in One Line

The shift from reactive AI to proactive AI is a $56B market. We're building the operating system for it.

01
V2: Outcome-Based Pricing

Pay Per Meeting,
Not Per Seat

The SaaS pricing model is breaking. AI does the work now—so why pay for human logins? We deliver outcomes and charge when they happen.

$4K MRR 0% Churn Outcome-Aligned 5 Customers

AI is driving a shift toward outcome-based pricing. Per-seat is no longer the atomic unit of software. If AI can handle a sizable proportion of customer support, companies will need far fewer human agents, and therefore fewer software seats.

— a16z Enterprise Newsletter, December 2024

02

The Pricing Revolution

SaaS pricing is undergoing its biggest shift since the cloud. AI is killing the per-seat model.

61%
SaaS using usage-based pricing (2022)
OpenView
30%
Enterprise SaaS with outcome-based by 2025
Gartner
43%
Enterprise buyers prefer outcome/risk-share pricing
Industry Data
2-3x
Higher traction for outcome-priced AI products
BetterCloud 2025

Seat-based pricing may not fit when AI is doing the work. If an agent replaces a human task, customers will expect to pay based on outcomes, not log-ons.

— Bain Technology Report 2025

03

Why Seats Are Dying

The logic of per-seat pricing breaks when AI replaces the humans who need seats.

The Broken Math

📉

AI replaces 10 analysts with 1 agent

Per-seat pricing undervalues the automation

💸

$5-10K/month regardless of results

70% churn when outcomes don't follow

Soft ROI = death at renewal

2025 pilots hitting 2026 renewals—"are we really getting value?"

The New Model

🎯

Pay for work completed

Not for access to tools

📊

ROI in their sleep

Customers calculate value instantly: $X per meeting = clear math

🤝

Aligned incentives

We only win when you win

The Bessemer Thesis

AI-native companies are abandoning seat-based SaaS pricing in favor of usage-, output-, and outcome-based models that directly align revenue with measurable results.

— Bessemer Venture Partners, "The AI Pricing and Monetization Playbook" (Feb 2026)

04

Who's Already Winning

The market leaders are proving outcome-based AI pricing works at scale.

Intercom Fin

Customer Support AI

$0.99 per resolution

65% resolution rate. Aligns every team around one outcome: resolved tickets. Now deployed at 99% of conversations.

Zendesk AI Agents

Customer Support AI

Outcome-based pricing

"First in CX industry to offer outcome-based pricing for AI agents" — August 2024 announcement.

EvenUp

Legal AI

Per demand package

AI + legal experts generate personal injury demand letters. Per output pricing, not hourly.

Decagon

Enterprise AI Support

Per-conversation + per-resolution

Hybrid model. Usage (conversations) + outcome (resolutions). Featured in a16z podcast.

Leena AI

Employee Support AI

ROI-based (tickets closed)

Shifted from consumption → outcomes. Customers gained clearer ROI, business accelerated.

Scale AI

Data Labeling → Platform

$13.8B valuation

Started as labeling services. Became infrastructure. Services → outcomes → platform.

The Pattern

Every major AI-native company is moving toward outcome-based pricing. This isn't experimentation—it's convergence.

05

Why Enterprises Love It

43% of enterprise buyers consider outcome-based pricing a significant factor in purchase decisions.

Buyer Psychology

🧮

Instant ROI Calculation

"$X per meeting booked" = CFO-ready math. No spreadsheet gymnastics.

🛡️

Zero Implementation Risk

If it doesn't work, you don't pay. Risk transferred to vendor.

📈

Scales With Value

More meetings = more spend = more value captured. Natural expansion.

🔄

No Renewal Anxiety

You're paying for results. Why churn from something that works?

What Buyers Say

"Why should we pay $X per user if we could pay $Y per outcome? Aligning price with realized value improves the ROI calculus."

— Enterprise buyer sentiment (Industry research)

"The fundamental shift is to stop charging for access and start charging for work done."

— Bain Technology Report 2025

Deloitte 2026 Prediction

"Outcome- or value-based pricing is based on the real business results that SaaS applications with AI agents produce. There will be a gradual move toward a future powered by integrated, autonomous multi-agent systems."

06

Our Model: Pay Per Meeting

We operate AI agent fleets that book qualified sales meetings. You pay only when meetings happen.

🎯
Define Outcome
"50 qualified meetings/month"
🤖
Agent Fleet Works
Research, outreach, qualify, book
📅
Meeting Booked
Verified on calendar
💰
You Pay
Only for outcomes

❌ Traditional AI SDR

$5-10K
/month regardless of results
70%
Churn in 3 months
???
ROI unclear, hard to justify

✓ OpenHolly Outcome Model

$250-500
Per qualified meeting booked
0%
Risk if agents don't perform
ROI: only pay when it works
07

Unit Economics That Work

Outcome-based pricing isn't charity—it's better economics for everyone.

Our Economics

💵

$250-500 per meeting

Customer pays on outcome

🤖

$30-80 cost to deliver

AI compute + tooling + human oversight

📈

3-7x margin

Healthy unit economics, scales with volume

🔄

Playbooks compound

Each meeting → better templates → lower cost
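The ranges above can be sanity-checked with back-of-envelope arithmetic. Price and cost figures come straight from this deck; the extremes bound the margin, and the stated 3-7x is the conservative band inside them:

```python
# Back-of-envelope check on the outcome-model unit economics.
price_per_meeting = (250, 500)   # what the customer pays, USD
cost_to_deliver = (30, 80)       # AI compute + tooling + human oversight, USD

worst_margin = price_per_meeting[0] / cost_to_deliver[1]  # cheapest meeting, costliest delivery
best_margin = price_per_meeting[1] / cost_to_deliver[0]   # priciest meeting, cheapest delivery
print(f"gross margin range: {worst_margin:.1f}x to {best_margin:.1f}x")
# roughly 3.1x at the low end; the quoted 3-7x sits inside this range
```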

Customer Economics

Meeting = $5K-50K deal potential

$250-500 per meeting is a no-brainer

Zero upfront commitment

Start small, scale with proof

Budget predictability

Cost tracks linearly with value

Easy internal approval

CFO loves outcome-based spend

The Intercom Lesson

"Intercom's $0.99 per resolution aligns every team around one outcome: resolved tickets. If Fin resolves a ticket in three messages or thirty, the customer pays the same. The risk is real—but the reward is equally real: customers know exactly what they're getting, and they can calculate ROI in their sleep."

— Bessemer, Feb 2026

08

Managing the Risks

Outcome-based pricing has real risks. Here's how we mitigate them.

The Risks

⚠️

Cost variability

Some meetings cost more than others

⚠️

Revenue unpredictability

Customer usage varies month to month

⚠️

Attribution disputes

"Did your AI really book this?"

⚠️

Abuse potential

Customers gaming the system

Our Mitigations

Minimum commitments

Base retainer + outcome fees = floor

Playbook compounding

Cost per outcome drops with scale

Clear outcome definitions

Contractually defined: what counts

Full audit trail

Every action logged, no disputes

Industry Standard Emerging

"Agreements around basic definitions for things like 'an agent,' 'a task,' 'a process,' 'an interaction,' and 'an outcome' should be clearly defined, communicated, and agreed upon contractually." — Deloitte TMT Predictions 2026

09

Traction

$4K
MRR
5
Customers
0%
Churn
+$2K
Added This Week

Why Zero Churn

🎯

Aligned incentives

They pay for results → they get results → no reason to leave

📈

Clear value

Every invoice shows exactly what they got

🔄

Natural expansion

"It's working—give me more"

Customer Mix

🏗️

50% SDR/BDR

Our wedge: sales meetings

🎬

30% Video/ML

Synthetic data pipelines

🔬

20% Research

University lab assets

When you only pay for results, there's no reason to churn. Aligned incentives = sticky customers. This is why Intercom's outcome-based Fin has 99% deployment.

10

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Unfair Advantages

🐕

Dog-fooding daily

Running on OpenClaw infrastructure

📚

Playbooks compounding

Every engagement → better templates

Why Outcome-Based Wins

💰

We absorb the risk

Customers love it → lower CAC, zero churn

🎯

We're incentivized to deliver

Better AI = more margin for us

11

The Thesis

You post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents?

— Macy Mills, a16z Speedrun Partner

Why Now

📈

Market timing

61% → 30%+ outcome-based adoption wave

💔

AI SDR burnout

70% churn = customers looking for what works

🏢

Enterprise demand

43% prefer outcome-based pricing

Comparable Outcomes

🚀

Scale AI: $13.8B

Services → outcomes → platform

📊

Pilot: $1.2B

Bookkeeping outcomes, not seats

💬

Intercom Fin

$0.99/resolution, 99% deployment

OpenHolly: Pay Per Outcome

AI agents that deliver results. You only pay when they do. The future of how work gets priced.

📚 Sources

a16z Enterprise Newsletter (Dec 2024) • Bessemer "AI Pricing Playbook" (Feb 2026) • Bain Technology Report 2025 • Deloitte TMT Predictions 2026 • OpenView SaaS Benchmarks • Gartner • EY "SaaS Transformation with GenAI" (Nov 2025) • BetterCloud "AI and SaaS Industry 2026" • Intercom Fin pricing page • Zendesk AI Agents announcement (Aug 2024)

01
V3: Anti-AI-SDR

The $500M AI SDR Market
Is Imploding. We're the Fix.

50-70% churn rates. LinkedIn bans. Domain blacklists. The "autonomous AI SDR" thesis failed. Human-in-the-loop is winning.

50-70%
AI SDR Churn Rate
Common Room, Feb 2025
$7.5K
Spent for 1 Demo
Reddit r/SaaS, Dec 2025
0
Sales from AI SDR Leads
Theory Ventures CRO
80%+
Human-in-Loop Success
MarketBetter G2: 4.97/5
02

The AI SDR Disaster: Real Data

"AI SDRs don't work—biggest bubble in tech." — LinkedIn comment with 400+ likes

💀 What's Actually Happening

"Their AI continuously hallucinated, getting things wrong about what my company does, the industry we are in, what products we sell. 1 positive reply, 1 demo, thousands of prospects touched, $7.5K down the drain."

— r/SaaS, Dec 2025

"A CRO from a publicly traded company disclosed that while an AI SDR helped generate a substantial volume of leads over a nine-month period, it did not lead to actual sales."

— Tomasz Tunguz, Theory Ventures

"Reports emerged of Artisan accounts, including those of team members and founders, facing restrictions or bans for suspected spam and automation violations."

— Quasa.io, Jan 2026

📊 The Numbers Don't Lie

📉

50-70% Annual Churn

2x the churn of human SDRs (a role notorious for turnover) — Common Room

🚫

LinkedIn Bans Spreading

Platform ramped up AI detection, restricting automation-heavy accounts

📧

Domain Blacklisting

Gmail filtering has tightened. Sender reputations destroyed in weeks.

⚖️

Legal Exposure

GDPR fines up to 4% revenue. TCPA: $500-1,500 per message.

💔

Brand Damage

"Permanent brand damage from being publicly associated with spam" — NUACOM

03

Even VCs Are Calling It

TechCrunch: "AI sales rep startups are booming. So why are VCs wary?"

"When one studies any of these startups individually, it's like 'wow, that's stunning product market fit.' When all 10 of them have stunning product market fit, it's hard to answer 'How is that going to play out?'"

— Shardul Shah, Partner, Index Ventures (hasn't invested)

"Without access to differentiated data, AI SDR startups risk being overtaken by incumbents like Salesforce, HubSpot, and ZoomInfo."

— Chris Farmer, CEO, SignalFire

"Investors are not surprised by the rapid adoption of AI SDRs; they are just doubting that adoption is sticky."

— TechCrunch, Dec 2024

The Jasper Cautionary Tale

$1.5B → 30% Layoffs

Jasper, the AI copywriting unicorn, ran into speed bumps and had to lay off 30% of staff after ChatGPT launched. AI SDRs face the same commoditization risk.

Why Adoption Isn't Sticky

1

Garbage In, Garbage Out

Built on commoditized LinkedIn data = undifferentiated output

2

Ops Is an Afterthought

Black boxes that create more work, not less

3

Feature, Not Product

Incumbents (Salesforce, HubSpot) can bundle this free

04

The Fundamental Flaw: Autonomous ≠ Better

"The AI SDR is dead, long live the AI SDR: How the future is Human-in-the-Loop"

❌ Why Autonomous Fails

🤖

No Emotional Intelligence

Can't read tone, context, or cultural nuance essential in enterprise sales

🎯

No Real Consent

Scraped data without consent → GDPR/CCPA violations

⚖️

No Accountability

When AI misleads, your company bears the liability

🔄

Volume Over Value

"More volume on a bad message is not a strategy. It is self-sabotage."

👻

Fake Personalization

"Commenting on someone's hoodie feels forced because it's a hollow observation"

✓ What Actually Works

"Teams that use AI to support human insight consistently outperform teams trying to replace humans entirely. It's not even close."

— Matthew Metros, The AI SDR is Dead

🔍

AI Does Research (90%)

Data mining, signal detection, prospect prioritization

👤

Humans Do Relationships (10%)

Judgment, trust, closing

Human-in-Loop = Higher Ratings

MarketBetter (human oversight): 4.97/5 G2 rating

📈

Better Outcomes

"Human-in-the-loop platforms consistently outperform fully autonomous ones"

05

OpenHolly: The Anti-AI-SDR

We're not building another AI SDR. We're building what should have been built from the start.

❌ 11x / Artisan / AiSDR

🤖

Replace human judgment

"Autonomous AI employee"

📧

Optimize for volume

"6,000 contacts/month"

💰

Per-seat pricing

$5-10K/mo regardless of results

📦

You manage the tool

Become pseudo-IT for AI

🎰

Hope it works

No outcome guarantees

✓ OpenHolly

👤

Augment human judgment

AI research + human checkpoints

🎯

Optimize for quality

Right message, right person, right time

💵

Outcome-aligned pricing

Pay for meetings, not seats

🛠️

We manage the agents

You focus on your business

✅

Results guaranteed

Outcomes or you don't pay

06

How OpenHolly Works

AI handles the research. Humans make the decisions. You get meetings.

🎯
Your Goal
"50 qualified meetings/mo"
🔍
AI Research
Signals, intent, fit scoring
👤
Human Checkpoint
Review & approve outreach
✍️
AI Execution
Send, follow-up, schedule
📅
Meeting Booked
Qualified, on calendar

What AI Handles (90%)

🔍

Deep Prospect Research

Intent signals, company news, technographics, pain points

📊

Lead Scoring & Prioritization

Who to contact and why, right now

✍️

Draft Generation

Personalized outreach based on real signals

📧

Multi-channel Execution

Email, LinkedIn (safely), follow-ups

What Humans Handle (10%)

✅

Approval Gates

Review before sending to high-value prospects

💬

Live Conversations

When a prospect engages, humans take over

🎯

Strategy & ICP

Define who you want to reach and why

🧠

Judgment Calls

Edge cases, sensitive prospects, brand protection

07

The Market Opportunity: Fix AI SDR

Their 50-70% churn is our customer acquisition channel.

$500M+
Raised by AI SDR startups
11x, Artisan, AiSDR, etc.
50-70%
Will churn this year
Common Room data
$250M+
Churned customers/year
Market opportunity
Human-in-Loop
What they'll switch to
The thesis

The Churned Customer Profile

💔

Burned by AI SDR tools

Spent $5-10K/mo, got spam complaints

📧

Domain reputation damaged

Need to rebuild sender trust

😤

Still need meetings

The problem didn't go away

🎯

Now understand quality > volume

Educated by failure

Why They'll Choose Us

💰

Outcome-based pricing

Only pay for meetings that happen

🛡️

Brand protection

Human oversight prevents embarrassments

📊

Proven playbooks

We've learned what works across verticals

🤝

We absorb the complexity

They don't manage agents, they get results

08

Traction: The Thesis Is Working

$4K
MRR
5
Customers
0%
Churn
+$2K
Added This Week

Why Zero Churn

Aligned Incentives

When customers only pay for results, there's no reason to churn. If we don't deliver meetings, they don't pay. Simple.

vs. AI SDR Churn

AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 50-70% churn.

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure

Yasir

Co-Founder

yapthis.com · Agentic architecture · Production agent systems

09

The Ask

What We Need

💰

$[X] Pre-Seed

Scale human oversight operations + agent infrastructure

🎯

12-month goal: $1M ARR

Prove the anti-AI-SDR thesis at scale

📚

Then: Productize

Turn proven playbooks into self-serve platform

Why Now

💔

AI SDR market imploding

50-70% churn = massive displaced customer base

📈

Human-in-loop proven

Highest G2 ratings go to human-oversight tools

🥇

First-mover on "fix"

Position as the safe alternative before the market consolidates

OpenHolly: The Anti-AI-SDR

AI SDRs promised automation. They delivered spam, bans, and brand damage. We deliver meetings — with human judgment where it matters. Their 50-70% churn is our customer acquisition channel.

📚 Sources

Common Room "The AI SDR is dead" (Feb 2025) · TechCrunch "AI sales rep startups are booming. So why are VCs wary?" (Dec 2024) · Reddit r/SaaS AI SDR complaints · Quasa.io Artisan LinkedIn bans (Jan 2026) · Pipeline Group "Hidden Dangers of AI SDRs" · Theory Ventures SaaStr Talk · MarketBetter G2 Reviews

01
V5: Agent Seatbelt

The Safety Layer
Before AI Gets the Keys

Browser-layer guardrails that block irreversible AI actions before they happen.

$47K
Lost in one AI runaway
84%
Have zero safety boundaries
3am
When agents go rogue
100%
Preventable with guardrails
02

The "$39K Gone in a Blink" Problem

AI agents fail not from bad models, but from bad guardrails. 84% of companies deploying agents have zero safety boundaries defined.

— GenDigital Agent Trust Hub Research, 2026

What Goes Wrong

💸

Runaway API costs

$47K overnight cloud bills

📧

Wrong recipients

AI SDR emails competitors

🗑️

Irreversible actions

Deleted production data

🔓

Credential leaks

Pricing sent to wrong channel

What We Block

Site-specific rules

Block LinkedIn "Follow" for AI SDRs

Action classification

Read vs. Write vs. Irreversible

Human approval gates

Require confirmation for risky ops

Rate limiting

Prevent runaway loops

03

How It Works

Chrome extension that intercepts agent browser actions

🤖
Agent Action
🛡️
Seatbelt Intercept
⚖️
Risk Classification
Allow / Block / Human

Why Browser Layer

Framework-agnostic. Works with any AI agent (OpenClaw, LangChain, AutoGen, custom). Install once, protect everything.
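The intercept-and-classify loop above can be sketched as a small rule engine. This is an illustrative model only, not the shipping extension: the `Action` shape, rule names, and thresholds are all assumptions, and a real Chrome extension would hook browser events rather than take dataclasses.

```python
# Illustrative sketch of Seatbelt-style action classification.
# All names (Action fields, rule sets, limits) are hypothetical.
from dataclasses import dataclass, field
from collections import deque
import time

# Verbs treated as irreversible and therefore gated behind a human
IRREVERSIBLE = {"delete", "send_email", "payment", "post"}

@dataclass
class Action:
    verb: str          # e.g. "click", "delete", "send_email"
    site: str          # e.g. "linkedin.com"
    target: str = ""   # e.g. "follow_button"

@dataclass
class Seatbelt:
    blocked: set = field(default_factory=set)   # (site, target) pairs to block outright
    rate_limit: int = 10                        # max actions per window
    window_s: float = 60.0
    _recent: deque = field(default_factory=deque)

    def classify(self, a: Action) -> str:
        """Return 'allow', 'block', or 'human' for one agent action."""
        now = time.monotonic()
        # Drop timestamps outside the rate-limit window
        while self._recent and now - self._recent[0] > self.window_s:
            self._recent.popleft()
        self._recent.append(now)
        if len(self._recent) > self.rate_limit:
            return "block"                      # runaway-loop protection
        if (a.site, a.target) in self.blocked:
            return "block"                      # site-specific rule
        if a.verb in IRREVERSIBLE:
            return "human"                      # human approval gate
        return "allow"                          # reads and low-risk writes
```

For example, with `blocked={("linkedin.com", "follow_button")}`, a click on that button classifies as `"block"`, a `send_email` as `"human"`, and a plain read as `"allow"` until the rate limit trips.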

04

Market & Competitive Position

Why Now

📈

OpenClaw: 9K → 60K stars

Autonomous agents exploding

⚠️

CyberArk security concerns

Enterprise worried about agent security

📜

EU AI Act

Regulatory tailwinds for safety

Competition

🟡

GenDigital Agent Trust Hub

Just launched; validates the market

🟢

Our Angle

Browser-layer = framework-agnostic

🟢

MVP Achievable

Chrome extension ships fast

Agent Seatbelt

The seatbelt you install before giving AI the keys.

🔗 Supports These Pitches

Fat StartupAWS of AI WorkControl Plane

Part of the human oversight layer that makes agent work reliable.

01
V6: ClawView

Datadog for
Autonomous Agents

When your AI employee sends the wrong email at 3am, you'll know exactly why.

The Problem

Companies are deploying autonomous AI agents that run 24/7. When something goes wrong—and it will—they have no idea why. Current tools are built for request-response, not proactive agents.

02

Current Tools Miss Autonomous Agents

LangSmith / Langfuse / Arize

Request-response patterns

User sends message, LLM responds

Chain tracing

LangChain-specific, not agent-native

No proactive agent support

Built for chatbots, not employees

ClawView

Autonomous operation

24/7 agents taking proactive actions

Decision tracing

Why did it make that choice?

Multi-channel + tools

Shell, browser, files, messages

03

The "Oh Shit" Demo

🤖
Agent receives task
🧠
Makes decisions
💥
Something goes wrong
🔍
ClawView shows why

Without ClawView

"The agent sent the wrong email. Logs show it ran. No idea why."

With ClawView

"Step 3: Agent assumed X because of context Y. Here's how to prevent this class of error."

ClawView: See What Your Agents Actually Do

Every decision. Every action. Every assumption. Full causal tracing.

🔗 Supports These Pitches

Fat StartupAWS of AI WorkControl Plane

Observability layer — see what agents are doing before they go wrong.

⚠️ Why This is a Feature, Not a Company

Langfuse, LangSmith, Arize are well-funded. But none are built for autonomous agents. ClawView is our internal observability layer, not a separate product pitch.

01
V7: AgentGov

Governance for
AI Employees

Audit trails. Approval workflows. Compliance automation. The control layer enterprises need.

84%
No safety boundaries
0
Audit trails today
EU AI Act
Compliance required
2026
Enforcement begins
02

The Governance Gap

AI agents fail not from bad models, but from bad guardrails. The unlock isn't better agents—it's better safety rails.

— Industry consensus, 2026

What's Missing

No audit trails

What did the agent do at 3am?

No approval workflows

High-stakes actions go unsupervised

No compliance framework

EU AI Act enforcement coming

No agent-on-agent supervision

Humans can't supervise at machine speed

AgentGov Provides

Immutable audit trails

Every action, every decision, timestamped

Approval workflows

Human gates for high-stakes actions

Compliance automation

EU AI Act ready, audit reports generated

AI supervision layer

Validator agents checking worker agents
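An "immutable, timestamped" audit trail is commonly built as a hash chain: each entry commits to the previous entry's hash, so any edit to history breaks verification. The sketch below illustrates that idea under simple assumptions; it is not AgentGov's actual implementation.

```python
# Sketch of a tamper-evident audit trail via a SHA-256 hash chain.
import hashlib
import json
import time

class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, agent: str, action: str, detail: dict):
        """Record one timestamped action, chained to the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"ts": time.time(), "agent": agent, "action": action,
                "detail": detail, "prev": prev}
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)

    def verify(self) -> bool:
        """Recompute every hash; any edit to history fails the check."""
        prev = "genesis"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Verification passes on an untouched log and fails the moment any past entry is edited, which is what makes the trail audit-grade rather than just a log file.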

03

From "Human in Loop" to "Human on Loop"

👤
Human IN Loop
Approve every action
👁️
Human ON Loop
Exception handling
🏛️
Human ABOVE Loop
Strategic oversight

McKinsey Insight

"Organizations are moving from human in the loop to human on the loop—above the loop for strategic oversight." AgentGov enables this transition safely.

AgentGov: Govern AI at Scale

Audit trails. Approval workflows. Compliance automation. Trust at machine speed.

🔗 Supports These Pitches

Fat StartupAWS of AI WorkControl Plane

Governance + compliance layer — enables enterprise trust.

🔬 Key Research

Gravitee 2026: Only 14.4% have full security approval for agents. 88% reported incidents.
EU AI Act: Enforcement begins 2026, mandates audit trails.
Zenity: $38M Series B validates market (but they're low-code focused, not agent-native).

01
V8: AI Employee OS

The Full Stack for
AI Employees

10 layers an AI employee needs to fulfill an entire job description. We're building the unified platform.

The Thesis

An AI employee's value lies in performing EVERYTHING in a job description—not just one workflow. This requires a complete infrastructure stack.

02

The 10-Layer Stack

1Memory & Personality
2Skills & Capabilities
3Tools & Integrations
4Identity & Access
5Objectives & Goals
6Task Management
7Work Artifacts & KB
8Supervision & Oversight
9Communication (A2A)
10QA & Compliance

What's Missing (⭐)

Layers 8-10 are the critical gaps. Everyone's building capabilities. Nobody's building supervision, agent-to-agent communication, and compliance.

03

The Integration Problem

Current landscape is fragmented

Today: Point Solutions

📦

Memory: Mem0, Zep, LangMem

📦

Tools: MCP servers

📦

Identity: Okta, 1Password

📦

Tasks: LangGraph, CrewAI

📦

Compliance: Guardrails AI, Trail

Tomorrow: AI Employee OS

A unified platform that manages the full AI employee lifecycle.

Integrated stack

All 10 layers, one platform

Turnkey deployment

Job description → Working AI employee

Enterprise governance

Built-in compliance, audit, oversight

AI Employee OS

The unified platform for deploying, managing, and governing AI employees.

🔗 Framework For These Pitches

Fat StartupAWS of AI WorkControl Plane

The 10-layer framework is how we think about what AI employees need.

⚠️ Why This is a Framework, Not a Pitch

Building all 10 layers is massive. We focus on Layers 8-10 (supervision, communication, compliance) because that's the critical gap. The framework informs strategy, not the pitch itself.

01
V9: AgentDocs

Stack Overflow
for AI Agents

Verified working code. Real benchmarks. Pay-per-snippet micropayments. Documentation that actually works.

68%
AI code with errors
Hallucinated APIs
$43M+
x402 volume (ready)
0
Competitors with verification
02

The Hallucination Tax

Despite our best efforts, they will always hallucinate. That will never go away.

— Amr Awadallah, Vectara CEO, 2026

❌ The Problem

Best-documented ≠ Best solution

Agents pick whatever has most examples

Documentation gets stale

APIs change, snippets break

No verification

Agent can't know if code actually runs

No benchmarks

No cost/perf data to guide decisions

✓ AgentDocs

Agent-swarm verified

Code tested continuously, timestamped

Use-case organized

"Transcribe video" → 10 services compared

Real benchmarks

Cost, latency, quality scores

x402 micropayments

$0.05 per verified snippet

03

How It Works

🤖
Agent needs code
"Send email via API"
🔍
Query AgentDocs
Structured API
💳
HTTP 402
Pay $0.05 via x402
Verified snippet
Tested 2 hours ago

Kill the API Key

No signup. No rate limits. No accounts. Agent pays per-request, gets verified code. Native to how agents want to consume services.
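The pay-per-request loop above can be sketched end to end: the first request comes back HTTP 402 with a price, the agent retries with a payment attached, and the verified snippet comes back. Everything here is a simplified stand-in — the endpoint, headers, and payment token are assumptions, and real x402 uses signed on-chain payment payloads verified by a facilitator.

```python
# Toy sketch of the x402 pay-per-snippet flow (hypothetical API shape).

PRICE_USD = 0.05

def serve(path: str, headers: dict) -> tuple[int, dict]:
    """Toy server: demand payment, then return the verified snippet."""
    if "X-Payment" not in headers:
        return 402, {"error": "payment required", "price_usd": PRICE_USD}
    # (a real facilitator would verify the payment on-chain here)
    return 200, {"snippet": "resend.emails.send(...)",
                 "last_verified": "2h ago", "latency_ms": 240}

def fetch_snippet(path: str) -> dict:
    """Toy agent client: pay on 402 and retry once."""
    status, body = serve(path, {})
    if status == 402:
        payment = f"paid:{body['price_usd']}"  # stand-in for an x402 payload
        status, body = serve(path, {"X-Payment": payment})
    assert status == 200
    return body
```

The point of the shape is that no account exists anywhere: the price travels in the 402 response, and the retry with payment is the entire "signup" flow.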

Initial Use Cases

🎙️

Transcription

Deepgram, Whisper, AssemblyAI

📧

Email

Resend, SendGrid, Mailgun, CF

🖼️

Image Generation

DALL-E, Midjourney API, Flux

💳

Payments

Stripe, LemonSqueezy, Paddle

What Agents Get

Working code snippet

Last verified timestamp

Cost per API call

Latency benchmarks

04

Market & Competition

Closest Competitor: Context7

🟡

Up-to-date docs

✓ They have this

🟡

Version-specific

✓ They have this

🔴

Verified working

No continuous testing

🔴

Benchmarks

No cost/perf data

🔴

Micropayments

Free only, no agent-native billing

Why Now

💰

x402 is production-ready

$43M+ processed, 35M+ txns

🤖

Agent adoption exploding

OpenClaw: 9K→60K stars

📈

$50B market by 2030

AI agent infrastructure

🎯

Clear wedge

Verification is table stakes soon

The x402 Thesis

25,000+ developers building on x402. Google, Cloudflare, Stripe adopting. Machine-to-machine payments are the rails for agent economy.

05

x402 Market Opportunity

Real-time data from x402scan.com shows a booming agent economy — with a clear gap for developer tooling.

$2.21M
30-Day x402 Volume
x402scan.com, Feb 2026
4.2M
Transactions (30 days)
~140K/day average
8,559
Active Buyer Agents
Coinbase facilitator alone
0
Verified Snippet Services
Gap in the market

All 14 Facilitators

Facilitator | 30d Txns | 30d Vol | What They Do
Dexter | 1.65M | $79.5K | Agent economy platform
Coinbase | 722K | $288.5K | Official CDP facilitator
Virtuals Protocol | 412K | $1.34M | AI agent tokenization
PayAI | 1.31M | $43.3K | Micropayments
RelAI | 66K | $84K | Agent payments (Solana)
Meridian | 19K | $315K | High-value transactions
Thirdweb | ~10K | ~$2K | Web3 dev platform
OpenX402 | 6.6K | $38.6K | Open-source facilitator
Polymer | 6.4K | $770 | Proof generation
AnySpend | ~3K | ~$5K | Multi-asset spending

+ Corbits, OpenFacilitator, CustomPay, AgentPay (emerging)

Source: x402scan.com, Feb 27 2026

Market Gap Analysis

🔍

What Exists

Data APIs, AI services, crypto tools, social data

What's Missing

Verified code snippets, curated docs, developer knowledge

💡

AgentDocs Opportunity

Be the Stack Overflow layer on x402 rails

Why We Can Win

Top services (StableEnrich, LowPaymentFee) aggregate APIs — they don't verify code quality.
AgentDocs: Premium pricing ($0.05-0.10) justified by verification + benchmarks.
Target: 1,000+ requests/day at a ~$0.07 average price = $2,100+/month revenue from agent micropayments alone.

06

Revenue Model

AgentDocs: Documentation That Works

Verified snippets. Real benchmarks. Agent-native payments. Stack Overflow, but for machines.

🔗 Supports These Pitches

Fat StartupAWS of AI Work

Better documentation → better agent outputs → more reliable outcomes.

📍 Current Progress

Live: agentdocs-api.holly-3f6.workers.dev
Snippets: 15 use cases, 21 verified snippets
Status: Dogfooding internally, expanding library

01 / 12
AWS OF AI WORK

The Infrastructure Layer
for AI Agent Work

$30-40B poured into AI agents. 95% fail to deliver. We're building the missing infrastructure that makes them actually work.

$50B+
AI Agent Market by 2030
MarketsandMarkets, Grand View Research
95%
Enterprise AI Pilots Fail
MIT NANDA Study, 2025
$4K
MRR (Live)
171%
ROI When It Works
MIT NANDA
02 / 12

The $30B Problem

Companies are pouring billions into AI agents. Almost none deliver measurable returns.

95%
AI pilots deliver zero measurable return
MIT NANDA Study
80%
AI projects fail (2x normal IT)
RAND Corporation
46%
PoCs scrapped before production
WorkOS Research
70-80%
AI SDR churn within months
11x, Artisan data

Companies are pouring $30–40 billion into generative AI, yet an MIT study finds that 95% of enterprise pilots deliver zero measurable return.

— MIT NANDA: The GenAI Divide, 2025

03 / 12

Why AI Agents Fail

The pattern is consistent. It's not the models—it's the infrastructure.

❌ What Breaks

1

No workflow templates

Teams reinvent every agent from scratch. Same failures, different companies.

2

No human oversight

Agents run unsupervised. High-stakes errors go uncaught. Trust collapses.

3

No failure patterns

Each company learns the same lessons. No accumulated knowledge.

4

No orchestration

Multi-agent systems collapse. Stanford CooperBench: 25% success rate.

✓ What's Missing: Infrastructure

Battle-tested workflow templates

Proven prompts, integrations, and sequences. Encoded from real deployments.

Human-in-the-loop routing

Smart escalation. Approval queues. Humans handle edge cases.

Failure pattern library

What breaks and how to prevent it. Compound learning across clients.

Agent orchestration layer

Coordinate multi-agent work. Handle failures gracefully.

The Unlock

The 5% that succeed have infrastructure. Templates. Oversight. Failure patterns. We're building that infrastructure as a service.

04 / 12

The Playbook: Services → Platform

The most valuable infrastructure companies started by doing the work themselves.

Scale AI

Data Labeling → AI Infrastructure

Started labeling images for self-driving cars (2016). Now the "Data Foundry" powering OpenAI, Meta, Google. 50% gross margins from tech-enabled services.

$29B
Valuation (Meta investment, 2025)
Sacra, TechCrunch

Pilot

Bookkeeping Services → Financial Infra

"AWS for SMB accounting." Started doing bookkeeping. Now processes $3B+ in transactions. Jeff Bezos led funding.

$1.2B
Valuation (2021)
CNBC, TechCrunch

Stripe

Payments API → Financial Infrastructure

Started with simple payment processing (2010). Expanded to Connect, Radar, Atlas. Infrastructure that grows as customers grow.

$107B
Valuation (2024)
Wikipedia, Sacra

The Pattern

Do the work → Encode the patterns → Become the platform. Services fund the R&D. Each engagement builds the moat. Competitors starting later start from zero.

05 / 12

Scale AI: The Detailed Parallel

Their journey is our playbook. Same model, different layer.

Scale AI's Model

1

Services Entry

Started labeling images for AV companies. Revenue from day one.

2

Tech Layer

Built pre-labeling ML that made each human 10x more efficient.

3

Data Flywheel

Each correction improved their models. More data = better automation.

4

Platform Expansion

Nucleus, Validate, Launch—from labeling to full ML lifecycle.

Our Model

1

Services Entry

Operating AI agent workflows for clients. Revenue from day one.

2

Tech Layer

Workflow templates + orchestration that make agents reliable.

3

Playbook Flywheel

Each engagement encodes learnings. More workflows = better templates.

4

Platform Expansion

Guardrails, Observability, Governance—full agent lifecycle.

Scale AI is not a traditional BPO company. It is a Data Foundry. Their technology layer is their moat—human workforce augmented by proprietary software that compounds in value.

— Takafumi Endo, "Scale AI: Deconstructing the Foundry"

06 / 12

The Workflow Template Moat

Each engagement encodes a playbook. Playbooks become the platform.

🔧
Verified Prompts
By use case + vertical
+
🔗
Integration Patterns
What connects to what
+
🚫
Failure Patterns
What breaks + fixes
+
👤
Human Routing
When to escalate
📦
Workflow Template Library
Deploy new client in hours, not weeks

Compounding Effect

1️⃣

Customer 1: 2 weeks

Figure everything out from scratch

5️⃣

Customer 5: 3 days

Apply existing playbook + customize

🔟

Customer 10: Hours

Playbook is battle-tested

📦

Customer 50+: Self-serve

Playbooks become product

What's In A Template

📝

Prompt sequences

What actually works for each use case

⚙️

Model routing

Which models for which tasks (cost/quality)

🔗

Tool configurations

Integrations, APIs, credentials patterns

🛡️

Guardrail rules

What to block, what to escalate
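The four template components listed above can be pictured as one declarative artifact that a new deployment is stamped from. Every key and value below is an illustrative assumption about what such a template might contain, not a real schema.

```python
# Illustrative shape of a workflow template (hypothetical keys/values),
# covering the four components: prompts, model routing, tools, guardrails.
TEMPLATE = {
    "use_case": "outbound_sdr",
    "vertical": "construction",
    "prompts": ["research_prospect", "draft_opener", "draft_followup"],
    "model_routing": {"research": "large-model", "drafting": "small-model"},
    "tools": ["crm_api", "email_api", "calendar_api"],
    "guardrails": {
        "block": ["linkedin_follow"],
        "escalate": ["reply_from_prospect", "deal_value_over_10k"],
    },
}

def deploy_checklist(t: dict) -> list[str]:
    """Flatten a template into ordered deployment steps (toy example)."""
    steps = [f"wire tool: {tool}" for tool in t["tools"]]
    steps += [f"load prompt: {p}" for p in t["prompts"]]
    steps += [f"guardrail: escalate on {e}" for e in t["guardrails"]["escalate"]]
    return steps
```

This is what turns "Customer 1: 2 weeks" into "Customer 10: hours": the checklist is generated from the template instead of rediscovered per client.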

07 / 12

Why Infrastructure Wins

Application companies fight for customers. Infrastructure companies power the ecosystem.

❌ Application Layer

📊

Compete on features

Race to the bottom. Easy to copy.

🔄

Linear growth

Each customer = new acquisition cost

💰

2-5x revenue multiples

Commodity software pricing

🏃

Low switching costs

Customers can leave anytime

✓ Infrastructure Layer

🏗️

Compete on reliability

Mission-critical. Hard to replicate.

📈

Compound growth

Templates improve → more value → more customers

💎

10-25x revenue multiples

Scale AI: 18x. Stripe: higher.

🔒

High switching costs

Workflows built on your templates

Network effects are the underlying principle behind the success of companies like AWS, Stripe, and Salesforce. Higher network density means the product value increases.

— NFX: The Network Effects Manual

08 / 12

Market Size: $50-70B by 2030

AI agents are the fastest-growing category in enterprise software. We're building the infrastructure layer.

$7.8B
AI Agents Market (2025)
MarketsandMarkets
$52.6B
AI Agents Market (2030)
MarketsandMarkets
46.3%
CAGR Growth Rate
2025-2030 forecast
$183B
Bullish Forecast (2033)
Grand View Research

Our TAM Slice: Infrastructure

If AI Agents are $50B, infrastructure is 20-30% of stack value:

$10-15B
Agent Infrastructure TAM by 2030

Why We Win This Slice

🎯

First-mover on playbooks

Every month = more encoded knowledge

💰

Revenue while building

Services fund the platform

🧠

Real deployment data

Failure patterns competitors don't have

09 / 12

The Infrastructure Stack

Four layers that make AI agents reliable. We're building all four.

1
Workflow Templates
Verified prompts, sequences, integrations
2
Agent Orchestration
Multi-agent coordination, task routing
3
Human Oversight
Approval queues, escalation, feedback loops
4
Guardrails + Observability
Safety rails, monitoring, audit trails

Current Products

🛡️

Agent Seatbelt

Browser-layer guardrails that block irreversible actions

📊

ClawView

Observability for autonomous agents. See what they do.

🏛️

AgentGov

Governance, compliance, audit trails

📚

AgentDocs

Verified code snippets for agent tool use

10 / 12

Current Traction

$4K
MRR
5
Paying Clients
3
Workflow Types
SDR, Video Gen, Research
+$2K
Added This Week

What We've Delivered

🏗️

SDR for construction companies

Lead gen + qualification workflows

🎬

Video generation for ML training

Synthetic data pipeline workflows

🔬

Research for universities

Literature review + synthesis workflows

🚀

BDR for startups

Outbound + meeting booking workflows

What This Proves

Fat Startup Thesis

We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.

"A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done."

— Andrew Lee, a16z Speedrun

11 / 12

The Path Forward

🛠️
Year 1: Services
$1M ARR · 50+ playbooks
📚
Year 2: Productize
Self-serve templates
🏗️
Year 3: Platform
Others build on us

12-Month Milestones

💰

$1M ARR

Prove unit economics at scale

📚

50+ Workflow Templates

Across 5+ verticals

🔧

Infrastructure Products Live

Guardrails, Observability, Governance

📦

First Self-Serve Templates

Deploy without our team

Why Now

🚀

Models just got capable enough

GPT-5, Claude 4—agents can work

💔

AI SDR market burned

70-80% churn = customers seeking alternatives

Infrastructure window open

No dominant player yet. First-mover wins.

📜

Regulatory tailwinds

EU AI Act mandates oversight, audit trails

12 / 12

The Ask

The AWS of AI Work

Infrastructure that makes AI agents reliable. Workflow templates. Orchestration. Human oversight.

Every company deploying agents will need this. We're building it.

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed

Yasir

Co-Founder

yapthis.com · Shipped production agents

Key Sources

MIT NANDA Study: 95% AI failure rate, 171% ROI when successful

MarketsandMarkets: $7.8B → $52.6B AI agents market (2025-2030)

Scale AI (Sacra): $1.5B ARR, $29B valuation, 50% gross margins

Pilot (CNBC/TechCrunch): $1.2B valuation, Bezos-backed

11x/Artisan: 70-80% churn within months (Broadn research)

RAND Corporation: 80% AI project failure rate

01 / 12
MARKETPLACE THESIS

The Uber for AI Work

Post an outcome. AI agents compete. Pay only for results. We're building the outcome marketplace for the AI economy.

$4K
MRR Today
70%
Of tech value created by network effects
NFX Research
$13.8B
Scale AI valuation
Services → Platform
$60M+
GitCoin distributed
Bounty model works
02 / 12

The a16z Speedrun Thesis

This is the exact model a16z partners are calling for in 2026.

Say you need 50 qualified sales meetings. Instead of buying another AI tool, you post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents going after real business outcomes?

— Macy Mills, a16z Speedrun, "14 Big Ideas for 2026"

I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start.

— Kenan Saleh, a16z Speedrun, "14 Big Ideas for 2026"

A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.

— Andrew Lee, a16z Speedrun Partner

03 / 12

The Market Shift: Tools → Outcomes

The freelance marketplace is $1.5T. It's about to be disrupted by AI agents.

❌ Legacy Marketplaces

📝

Upwork: $1.67B market cap

Pay humans by the hour. Hope they deliver.

📝

Fiverr: ~$1B market cap

Fixed-price gigs. Still human-dependent.

🐢

Slow, expensive, variable

Wait days. Pay premium. Quality varies.

✓ AI Agent Marketplace (Us)

🎯

Pay per outcome, not effort

$X per meeting, $Y per video, $Z per lead.

Hours, not days

AI agents work 24/7. Instant scale.

📈

Network effects compound

More agents = better matching = better outcomes.

The Paradigm Shift

As we move to a future based on outcome-based pricing that perfectly aligns incentives between vendors and users, we'll first move away from time-based billing. — a16z Big Ideas 2026

04 / 12

How It Works

Bounties + Escrow + AI Agents = Outcome Marketplace

🎯
Post Bounty
"50 meetings @ $500 each"
💰
Escrow Funds
Payment locked
🤖
Agents Compete
Best performers win
Verify & Release
QA passes → pay out
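The post → escrow → compete → verify-and-release flow above reduces to a small state machine: funds lock at posting, each verified outcome releases one unit of payment, and the buyer reclaims whatever was never earned. This is a toy model; states, amounts, and method names are assumptions.

```python
# Toy sketch of the bounty escrow lifecycle (hypothetical states/fields).
class Bounty:
    def __init__(self, outcome: str, price_per_unit: float, units: int):
        self.outcome = outcome
        self.escrow = price_per_unit * units  # funds locked up front
        self.price = price_per_unit
        self.paid_out = 0.0
        self.state = "open"

    def submit(self, agent: str, verified: bool) -> float:
        """An agent submits a delivered outcome; pay only if QA verifies."""
        if self.state != "open" or not verified:
            return 0.0
        payout = min(self.price, self.escrow - self.paid_out)
        self.paid_out += payout
        if self.paid_out >= self.escrow:
            self.state = "complete"           # all outcomes delivered
        return payout

    def refund(self) -> float:
        """Buyer reclaims whatever was never earned."""
        self.state = "closed"
        return self.escrow - self.paid_out
```

For the "50 meetings @ $500" example, $25,000 locks at posting; each verified meeting releases $500, and closing early refunds the remainder, which is what makes the buyer's risk zero.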

For Buyers

📝

Define the outcome

"Book qualified meeting" or "Generate product video"

💵

Set your price

Pay what the outcome is worth to you

🔒

Zero risk

Funds held in escrow. Pay only on delivery.

For Agents (Supply Side)

🎰

Pick bounties that fit

Match capabilities to opportunities

📊

Build reputation

Success rate → more bounties → more revenue

💰

Get paid instantly

Verified outcome → automatic payout

05 / 12

The Bounty Model Works

Proven in bug bounties, open source, and ML competitions. Now it's time for AI work.

$60M+
GitCoin distributed
Open source bounties
$100M+
Bug bounties/year
HackerOne + Bugcrowd
$1B+
Kaggle prize pool
ML competitions
10M+
Replit users
Bounties marketplace

Precedent: Replit Bounties

Imagine a tool where you describe your problem and get a solution built for you. Today we're introducing Bounties, a marketplace where you work with top creators and bring your software ideas to life.

— Replit, on launching Bounties

Replit proved bounties work for code. We're proving it works for any AI-deliverable outcome.

Precedent: GitCoin

Over the past 5 years we've supported the funding of public goods. Started with bounties for open source, evolved to quadratic funding.

— GitCoin: $60M+ distributed

GitCoin proved bounties + crypto payments = massive coordination. We're applying this to AI agent work.

06 / 12

Network Effects: The Moat

70% of tech value comes from network effects. Here's how we build them.

Network effects have been responsible for 70% of all the value created in technology since 1994. Founders who deeply understand how they work will be better positioned to build category-defining companies.

— NFX, "The Network Effects Bible"

Two-Sided Marketplace NFX

👤

More buyers → More bounties

Attracts more agents to the platform

🤖

More agents → Better matching

Faster delivery, higher quality outcomes

📈

Better outcomes → More buyers

Word of mouth, lower prices, faster delivery

Data Network Effects

📊

Every bounty = training data

What works, what fails, edge cases

🧠

Smarter matching over time

Route bounties to best-fit agents

🔒

Proprietary playbook library

Compound knowledge competitors can't replicate

Metcalfe's Law

Value of a network grows proportional to N² (nodes squared). With agents AND buyers, we get cross-side network effects that compound faster than single-sided platforms.

07 / 12

Trust Layer: How Agents Build Reputation

The missing infrastructure for AI agent marketplaces.

Agent Identity & Track Record

🆔

Verifiable agent identity

Who built it, what it can do, audit trail

📈

Per-function reputation

Track record based on actual outcomes, not reviews

🏆

Specialization scores

"This agent is 94% on sales meetings, 78% on video"

Trust Mechanics

🔒

Escrow with time-locks

Funds released only on verified delivery

⚖️

Dispute resolution

Human or AI arbitration for edge cases

📉

Sliding refund scale

Partial credit for partial delivery

🆕
New Agent
Low trust, small bounties
📊
Track Record
Outcomes verified
Trusted Agent
High-value bounties
🏅
Elite Status
Premium rates, priority
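The New → Track Record → Trusted → Elite ladder can be computed mechanically from verified outcomes, tracked per function so an agent can be strong at meetings and weak at video. The thresholds below are illustrative assumptions, not a real scoring policy.

```python
# Sketch of per-function, outcome-based agent reputation
# (tier thresholds are hypothetical).
from collections import defaultdict

class Reputation:
    def __init__(self):
        # function -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, function: str, success: bool):
        s = self.stats[function]
        s[1] += 1
        s[0] += int(success)

    def score(self, function: str) -> float:
        """Verified success rate for one capability."""
        s = self.stats[function]
        return s[0] / s[1] if s[1] else 0.0

    def tier(self, function: str) -> str:
        n, rate = self.stats[function][1], self.score(function)
        if n < 10:
            return "new"  # low trust, small bounties only
        if rate >= 0.9:
            return "elite"
        return "trusted" if rate >= 0.75 else "probation"
```

Because scores come from verified payouts rather than reviews, the reputation is hard to fake: an agent only climbs tiers by delivering outcomes that released escrow.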
08 / 12

Path: Managed → Open Marketplace

Like Uber: start premium, then open the platform.

🛠️
Phase 1: Now
We run the agents
🤝
Phase 2: Partners
Vetted agent builders
🌐
Phase 3: Open
Any agent can join

Phase 1: Managed (Now)

We operate all agents

Quality control, learn playbooks

$4K MRR validates demand

Customers paying for outcomes

Build trust infrastructure

Escrow, verification, reputation

Phase 2-3: Marketplace

🔜

Invite partner agents

Vetted builders, revenue share

🔜

Open to all agents

Anyone can compete for bounties

🔜

Platform take rate: 15-20%

Like Uber, Airbnb, marketplace standard

The Uber Playbook

Uber started with black cars (premium, managed) before opening to UberX (open marketplace). We start with our agents, prove economics, then open to all. Services fund the platform build.

09 / 12

Comparable Companies & Valuations

Services → Platform is a proven path to massive outcomes.

$13.8B
Scale AI
Data labeling services → platform
$1.67B
Upwork
Freelance marketplace (ripe for disruption)
$1.2B
Pilot
Bookkeeping: humans + AI
$50B+
Palantir
Services → Platform → Public

Scale AI: Our North Star

1️⃣

Started as services

Data labeling for ML companies

2️⃣

Built the platform

Tools, workflows, quality systems

3️⃣

$2B+ revenue (2025)

Services funded the infrastructure

4️⃣

$13.8B valuation

Platform economics, not services multiples

Why We're Bigger

📊

Scale AI: One vertical

Data labeling for ML

🌐

Us: All AI-deliverable work

Sales, content, research, ops...

📈

TAM: $1.5T+ services market

Every white-collar task that can be AI'd

10 / 12

Why Now: The Perfect Storm

GPT-5
Agents now capable
x402
Machine payments ready
a16z Big Ideas 2026
70%
AI SDR churn
Tools failing, outcomes wanted
$1B+
AI coding revenue (2025)
a16z: Agent apps thriving

Technology Inflection

🧠

Models capable enough

GPT-5, Claude 4 can do real work

💳

x402 machine payments

Agents can transact autonomously

🔧

Infrastructure exists

OpenClaw, MCP, agent frameworks

Market Readiness

💔

AI tools disappointing

70% churn = buyers want outcomes

💰

Budget exists

Companies spending on AI, getting nothing

🏃

First mover advantage

No AI-native outcome marketplace yet

Emerging primitives like x402 make payment settlement programmable and reactive. Smart contracts can settle a dollar payment globally in seconds. In 2026, this becomes the rails for agent commerce.

— a16z Big Ideas 2026, Part 3

11 / 12

Team & Traction

$4K
MRR
5
Customers
3-7x
Margin Multiple
0%
Churn (outcome-aligned)

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

What Traction Proves

Companies pay for outcomes. 0% churn because incentives align. This is the business model for AI work.

12 / 12

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent capacity, build marketplace infra

🎯

12-month goal: $1M ARR

Prove economics before opening marketplace

🌐

24-month: Open marketplace

Partner agents, then fully open

Why Us

🐕

Dog-fooding OpenClaw

We run agents daily, know what breaks

📊

Built the infrastructure

ClawView, guardrails, workflows

💵

Revenue already

$4K MRR proves the model

OpenHolly: The Uber for AI Work

Post an outcome. AI agents compete. Pay for results. The marketplace that makes AI actually deliver.

🔧 Infrastructure We're Building

🛡️ Guardrails · 📊 ClawView · 🏛️ AgentGov

Trust layer that makes marketplace outcomes reliable.

📚 Sources

a16z: "14 Big Ideas for 2026" (Macy Mills, Andrew Lee, Kenan Saleh) • "Big Ideas 2026 Part 1-3" • NFX: "The Network Effects Bible" (70% of tech value) • Market Data: Scale AI ($13.8B), Upwork ($1.67B), Gitcoin ($60M+ distributed) • Replit: Bounties marketplace launch

1 / 12
CONTROL PLANE THESIS

The Control Plane for
AI Agents

Everyone's building autonomous agents. We're building the layer that makes them actually work: purpose-built infrastructure for human oversight at scale.

95%
AI pilots fail to deliver ROI
MIT Research, 2025
17x
Error amplification in "bag of agents"
DeepMind, Dec 2025
71%
Accuracy improvement with HITL
Microsoft Magentic-UI
$4.5K
MRR proving the thesis
OpenHolly, Feb 2026
2 / 12

The Inconvenient Truth: Autonomy Fails

The research is clear—and the industry is learning the hard way.

Multi-Agent Systems Break Down

"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."

— Berkeley/DeepMind "Why Multi-Agent LLM Systems Fail", 2025

📊

75% failure rate

ChatDev on ProgramDev benchmark

📊

~50% average task completion

Across autonomous agent frameworks

📊

17x error amplification

In uncoordinated "bag of agents"

Enterprise AI Projects Crater

"42% of companies abandoned most of their AI initiatives in 2024, up from 17% the previous year. The average organization scrapped 46% of AI proof-of-concepts."

— S&P Global Research, 2024

📊

95% of AI pilots fail

MIT Research on enterprise deployments

📊

80%+ never reach production

RAND Corporation AI project study

📊

2x failure rate vs traditional IT

AI projects vs standard software

Why This Matters

The industry is betting billions on fully autonomous agents. The research says they don't work. Someone needs to build the layer that makes them work.

3 / 12

Microsoft's Answer: Human-in-the-Loop

The largest AI research org in the world just validated our thesis.

"We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems."

— Microsoft Research, Magentic-UI (July 2025)

Magentic-UI Results

71%
Accuracy improvement with human-in-loop
30.3% → 51.9% on GAIA benchmark
📊

Only 10% of tasks needed human help

Lightweight intervention, massive improvement

📊

1.1 avg clarifications per help request

Minimal interaction overhead

Key Interaction Mechanisms

🤝

Co-planning

Human + agent collaborate on plan before execution

🔄

Co-tasking

Seamless handoff between human and agent control

🛡️

Action guards

Human approval for high-stakes actions

🧠

Memory

Learn from past interactions to improve

Microsoft's Conclusion

"Even as tomorrow's agents become more capable and reliable, we believe that human involvement will remain essential for preserving human agency, resolving unforeseen ambiguities, and guiding agents in adapting to an ever-changing world."

4 / 12

Anthropic's Findings: The Oversight Paradox

Real-world data from millions of Claude Code sessions reveals how humans actually oversee agents.

As Users Gain Experience...

📈

Auto-approve increases: 20% → 40%+

Experienced users let Claude run autonomously

📈

BUT interrupt rate ALSO increases: 5% → 9%

They intervene more often, not less

💡

The shift: Step-by-step → Exception-based

From approving everything to watching for problems

Agent-Initiated Stops Matter

🤖

Claude asks for clarification 2x more

On complex tasks vs simple ones

🤖

Stops itself more often than humans interrupt

On the most difficult tasks

💡

Models know when they're uncertain

They can (and should) ask for help

"Effective oversight doesn't require approving every action but being in a position to intervene when it matters... our central conclusion is that effective oversight of agents will require new forms of post-deployment monitoring infrastructure and new human-AI interaction paradigms."

— Anthropic Research, "Measuring AI Agent Autonomy in Practice" (Feb 2026)

The Deployment Overhang

Anthropic found that "the autonomy models are capable of handling exceeds what they exercise in practice." The bottleneck isn't model capability—it's the oversight infrastructure.

5 / 12

Air Traffic Control for AI Agents

The analogy everyone is converging on—and what it means for product design.

"Think of agents within your multi-agent system as the airplanes. The agents have their own autonomy to act. But air traffic control provides guardrails, coordination, and human oversight for the whole system."

— Jason Bryant, AI in Pharma (Jan 2026)

Why Air Traffic Control Works

✈️

Planes are autonomous

Pilots make real-time decisions

🗼

Controllers handle coordination

Routing, conflicts, emergencies

👤

Humans handle edge cases

Situations technology and standard procedures can't cover

🔄

System improves over time

Incidents become new procedures

Why This Analogy Matters

📊

Scaling ratio: 1 controller : many planes

Not 1:1 human-to-agent

🛡️

Controllers can't replace pilots

Nor vice versa—complementary roles

⚠️

No full automation possible

Edge cases require human judgment

💰

Multi-billion dollar industry

ATC isn't going away

The Thesis

As AI agents proliferate, every company will need an "air traffic control" system for their agent fleet. That's the control plane we're building.

6 / 12

Why Current Interfaces Fail

Existing tools weren't designed for the human-agent oversight problem.

❌ Chat Interfaces

Conversational, not workflow-oriented. Can't manage 100 agents. No approval queues. No batch operations. You'd need a chat window per agent.

❌ Code/GitHub

Great for developers. Useless for ops teams. Can't approve actions in real-time. No visual understanding of agent state or intent.

❌ Slack/Email Alerts

Ad hoc approvals. No context. Alert fatigue. Doesn't learn from decisions. Can't see what agent plans to do next.

❌ Observability Dashboards

Read-only visibility. No intervention capability. See problems after they happen. Can't modify agent plans mid-execution.

"Only 14.4% of enterprises have full security approval for AI agents. 88% reported agent-related incidents. The interface problem is also a governance problem."

— Gravitee State of AI Agents Report, 2026

The Gap

There's no purpose-built interface for humans to oversee AI agents at scale. Not dashboards. Not chat. Not alerts. A new category needs to exist.

7 / 12

What a Control Plane Actually Needs

Distilled from Microsoft, Anthropic research, and our own deployments.

Pre-Execution

📋

Plan Review

See what agent intends to do before it acts. Edit plans. Add constraints.

🎯

Scope Boundaries

Define allowed domains, tools, actions. Agent can't exceed boundaries.

🔗

Workflow Templates

Start from proven patterns. Don't reinvent for every task.

During Execution

👁️

Real-Time Visibility

See agent actions as they happen. Browser view. Code execution. API calls.

⏸️

Interrupt & Resume

Pause any agent instantly. Take control. Hand back.

🛡️

Action Guards

Automatic pause for high-stakes actions. Configurable thresholds.

Approval Layer

📥

Unified Queue

All pending approvals across all agents in one view.

🎛️

Batch Operations

Approve/reject patterns across many agents at once.

🔀

Smart Routing

Route different decisions to different humans by expertise.

Learning Layer

🧠

Decision Memory

Human approvals become future patterns. Rejections become rules.

📈

Threshold Tuning

Auto-adjust when to ask humans based on outcomes.

📚

Playbook Evolution

Workflows improve with every human intervention.
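The approval and learning layers above compose naturally: action guards feed a unified queue, and human verdicts become decision memory. A minimal sketch of that loop — the class, action kinds, and method names are all hypothetical, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    agent_id: str
    kind: str      # e.g. "send_email", "delete_records", "purchase"
    payload: dict

@dataclass
class ControlPlane:
    """Illustrative approval layer: guard → queue → memory."""
    high_stakes: set = field(default_factory=lambda: {"delete_records", "purchase"})
    queue: list = field(default_factory=list)    # unified approval queue
    memory: dict = field(default_factory=dict)   # decision memory: kind -> rule

    def submit(self, action: Action) -> str:
        # Decision memory: a past human ruling becomes automatic policy.
        if action.kind in self.memory:
            return self.memory[action.kind]
        # Action guard: high-stakes actions pause for human approval.
        if action.kind in self.high_stakes:
            self.queue.append(action)
            return "pending"
        return "approved"

    def review(self, action: Action, verdict: str, remember: bool = False) -> str:
        # Human clears the queue item; optionally the verdict becomes a rule.
        self.queue.remove(action)
        if remember:
            self.memory[action.kind] = verdict
        return verdict
```

In this sketch a rejected `purchase`, reviewed once with `remember=True`, is auto-rejected for every agent afterward — which is the "human approvals become future patterns" idea in miniature.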

8 / 12

The "Control Plane" Category

Every complex system has a control plane. AI agents need one too.

🐕
Datadog
$50B+ market cap

Control plane for infrastructure. See what's happening. Alert when things break. Intervene.

☸️
Kubernetes
Industry standard

Control plane for containers. Orchestrate workloads. Handle failures. Scale automatically.

🔐
Okta
$15B+ market cap

Control plane for identity. Who can access what. Audit trails. Compliance.

🎛️
???
AI Agent Control Plane

What agents are doing. Approvals & intervention. Learning & guardrails. This category doesn't exist yet.

"The control plane provides management and orchestration across an organization's environment. It's akin to air traffic control for applications."

— Vectra AI definition

The Opportunity

Infrastructure got Datadog. Containers got Kubernetes. Identity got Okta. AI agents need their control plane. We're building it.

9 / 12

Why Human-in-the-Loop Scales

The VC objection—and why it's wrong.

The Objection

"If humans are in the loop, doesn't that kill unit economics? Isn't the whole point to remove humans?"

The Response: Look at the Data

Scale AI
$13.8B valuation

Human labelers + AI. Humans as oversight.

Pilot
$1.2B valuation

Human bookkeepers + AI. Humans as QA.

Palantir
$50B+ market cap

Human analysts + AI. Humans as strategists.

The Key Distinction

"Humans as OVERSIGHT, not labor. AI does the work, humans QA. The ratio improves over time."

The Scaling Math

1️⃣

Year 1: 10:1 ratio

1 human oversees 10 agents. Heavy QA.

2️⃣

Year 2: 100:1 ratio

System learns. Fewer interventions needed.

3️⃣

Year 3+: 1000:1 ratio

Humans handle edge cases only. Still critical.

The Avi Medical Case Study

81% automation rate. 93% cost savings. Humans handle complex cases. HITL doesn't kill unit economics—it enables them.
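The ratio math above is simple ceiling division: for a fixed fleet, each 10x improvement in the oversight ratio cuts required headcount 10x while keeping humans in the loop. A toy calculation with an assumed 10,000-agent fleet:

```python
def humans_needed(agents: int, ratio: int) -> int:
    """Oversight headcount for a fleet at a given agents-per-human ratio."""
    return -(-agents // ratio)   # ceiling division

# A fixed 10,000-agent fleet across the three stages above:
for year, ratio in [(1, 10), (2, 100), (3, 1000)]:
    print(f"Year {year}: {humans_needed(10_000, ratio)} humans")
# Year 1: 1000, Year 2: 100, Year 3: 10
```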

10 / 12

The Contrarian Bet

Everyone's zigging toward full autonomy. We're zagging toward control.

What Everyone Else is Building

🤖

Fully autonomous agents

Demo well. Break in production.

🤖

More agent capabilities

Better models. More tools. Same failure modes.

🤖

"Just add more agents"

17x error amplification, per DeepMind.

🤖

Removing humans entirely

The dream that keeps failing.

What We're Building

🎛️

The oversight layer

Makes ANY agent more reliable.

🎛️

Human-agent collaboration

Complementary strengths. Better outcomes.

🎛️

Coordination infrastructure

Turns bag-of-agents into functional team.

🎛️

Humans in the right places

Exception handling. Strategic oversight.

"I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start."

— Kenan Saleh, a16z Speedrun Partner

Our Position

We're not betting against agent capabilities improving. We're betting that oversight infrastructure will always be needed—and no one is building it well.

11 / 12

Why Us, Why Now

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Why Now

📈

Agent adoption is exploding

OpenAI Operator, Anthropic Claude Code, 1000+ agent startups

💔

Failure rates are becoming visible

95% pilot failure is now common knowledge

📄

Research is converging

Microsoft, Anthropic, DeepMind all pointing to HITL

🏛️

Regulation is coming

EU AI Act mandates audit trails & oversight

What We've Built

$4.5K MRR

Proving the thesis with real customers

OpenClaw infrastructure

Dogfooding our own control plane daily

Guardrails, ClawView, AgentGov

Components of the full control plane

12 / 12

The Ask

The Human-Agent Control Plane

Purpose-built infrastructure for human oversight of AI agents at scale. Plan review. Action guards. Approval queues. Learning loops. The missing layer that makes agents actually work.

What We Need

💰

$[X] Pre-Seed

Build the full control plane product

🎯

12-month goal: $1M ARR

Prove control plane scales across customers

📚

Then: Category definition

Be "Datadog for AI agents"

The Opportunity

📈

New category creation

No one owns "AI agent control plane" yet

📈

Research-backed thesis

Microsoft, Anthropic, DeepMind alignment

📈

Every agent deployment needs this

Horizontal opportunity across industries

🔧 Infrastructure We're Building

🛡️ Guardrails · 📊 ClawView · 🏛️ AgentGov · 🤖 Employee OS

The Control Plane integrates all infrastructure layers into one human-facing interface.

🔬 Research Foundation

MIT: 95% of AI pilots fail · DeepMind: 17x error amplification in multi-agent · Microsoft Magentic-UI: 71% accuracy improvement with HITL · Anthropic: "New oversight infrastructure needed" · Berkeley: "Why Do Multi-Agent Systems Fail?" · S&P Global: 42% of AI initiatives abandoned

1 / 12
VIBE CODING OUTCOMES

Vibe Code Your Business

"Vibe coding" revolutionized app development—describe what you want, AI builds it. Now apply this to business outcomes. Describe the result, AI + humans deliver it.

Feb 2025
Karpathy coins "vibe coding"
X/Twitter
2026
"Vibe productivity" emerges
Beyond just coding
71%
Accuracy boost with HITL
Microsoft Magentic-UI
$4K
MRR proving the thesis
2 / 12

The Vibe Coding Revolution

What started as a meme became a paradigm shift. Now it's evolving beyond code.

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."

— Andrej Karpathy, Feb 2025 (coined the term)

Origins & Evolution

2023

"The hottest new programming language is English"

Karpathy's early prediction about LLM capabilities

2025

Vibe coding goes mainstream

Cursor, Replit, Claude Code—describe → build

2026

Beyond coding: "Vibe Productivity"

Research, writing, reporting, file operations, "glue work"

Where It's Going

"What changed in early 2026 is that vibe coding is no longer confined to software development; it is spreading into research, writing, reporting, spreadsheet wrangling, file operations, and 'glue work' that usually fragments attention."

— Ken Huang, "The Vibe Shift" (Jan 2026)

The Pattern

Vibe coding showed that natural language → complex software works. Now we're applying the same pattern to natural language → business outcomes.

3 / 12

From Apps to Outcomes

The next evolution: describe what you want to achieve, not what you want built.

💻
Vibe Coding
"Build me an app that..."
Vibe Outcomes
"Get me 50 sales meetings"
🎯
Result
Meetings on your calendar

❌ Current Reality: Use Tools

1

Subscribe to AI SDR tool

$5-10K/month

2

Configure the tool

Import lists, write sequences, set rules

3

Monitor the tool

Fix errors, adjust settings, babysit

4

Hope for outcomes

70% churn in 3 months when it doesn't work

✓ Vibe Outcomes: Describe Results

1

Describe what you want

"50 qualified sales meetings with Series A fintech founders"

2

AI agents execute

Research, outreach, qualification, scheduling

3

Humans QA

Review, approve, handle edge cases

4

Pay for outcomes

$X per meeting delivered

The Thesis

Vibe coding proved that intent → artifact works for software. Vibe outcomes proves it works for business results. The "vibes" are the goal—the execution is handled by well-orchestrated HITL agent workflows.

4 / 12

How It Works

Describe outcome → Agents execute → Humans QA → Outcome delivered

💬
Natural Language
"I need..."
📋
Workflow Generation
Map to playbook
🤖
Agent Execution
Multi-agent work
👤
Human QA
Review & approve
Outcome
Delivered

Example: "50 Sales Meetings"

1

Input

"Book 50 qualified meetings with Series A fintech founders in Q1"

2

Research Agent

Identifies prospects, signals, contact info

3

Outreach Agent

Drafts personalized messages

4

Human Review

Approves messaging before send

5

Scheduling Agent

Books the meeting when prospect replies

Example: "Process These Invoices"

1

Input

"Process this month's invoices and flag anomalies"

2

Extraction Agent

Pulls data from PDFs, emails, systems

3

Matching Agent

Matches to POs, identifies discrepancies

4

Human Review

Approves exceptions, flags fraud

5

Output

Processed invoices, exception report
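Both examples follow the same shape: agent stages run in sequence, with human review gates that must approve before execution continues. A minimal sketch of that loop; `run_outcome`, the stage names, and the gate callback are all hypothetical:

```python
from typing import Callable

Stage = Callable[[dict], dict]

def run_outcome(goal: str, stages: list[tuple[str, Stage]],
                approve: Callable[[str, dict], bool]) -> dict:
    """Illustrative pipeline: agent stages run in order; any stage
    named 'review:*' pauses for human approval before continuing."""
    state: dict = {"goal": goal}
    for name, stage in stages:
        state = stage(state)
        if name.startswith("review") and not approve(name, state):
            state["status"] = "paused: " + name   # human rejected; await edits
            return state
    state["status"] = "delivered"
    return state

# The "50 sales meetings" example, with stand-in agent stages:
stages = [
    ("research", lambda s: {**s, "prospects": ["Acme", "Globex"]}),
    ("outreach", lambda s: {**s, "drafts": [f"Hi {p}" for p in s["prospects"]]}),
    ("review:messaging", lambda s: s),   # human approves before any send
    ("schedule", lambda s: {**s, "meetings": len(s["prospects"])}),
]
result = run_outcome("50 qualified meetings", stages, lambda name, s: True)
```

The same skeleton covers the invoice flow: swap in extraction and matching stages, with the human gate approving exceptions instead of messaging.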

5 / 12

Why Vibe Outcomes Need Human-in-the-Loop

Pure AI can't deliver reliable business outcomes. The research is clear.

95%
AI pilots fail to deliver ROI
MIT NANDA Study
30%
Lower success when agents collaborate
CooperBench, 2026
38.9%
Cite accuracy as #1 AI challenge
Industry analysts, 2025
71%
Accuracy improvement with HITL
Microsoft Magentic-UI

Why Pure AI Fails

"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."

— Berkeley/DeepMind, 2025

⚠️

Hallucinations occur even with high confidence

AI can be confidently wrong about business-critical decisions

⚠️

Edge cases are infinite

Business has nuance AI can't anticipate

⚠️

Stakes are high

Brand damage, legal liability, lost deals

Why HITL Fixes It

"Hybrid AI workflows, which combine automation with human oversight, are not a fallback; they're the modern standard for reliability, trust, and scalability in 2026."

— Parseur, Dec 2025

Human as QA layer, not labor

AI does 90% of work, humans verify critical decisions

Trust calibration over time

System learns when to ask, when to proceed

Only 10% of tasks need human help

Microsoft found lightweight intervention = massive improvement

6 / 12

The Interaction Layer

This is the UX for the AI-native agency, control plane, and marketplace pitches.

Why Current Interfaces Fail

❌ Chat Interfaces

Conversational, not outcome-oriented. Can't manage complex multi-step workflows. No approval queues.

❌ Dashboards

Read-only visibility. No intervention. See problems after they happen. Can't modify plans mid-execution.

❌ Slack/Email Alerts

Ad hoc. No context. Alert fatigue. Can't see what agent plans to do next.

The Vibe Outcomes Interface

💬

Natural language input

"I need X" → system figures out how

📋

Progress visibility

See what's happening toward your goal

🎛️

Approval queues

Review decisions that matter

⏸️

Interrupt & adjust

Course-correct mid-execution

📊

Outcome tracking

Clear metrics: delivered vs requested

🔗 This Powers Our Other Pitches

⚡ Fat Startup: Vibe outcomes is how customers interact with us
🚗 Uber for AI Work: Natural language bounty posting
🎛️ Control Plane: The human oversight layer
☁️ AWS of AI Work: Workflow templates activated by intent

7 / 12

Market Opportunity

The shift from "tools" to "outcomes" is creating massive new markets.

$52.6B
AI Agents Market by 2030
MarketsandMarkets
30%+
Enterprise SaaS with outcome-based pricing
Gartner 2025 Projection
61%
CFOs changing how they evaluate AI ROI
Industry Survey, 2025
$1.5T
Global professional services (TAM)
Work that can be "vibe coded"

Who Wants This

🏢

SMBs frustrated with AI tools

70% AI SDR churn = customers seeking alternatives

🏢

Enterprises with AI fatigue

95% pilot failure = demand for what works

🏢

Founders too busy to manage AI

Want outcomes, not another tool to learn

The Pricing Shift

"Per-seat is no longer the atomic unit of software. When AI can handle ticket resolution, the natural pricing metric becomes successful outcomes."

— a16z Enterprise Newsletter, Dec 2024

💰

Outcome-aligned pricing

$X per meeting, $Y per processed invoice, $Z per video

8 / 12

Competitive Landscape

Who else is thinking about natural language → outcomes?

Tools (Not Outcomes)

AI SDRs (11x, Artisan, AiSDR)

Sell tools. Charge per seat. You manage agents. 70% churn.

❌ Not outcome-based

Agent Platforms (LangChain, CrewAI)

Infrastructure for developers. Build your own workflows.

❌ Not outcomes, just primitives

Automation (Zapier, Make)

Workflow automation. You design the flows.

❌ Not AI-native, not outcome-based

Closest Parallels

Scale AI ($13.8B)

Services + HITL → platform. "We need labeled data" → delivered.

✓ Outcome-based, HITL model

Pilot ($1.2B)

"Do my bookkeeping" → done. Humans + AI.

✓ Outcome-based, HITL model

Intercom Fin ($0.99/resolution)

AI support priced per successful outcome.

✓ Outcome-based pricing model

Our Differentiation

Horizontal, not vertical. Scale AI = data labeling. Pilot = bookkeeping. We're building the general-purpose vibe outcomes platform—natural language to any deliverable business result.

9 / 12

Current Traction

Proving the thesis with real customers and real outcomes.

$4K
MRR
5
Customers
0%
Churn
3
Outcome Types

Outcomes We've Delivered

📅

"Get me sales meetings"

SDR/BDR for construction, startups (50% of revenue)

🎬

"Generate training videos"

ML training data pipelines (30% of revenue)

📚

"Research these topics"

University lab literature synthesis (20% of revenue)

Why Zero Churn

"When you only pay for outcomes, there's no reason to churn. We deliver meetings, they pay. We don't deliver, they don't pay. Aligned incentives = sticky customers."

vs. AI Tool Churn

AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 70% churn.

10 / 12

Why Now: 2026 Is the Year

Technology, market, and cultural convergence make this the moment.

Technology Ready

🧠

Models finally capable enough

GPT-5, Claude 4 can execute real business workflows

🔧

Agent infrastructure exists

OpenClaw, MCP, tool-use protocols

💳

x402 machine payments

Agents can transact autonomously (a16z Big Ideas 2026)

📊

HITL research converging

Microsoft, Anthropic, DeepMind all pointing same direction

Market Ready

💔

AI tool fatigue

70% AI SDR churn. 95% pilot failure. Customers want what works.

💰

Budget exists

Companies spending billions on AI, getting nothing

📈

Pricing shift happening

30%+ enterprise SaaS moving to outcome-based

🎯

"Vibe coding" cultural moment

Natural language → results is now understood

"2025 was widely labeled 'the year of AI agents.' In reality, it was the year we learned what agents can and cannot do. 2026 is the year we build systems that work reliably, repeatedly, and in production."

— Human-in-the-Loop Newsletter, Dec 2025

11 / 12

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

What We've Built

🐕

Dog-fooding daily

Running OpenClaw infrastructure ourselves

🛡️

Agent Seatbelt

Browser-layer guardrails

📊

ClawView

Agent observability

📚

Workflow templates

Playbooks that compound

Why Us

We've shipped outcomes

$4K MRR from real deliverables

We understand HITL

Built the infrastructure, not just the agents

We know the failure modes

Encoded in playbooks from real experience

12 / 12

The Ask

Vibe Code Your Business

Describe the outcome you want. AI agents + human QA deliver it. Pay only for results. The interaction layer for the AI economy.

What We Need

💰

$[X] Pre-Seed

Scale agent capacity + build the interface

🎯

12-month goal: $1M ARR

Prove vibe outcomes across multiple verticals

📦

Then: Self-serve platform

Anyone can describe outcomes and get them

The Opportunity

📈

New category creation

"Vibe outcomes" platform doesn't exist yet

📈

Cultural moment

Vibe coding is mainstream—extend it to business

📈

$52.6B market by 2030

AI agents + outcome-based pricing converging

📚 Research Foundation

Karpathy: Coined "vibe coding" Feb 2025 · MIT NANDA: 95% AI pilot failure · Microsoft Magentic-UI: 71% accuracy improvement with HITL · CooperBench: 30% lower success in multi-agent without coordination · a16z: Outcome-based pricing shift · Gartner: 30%+ enterprise SaaS with outcome pricing by 2025 · Bessemer: AI Pricing Playbook (Feb 2026)

🔗 Related Pitches

⚡ Fat Startup · 💰 Outcome-Based · 🚗 Uber for AI Work · 🎛️ Control Plane

Vibe Coding Outcomes is the UX/interaction layer that powers all of these.

Research • NYC Target Companies
01

25 NYC Startups: R&D Opportunities

Series A-B companies ($13M-$160M raised) with specific research they could implement but haven't.

25

NYC Tech Startups

$850M+

Combined Funding

75+

Research Opportunities

02

🏦 Fintech / Finance AI

Rogo — $75M (Series B)

Building "Wall Street's first AI analyst" — LLMs for financial reasoning

R&D Opportunities:

  • Chain-of-Table reasoning — 40% more accurate on tabular financial data
  • FinGPT fine-tuning — Open-source financial LLM for domain reasoning
  • Toolformer for financial APIs — Teach LLMs to call Bloomberg/Reuters autonomously

Hook: "Your financial reasoning models could be 40% more accurate on tabular data with Chain-of-Table"

Farsight — $16M (Series A)

AI for finance — valuation models, deal analysis, Excel/PPT generation

R&D Opportunities:

  • SpreadsheetLLM — Microsoft's approach to better spreadsheet understanding
  • DocPrompting — Generate accurate documents with citations
  • Table-GPT — Unified table understanding and generation

Hook: "SpreadsheetLLM could cut your Excel generation errors by 30%"

Aiera — $25M (Series B)

GenAI for financial professionals — broker research, earnings calls, filings

R&D Opportunities:

  • LongLoRA — Process 10x longer earnings calls without quality loss
  • RAG-Fusion — Multiple query generation for better retrieval
  • Time-LLM — Repurpose LLMs for time series forecasting

Hook: "LongLoRA could let you process 10x longer earnings calls without quality loss"

Carbon Arc — $56M (Series A)

Marketplace for curated AI-ready datasets (Insights Exchange)

R&D Opportunities:

  • Data-Juicer — Open-source data quality toolkit for AI datasets
  • DataComp — Benchmark for dataset curation quality
  • Synthetic data detection — Verify dataset authenticity/quality

Hook: "DataComp benchmarking could become your quality certification"

03

🏥 HealthTech / BioTech

Ataraxis — $20M (Series A)

AI for cancer precision medicine — analyzes data to identify optimal treatments

R&D Opportunities:

  • CancerGPT — Few-shot learning for drug pair synergy prediction
  • DrugCLIP — Contrastive learning for drug-target interaction
  • Med-PaLM 2 — Google's medical LLM achieving expert-level performance

Hook: "CancerGPT's few-shot approach could expand your drug combination predictions 5x faster"

Inspiren — $35M (Series A)

AI + IoT for senior care — AUGi device for fall detection and patient monitoring

R&D Opportunities:

  • RT-DETR — Real-time detection faster than YOLO
  • Action recognition transformers — Video transformers for activity recognition
  • Privacy-preserving pose estimation — On-device processing without cloud

Hook: "RT-DETR could cut your fall detection latency by 40% while running entirely on-device"

Slingshot AI — $40M (Series A)

AI for mental health — "Ash" chatbot simulates therapist-like conversations

R&D Opportunities:

  • Constitutional AI for safety — Anthropic's approach to helpful + harmless
  • EmoBERTa — Emotion-aware language model fine-tuning
  • CBT dialogue systems — Structured therapeutic conversation flows

Hook: "Constitutional AI could reduce harmful responses by 80% while maintaining therapeutic value"

Camber — $30M (Series B)

Healthcare payment automation — streamlines insurance reimbursement

R&D Opportunities:

  • Medical coding LLMs — Auto-coding diagnosis/procedure codes
  • Claims denial prediction — ML to predict and prevent rejections
  • Donut/Pix2Struct — Document understanding for medical forms

Hook: "Medical coding LLMs could auto-fill 60% of your claims forms"

04

🛠️ Dev Tools / Infrastructure

Warp — $18M (Series A)

AI-powered payroll platform for multi-state compliance

R&D Opportunities:

  • Regulatory RAG — Retrieval over tax code databases
  • LayoutLMv3 — Extract state tax forms with 95% accuracy
  • Temporal reasoning — LLMs for date/deadline calculations

NetBox Labs — $35M (Series B)

Open-source network automation platform

R&D Opportunities:

  • LLM for network config — Auto-generate Cisco/Juniper configs from NL
  • Anomaly detection — Transformer-based time series for network telemetry
  • Vision → IaC — Convert network diagrams to code

Topline Pro — $27M (Series B)

AI marketing for home service businesses

R&D Opportunities:

  • Local SEO automation — LLM-generated location-specific content
  • Multi-modal review response — Personalized responses with images
  • Conversational scheduling — LLM-powered booking agents
05

💼 Sales / Marketing AI

Clay — $40M (Series B, $1.25B valuation)

AI for sales personalization — integrates 100+ data sources

R&D Opportunities:

  • Persona-based email generation — LLMs that adapt tone per recipient
  • Entity resolution at scale — Deduplication across data sources
  • Buyer intent prediction — Multi-signal ML for ready-to-buy leads

Hook: "Buyer intent prediction could 3x your users' reply rates"

Profound — $35M (Series B) ⭐ Existing Client

AI search optimization — helps brands appear in AI-generated responses

R&D Opportunities:

  • Retrieval optimization — Improve citation likelihood in RAG systems
  • AI visibility benchmarking — Measure brand presence across LLMs
  • Source authority scoring — How LLMs weight different sources

ShopMy — $77.5M (Series B)

Influencer commerce platform

R&D Opportunities:

  • CLIP-based product matching — Visual similar product discovery
  • Influencer-audience fit — ML for matching creators with brands
  • Shoppable video AI — Auto-detect and tag products in video
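The CLIP-based matching bullet reduces to nearest-neighbor search over image embeddings. A sketch with toy 3-d vectors standing in for real CLIP embeddings (which are ~512-d and come from an encoder; the catalog and query values here are made up):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings standing in for CLIP image embeddings of catalog products.
catalog = {
    "red sneaker": [0.9, 0.1, 0.0],
    "blue jacket": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # embedding of a frame cropped from an influencer video
best = max(catalog, key=lambda name: cosine(query, catalog[name]))
```

At catalog scale the linear `max` would be replaced with an approximate-nearest-neighbor index, but the similarity measure is the same.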

06

⚖️ Compliance / Legal AI

Norm AI — $48M (Series B)

AI for regulatory compliance — automates review of legal documents

R&D Opportunities:

  • Legal-BERT fine-tuning — Domain-specific transformer for legal text
  • Contract element extraction — NER for legal clauses
  • Regulatory change detection — Track and summarize regulation updates

Hebbia — $130M (Series B, $700M valuation)

Document AI — searches large document sets with citations

R&D Opportunities:

  • ColBERT v2 — Late interaction retrieval for better search
  • Self-RAG — LLM that self-reflects on retrieval quality (+25% accuracy)
  • Structured reasoning chains — Better citation generation

Hook: "Self-RAG could improve your citation accuracy by 25%"
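The control flow behind that hook can be sketched without any model. Self-RAG proper uses learned reflection tokens to critique retrievals; the stand-in critic below is just word overlap, so this only mirrors the retrieve → critique → filter loop, not the trained component:

```python
def relevance_score(question: str, passage: str) -> float:
    """Stand-in critic: fraction of question words found in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def self_reflective_retrieve(question, passages, threshold=0.5):
    """Keep only passages the critic judges relevant enough to cite."""
    scored = [(relevance_score(question, p), p) for p in passages]
    return [p for score, p in sorted(scored, reverse=True) if score >= threshold]

passages = [
    "The merger closed in Q3 with a breakup fee of $40M.",
    "Weather in the region was mild that quarter.",
]
kept = self_reflective_retrieve("What was the merger breakup fee", passages)
```

The citation-accuracy gain comes from this filtering step: the generator only ever sees passages that survived the critique, so there is less opportunity to cite an irrelevant source.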

07

🔒 Cybersecurity / 🌱 Climate / 🛒 Consumer

Zip Security — $13.5M

SMB cybersecurity

  • LLM threat intelligence
  • Automated SOC analyst
  • LLM phishing detection (+40% accuracy)

Chestnut Carbon — $160M

Reforestation + carbon credits

  • Satellite carbon estimation
  • Biodiversity monitoring (audio/visual)
  • ML credit verification

GDI — $20M+

Silicon anodes for EV batteries

  • Battery degradation prediction
  • Materials discovery with ML
  • CV defect detection (-40% QC cost)

Novig — $18M

P2P sports betting

  • LLM odds modeling
  • Market making algorithms
  • Fraud detection

David — $75M

High-protein nutrition bars

  • AI food formulation
  • Demand forecasting
  • Consumer preference modeling

Cents — $40M

Laundry/dry-cleaning SaaS

  • Demand forecasting
  • Route optimization
  • Image garment classification

08

🎯 Best Targets by Category

🔥 Highest Urgency (AI-Native)

  • Rogo — Financial reasoning is hard, need every edge
  • Hebbia — Document AI is competitive, Self-RAG matters
  • Aiera — Long context + time series = big opportunities
  • Slingshot AI — Safety is existential for mental health AI

💰 Big Companies With Resources

  • Clay ($1.25B val) — Can afford to experiment
  • Hebbia ($700M val) — Research-forward culture
  • Chestnut Carbon ($160M) — ML for verification is huge

🎯 Underserved Markets

  • Inspiren — Elder care + CV is niche
  • Cents — Laundry tech has zero AI competition
  • Topline Pro — Home services AI is wide open

⭐ Existing Relationship

  • Profound — Already a client, easy expansion

Outreach Template

Subject: Quick R&D idea for [Company] — [specific technique]

Hi [Name],

Congrats on [recent news/funding]. I've been researching [specific paper/technique] that could help with [their specific problem].

Quick version: [1-sentence benefit with number]

I put together a 2-page brief showing how this could work for [Company]. Want me to send it over?

Research • Positioning Analysis
01

R&D ≠ The Pain Point

The real market pain is downstream from R&D — it's about shipping AI to production.

80%

AI projects fail to reach production (RAND)

95%

GenAI pilots fail to deliver measurable ROI (MIT/Fortune, 2025)

The gap isn't finding the right model. It's shipping AI to production.

02

The Skills Gap (Reddit Gold)

From r/MLQuestions — 688 upvotes, Nov 2025

What Candidates Know

  • Transformer architectures, attention mechanisms
  • Papers they've implemented (diffusion, GANs, LLMs)
  • Kaggle competitions, theoretical deep learning

What Companies Need

  • Deploy a model behind an API that doesn't fall over
  • Write data pipelines that process reliably
  • Debug why the model is slow/expensive in production
  • Build evals to know if the model is working

"I'll interview someone who can explain LoRA fine-tuning in detail but has never deployed anything beyond a Jupyter notebook."

— Startup co-founder hiring ML engineers

03

The Observability Gap (Your Opportunity)

From Cleanlab's survey of 95 teams with AI in production

<1/3

Teams satisfied with observability

63%

Plan to improve observability next year

70%

Rebuild AI stack every 3 months

Key Insight

Even among the 5% of companies that reach production, most remain early in maturity. They can't reliably know when their agents are right, wrong, or uncertain.

04

Reframing The Pitch

❌ OLD: "AI R&D Engineer" → ✅ NEW: "Production AI Engineer"

  • Vibes: Research, experimentation → Deployment, reliability
  • Perception: Nice-to-have → Need-to-have
  • Target: Teams with resources → Teams with stuck projects
  • Job-to-be-done: "Find the best model" → "Ship to production this month"

The Positioning Gap

Aemon = the optimization engine

You = the shipping engine

05

Target Customers (Not Research Teams)

🚀 Series A-C Startups with AI Features

  • Have small ML teams, can't hire fast enough
  • ML engineers cost $200-400k and are hard to find
  • Need someone who can actually deploy, not just research

Pain: "We have 3 AI features in Jira blocked for months"

🏢 Product Companies Adding AI

  • Non-ML companies adding AI features
  • Don't have ML expertise internally

Pain: "We want AI in our product but don't know where to start"

⚙️ Enterprise AI Platform Teams

  • Drowning in stack churn (rebuilding every 3 months)
  • Coordination overhead killing velocity

Pain: "Platform team of 5 supporting 20 feature teams — we're bottlenecked"

🏛️ Regulated Industries

  • 42% plan to add oversight features (vs 16% unregulated)
  • Need governance + observability

Pain: "Can't deploy AI without compliance sign-off"

06

Better Pitch Angles

1. "Your AI Projects Are Stuck. We Ship Them."

  • Target: Companies with AI projects "in progress" for months
  • Proof: Show deployment timelines (weeks vs months)
  • Wedge: Audit → identify stuck projects → ship one fast

2. "AI Observability + Ops as a Service"

  • Target: Companies with AI in production but no visibility
  • Pain: "We don't know when our AI is wrong"
  • Proof: Catch regressions, reduce incidents

3. "The AI Platform Team You Can't Hire"

  • Target: Scaling startups without MLOps expertise
  • Pain: ML engineers cost $400k and don't want to do ops
  • Proof: Infrastructure setup in days, not months

4. "CI/CD for AI" (existing pitch)

  • Still good, but position as production not research
  • Focus on deployment gates, not model selection
  • "Every AI PR tested against your evals before merge"

Action Items

  • Rewrite pitches with "production" and "ship" language
  • Target stuck projects — companies with AI features in backlog
  • Lead with observability — 63% want better visibility
  • Offer quick wins — "Ship one AI feature in 2 weeks"
  • Avoid research teams — they don't have budget urgency