Manual agent juggling
5 tabs, 5 agents, merge conflicts everywhere
Every week a new model drops. Your team manually benchmarks it. What if that happened automatically — with PRs when something's better?
New models released per month
Time to benchmark each one
Tools that auto-PR improvements
| Company | What They Do | Gap |
|---|---|---|
| Portkey | AI gateway, routing, 1600+ LLMs | No auto-benchmarking against YOUR stack |
| Unify ($8M) | Finds best LLM for the job | Router-first, not benchmark-first |
| Braintrust ($36M, $150M val) | Eval-driven development | Reactive, not proactive |
| Us | Watch → Auto-benchmark → PR when better | — |
Connect your AI stack. Define your eval suite (or we help you build one).
We monitor every model release across all providers. Automatically.
When something beats your current setup, you get a PR with benchmarks.
Watch → Auto-benchmark → PR when better
Nobody does this for AI models. We do.
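The "PR when better" loop above reduces to a simple decision rule. The sketch below is illustrative only: the `Model` shape, `should_open_pr` name, and thresholds are assumptions, not our actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    score: float  # eval-suite score on the customer's own data (0-1)
    cost: float   # dollars per 1M tokens

def should_open_pr(current: Model, candidate: Model, min_gain: float = 0.01) -> bool:
    """The 'PR when better' rule: propose a switch only when the candidate
    is meaningfully better, or at least as good but cheaper."""
    better = candidate.score >= current.score + min_gain
    cheaper_same_quality = candidate.score >= current.score and candidate.cost < current.cost
    return better or cheaper_same_quality
```

The `min_gain` margin matters: without it, eval noise would generate a stream of churn PRs every model release.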
Software engineering solved "did my change break things?" 20 years ago. AI engineering still ships blind.
Push prompt change → Hope it works → Find out in production
Push prompt change → Eval runs → PR blocked if quality drops
The gap isn't that teams lack evals; Braintrust, Humanloop, and DSPy already provide those.
The gap is that evals aren't wired into deployment pipelines as blocking gates, the way unit tests are.
Think: Braintrust's eval engine + Dependabot's automation + GitHub Actions' CI/CD — fused into one opinionated product.
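A blocking eval gate can be as small as a script whose exit code CI respects, just like a failing unit test. This is a minimal sketch; the baseline and tolerance numbers are illustrative, and a real gate would load the baseline from the repo rather than hard-code it.

```python
# Illustrative numbers: a real gate would read the committed baseline.
BASELINE_SCORE = 0.87
TOLERANCE = 0.02  # absorb eval noise so flaky runs don't block PRs

def gate(score: float) -> int:
    """Return a CI exit code: 0 lets the PR merge, 1 blocks it."""
    if score < BASELINE_SCORE - TOLERANCE:
        print(f"BLOCKED: eval score {score:.2f} regressed below baseline {BASELINE_SCORE:.2f}")
        return 1
    print(f"OK: eval score {score:.2f}")
    return 0
```

Wired into GitHub Actions (or any CI), a nonzero return from this script is what turns "hope it works" into "PR blocked if quality drops."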
| Dimension | Aemon | Us |
|---|---|---|
| Purpose | Discover new optimal solutions | Protect existing quality + incrementally improve |
| Posture | Offensive R&D | Defensive Ops |
| Buyer | R&D Lead / ML Researcher | Engineering Manager / Platform Team |
| Integration | Standalone tool | Lives in your CI/CD |
$2K – $20K/mo
Based on eval runs & endpoints
LMArena raised $150M at $1.7B valuation on public evals. Enterprises need private evals on their own data.
LMArena valuation (public evals)
Private enterprise eval market
Hugging Face's YourBench is the open-source precursor, but it's a DIY tool that requires significant ML expertise. We productize it.
| Aemon | Private LMArena |
|---|---|
| Evolves novel algorithms | Evaluates existing models/configs |
| Research | Intelligence |
$10K – $100K/mo
Enterprise contracts
Companies spend $85K+/mo on AI infrastructure. Nobody knows if they're overpaying for quality they don't need.
Avg monthly AI spend
YoY growth
Visibility into cost-quality tradeoff
| Tool | What It Does | Missing |
|---|---|---|
| Portkey | Routing, fallbacks | No cost-quality optimization |
| Unify | Cheapest model that meets threshold | Not continuous, not production data |
| Us | Continuously optimize cost-quality frontier across entire AI stack | — |
An agent that sits on top of your AI gateway:
$2K – $15K/mo
Pays for itself from savings
⚡ Easiest ROI story of all these ideas
Building good evals is harder than building the AI features themselves. We build the oracle.
Braintrust's thesis: "If your eval is right, every decision becomes simple."
DSPy's framework depends on having good metrics to optimize against.
The bottleneck in the entire AI development loop is knowing what "good" looks like.
✓ Datasets
✓ Scoring rubrics
✓ Automated judges
Output plugs into Braintrust, DSPy, or your own CI/CD.
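An automated judge in miniature: score an output against a rubric and return a number any downstream tool can consume. In production the rubric would be the prompt to an LLM judge; the keyword checks below are a deterministic stand-in, and the rubric items are invented for illustration.

```python
# Hypothetical rubric; real ones are built from the customer's own data.
RUBRIC = {
    "mentions_refund_policy": 0.5,
    "polite_tone": 0.5,
}

def judge(output: str) -> float:
    """Score a model output against the rubric, returning 0.0-1.0.
    Deterministic keyword checks stand in for an LLM judge call."""
    text = output.lower()
    score = 0.0
    if "refund" in text:
        score += RUBRIC["mentions_refund_policy"]
    if any(w in text for w in ("please", "thanks", "thank you")):
        score += RUBRIC["polite_tone"]
    return score
```

Because the output is just a score per example, it plugs into Braintrust, DSPy, or a plain CI gate without coupling to any of them.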
| Aemon | Eval-as-a-Service |
|---|---|
| Assumes you have a good eval function | Creates the eval function |
| Optimizer | Oracle |
| Depends on eval quality | Is the prerequisite to everything else |
If you own the eval layer, you become the foundation every optimization tool depends on.
$5K – $30K/mo
Per eval suite built + maintenance
We operate fleets of AI agents that deliver results. Customers get outcomes. We get playbooks. Playbooks become platform.
A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.
— Andrew Lee, a16z Speedrun Partner
Spinning up AI agents is now trivial. Managing them is the new bottleneck.
OpenClaw, dockerized instances, cloud GPUs
GPT-5, Claude 4, open-source alternatives
$0.01-0.10 per task, not $50/hr
Babysitting agents instead of running business
Every hour on agent issues ≠ hour on actual work
They want outcomes, not infrastructure
The Insight
Founders are too busy to become AI ops engineers. We absorb that complexity so they can focus on their actual business.
We started in sales. Then customers kept asking for more.
Lead gen + qualification
Synthetic data pipelines
Literature review + synthesis
Outbound + meeting booking
Every customer had the same problem:
"I tried spinning up agents myself. Then I spent all my time debugging them instead of running my business."
— Pattern across customers
They didn't want to manage AI. They wanted outcomes.
Why Tools Aren't Enough
Companies don't want to become AI operations experts. They want someone to absorb the complexity and just deliver results.
We operate agent fleets. Customers get outcomes. We encode playbooks.
Become pseudo-IT for AI
Setup, config, debugging
No guarantee of outcomes
You focus on your business
We've done this before (playbooks)
Pay for results, not effort
Starting with sales because the outcome is measurable: meetings booked.
Meetings booked = revenue
70% AI SDR churn = customers looking for alternatives
$5-10K/month for what works
50% of our revenue is SDR/BDR
Deep prospect intelligence
Personalized messaging
Score and prioritize leads
Book the meeting
Expansion Path
Sales → Research/Intel → Operations → Content. Each vertical = new playbook, same infrastructure.
Every engagement encodes a playbook. Playbooks make the next engagement faster. This is how we build the moat.
Every engagement becomes encoded knowledge:
What steps work for each use case
Messaging that actually converts
Which models, tools, and sequences
What breaks and how to prevent it
Figure everything out from scratch
Apply existing playbook + customize
Playbook is battle-tested
Playbooks become product
The Fat Startup Advantage
We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.
We're productizing the research consensus on what actually works.
Declarative orchestration beats autonomous agents (Microsoft, 2024-25 surveys)
Human edits train intervention policies (ReHAC, EMNLP 2024)
Prompts + tool-use are parameters to optimize (AVATAR, NeurIPS 2024)
Transparency + oversight for multi-agent systems (Nature, 2026)
Versioned configs, not imperative code
Every edit = structured training signal
Prompts, branching, model routing improve over time
Can't be prompt-injected, auditable
We log trajectories, human edits, and outcomes, then update prompts, branching logic, and model routing so the same business objective is achieved more reliably over time. The playbook is the learned policy space.
— Our technical thesis
The internal system that makes agent workflows repeatable and efficient.
Which tools? Which prompts? Which models?
Learn the same lessons repeatedly
More clients = more eng hours
Verified, tested, reusable primitives
Learnings feed back into system
More clients = richer library = faster
The Compounding Effect
Workflow #1 takes a week. Workflow #10 takes a day. Workflow #100 takes hours. The library IS the moat.
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded · Built consumer products used by millions
Co-Founder & CTO
CTO of Because · $3M Seed · Deep agent infrastructure experience
Co-Founder
yapthis.com · Agentic architecture · Shipped production agent systems
Dog-fooding our own infrastructure daily
ClawView for agent monitoring
Agent Seatbelt for safety
$4K MRR, +$2K this week
What This Proves
Companies will pay for AI-powered outcomes when someone else manages the complexity. The demand is real. The model works.
Scale agent fleet + engineering team
Prove the playbooks at scale
Turn proven playbooks into self-serve templates
Agents just became capable enough
70% churn = customers looking for what works
Every month we operate = more encoded knowledge
Customers get results. We get playbooks. Playbooks become platform.
An AI that knows your context, anticipates your needs, and takes action on your behalf—not a chatbot you have to prompt.
The Vision
Imagine an AI that actually knows you—your work, your preferences, your patterns. It doesn't wait for commands. It proactively handles tasks, flags important things, and learns from every interaction.
Personal AI assistants are about to explode.
AI personal agents will arrive soon. What we do now with apps—manually, and in piecemeal fashion—will be done automatically. If a flight is cancelled, an AI agent will rebook the flight, reschedule meetings, and order food.
— Goldman Sachs, "What to Expect from AI in 2026"
Siri, Alexa, and Google Assistant lost the AI race. Here's why.
Context resets after 2-3 turns. They forget everything.
Wait for commands. Never anticipate needs.
Can't connect your email, calendar, work, and life.
"I can't do that" is their signature phrase.
Remembers weeks of interactions. Learns your patterns.
Anticipates what you need before you ask.
Sees your whole digital life—with your permission.
Browser, shell, files, messages—actual work gets done.
Microsoft's CEO called AI assistants "dumb as a rock." The truth is, they've stagnated while chatbots evolved.
— Industry Analysis, 2023-2024
Why dedicated AI devices keep failing—and what we learned.
The Lesson
Hardware failed because it created friction instead of removing it. The winning approach: software that works with your existing devices—phone, laptop, wearables—not another gadget to carry.
Both Rabbit R1 and Humane AI Pin missed a crucial opportunity: integrating with existing user bases. Why create a separate device when you could leverage smartphones and their vast ecosystem?
— Medium Analysis, July 2024
The fundamental shift in how AI should work for you.
"Hey Siri, add milk to my shopping list"
"ChatGPT, summarize this document"
You initiate every interaction. You remember to ask.
"You're almost out of milk. Added to cart—confirm?"
"Your flight changed. I rebooked + rescheduled 2 meetings."
AI monitors context. Surfaces what matters. Acts with permission.
Gartner predicts 40% of enterprise apps will embed task-specific AI agents by 2026, evolving assistants into proactive workflow partners.
— Forbes, "Agentic AI Takes Over," Dec 2025
Four converging forces make this the moment.
Models finally capable of real reasoning
Memory across weeks of interaction
Agents can control apps natively
$0.01-0.10 per task, not $50/hr
Plan to increase agentic AI budgets
Enterprise GenAI agents 2025 → 2027
95% frustrated with current assistants
Apple Intelligence proves local AI demand
From surveys, Reddit, and academic research.
"Remember what I told you last week"
"Remind me before I forget"
"Know my preferences without asking"
"My data stays mine"
93% of respondents predict agentic AI will enable more personalized, proactive, and predictive services.
— Cisco 2025 AI Study
An assistant that knows you. The future of personal assistants is when the helper learns from your data, documents, and writing style.
— AI Industry Forecast 2026
Always-on AI that learns, anticipates, and acts.
Deep prospect intelligence
Personalized messaging
Meeting coordination
Triage, draft, follow-up
Deep work on autopilot
The tasks you hate, automated
What makes this different from Siri/Alexa/Google Assistant?
Generic. Lowest common denominator.
Your context trains their models.
Only works in their ecosystem.
Lost the AI race years ago.
Deep personalization for serious work.
Local-first. You control what's shared.
Works with your existing tools.
GPT-5, Claude 4, always the best.
The Positioning
We're not competing with Siri for "set a timer." We're building the second brain for knowledge workers—people who will pay for AI that actually makes them more effective.
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded · Built consumer products used by millions
Co-Founder & CTO
CTO of Because · $3M Seed · Deep agent infrastructure
Co-Founder
yapthis.com · Agentic architecture · Shipped production agent systems
Scale agent infrastructure + team
Prove the Personal AI OS at scale
Personal AI for everyone
Dogfooding OpenClaw constantly
ClawView for agent monitoring
Agent Seatbelt for guardrails
Proving demand before pitching
An AI that knows you, anticipates your needs, and takes action—not just another chatbot waiting for prompts.
The Thesis in One Line
The shift from reactive AI to proactive AI is a $56B market. We're building the operating system for it.
The SaaS pricing model is breaking. AI does the work now—so why pay for human logins? We deliver outcomes and charge when they happen.
AI is driving a shift toward outcome-based pricing. Per-seat is no longer the atomic unit of software. If AI can handle a sizable proportion of customer support, companies will need far fewer human agents, and therefore fewer software seats.
— a16z Enterprise Newsletter, December 2024
SaaS pricing is undergoing its biggest shift since the cloud. AI is killing the per-seat model.
Seat-based pricing may not fit when AI is doing the work. If an agent replaces a human task, customers will expect to pay based on outcomes, not log-ons.
— Bain Technology Report 2025
The logic of per-seat pricing breaks when AI replaces the humans who need seats.
Per-seat pricing undervalues the automation
70% churn when outcomes don't follow
2025 pilots hitting 2026 renewals—"are we really getting value?"
Not for access to tools
Customers calculate value instantly: $X per meeting = clear math
We only win when you win
The Bessemer Thesis
AI-native companies are abandoning seat-based SaaS pricing in favor of usage-, output-, and outcome-based models that directly align revenue with measurable results.
— Bessemer Venture Partners, "The AI Pricing and Monetization Playbook" (Feb 2026)
The market leaders are proving outcome-based AI pricing works at scale.
Customer Support AI
$0.99 per resolution
65% resolution rate. Aligns every team around one outcome: resolved tickets. Now deployed in 99% of conversations.
Customer Support AI
Outcome-based pricing
"First in CX industry to offer outcome-based pricing for AI agents" — August 2024 announcement.
Legal AI
Per demand package
AI + legal experts generate personal injury demand letters. Per output pricing, not hourly.
Enterprise AI Support
Per-conversation + per-resolution
Hybrid model. Usage (conversations) + outcome (resolutions). Featured in a16z podcast.
Employee Support AI
ROI-based (tickets closed)
Shifted from consumption → outcomes. Customers gained clearer ROI, business accelerated.
Data Labeling → Platform
$13.8B valuation
Started as labeling services. Became infrastructure. Services → outcomes → platform.
The Pattern
Every major AI-native company is moving toward outcome-based pricing. This isn't experimentation—it's convergence.
43% of enterprise buyers consider outcome-based pricing a significant factor in purchase decisions.
"$X per meeting booked" = CFO-ready math. No spreadsheet gymnastics.
If it doesn't work, you don't pay. Risk transferred to vendor.
More meetings = more spend = more value captured. Natural expansion.
You're paying for results. Why churn from something that works?
"Why should we pay $X per user if we could pay $Y per outcome? Aligning price with realized value improves the ROI calculus."
— Enterprise buyer sentiment (Industry research)
"The fundamental shift is to stop charging for access and start charging for work done."
— Bain Technology Report 2025
Deloitte 2026 Prediction
"Outcome- or value-based pricing is based on the real business results that SaaS applications with AI agents produce. There will be a gradual move toward a future powered by integrated, autonomous multi-agent systems."
We operate AI agent fleets that book qualified sales meetings. You pay only when meetings happen.
Outcome-based pricing isn't charity—it's better economics for everyone.
Customer pays on outcome
AI compute + tooling + human oversight
Healthy unit economics, scales with volume
Each meeting → better templates → lower cost
$250-500 per meeting is a no-brainer
Start small, scale with proof
Cost tracks linearly with value
CFO loves outcome-based spend
The Intercom Lesson
"Intercom's $0.99 per resolution aligns every team around one outcome: resolved tickets. If Fin resolves a ticket in three messages or thirty, the customer pays the same. The risk is real—but the reward is equally real: customers know exactly what they're getting, and they can calculate ROI in their sleep."
— Bessemer, Feb 2026
Outcome-based pricing has real risks. Here's how we mitigate them.
Some meetings cost more than others
Customer usage varies month to month
"Did your AI really book this?"
Customers gaming the system
Base retainer + outcome fees = floor
Cost per outcome drops with scale
Contractually defined: what counts
Every action logged, no disputes
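The "base retainer + outcome fees" mitigation is simple arithmetic. A sketch with illustrative numbers (drawn from the $250-500/meeting range elsewhere in the deck; the defaults are not our actual pricing):

```python
def monthly_invoice(meetings: int, retainer: float = 2000.0, per_meeting: float = 350.0) -> float:
    """Hybrid outcome pricing: the retainer gives the vendor a revenue
    floor against month-to-month usage swings, while the per-meeting fee
    keeps the bill proportional to delivered outcomes."""
    return retainer + meetings * per_meeting
```

A zero-meeting month still covers fixed costs; a strong month scales the invoice with the value delivered.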
Industry Standard Emerging
"Agreements around basic definitions for things like 'an agent,' 'a task,' 'a process,' 'an interaction,' and 'an outcome' should be clearly defined, communicated, and agreed upon contractually." — Deloitte TMT Predictions 2026
They pay for results → they get results → no reason to leave
Every invoice shows exactly what they got
"It's working—give me more"
Our wedge: sales meetings
Synthetic data pipelines
University lab assets
When you only pay for results, there's no reason to churn. Aligned incentives = sticky customers. This is why Intercom's outcome-based Fin has 99% deployment.
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded · Built consumer products used by millions
Co-Founder & CTO
CTO of Because · $3M Seed · Deep agent infrastructure experience
Co-Founder
yapthis.com · Agentic architecture · Shipped production agent systems
Running on OpenClaw infrastructure
Every engagement → better templates
Customers love it → lower CAC, zero churn
Better AI = more margin for us
You post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents?
— Macy Mills, a16z Speedrun Partner
61% → 30%+ outcome-based adoption wave
70% churn = customers looking for what works
43% prefer outcome-based pricing
Services → outcomes → platform
Bookkeeping outcomes, not seats
$0.99/resolution, 99% deployment
AI agents that deliver results. You only pay when they do. The future of how work gets priced.
📚 Sources
a16z Enterprise Newsletter (Dec 2024) • Bessemer "AI Pricing Playbook" (Feb 2026) • Bain Technology Report 2025 • Deloitte TMT Predictions 2026 • OpenView SaaS Benchmarks • Gartner • EY "SaaS Transformation with GenAI" (Nov 2025) • BetterCloud "AI and SaaS Industry 2026" • Intercom Fin pricing page • Zendesk AI Agents announcement (Aug 2024)
50-70% churn rates. LinkedIn bans. Domain blacklists. The "autonomous AI SDR" thesis failed. Human-in-the-loop is winning.
"AI SDRs don't work—biggest bubble in tech." — LinkedIn comment with 400+ likes
"Their AI continuously hallucinated, getting things wrong about what my company does, the industry we are in, what products we sell. 1 positive reply, 1 demo, thousands of prospects touched, $7.5K down the drain."
— r/SaaS, Dec 2025
"A CRO from a publicly traded company disclosed that while an AI SDR helped generate a substantial volume of leads over a nine-month period, it did not lead to actual sales."
— Tomasz Tunguz, Theory Ventures
"Reports emerged of Artisan accounts, including those of team members and founders, facing restrictions or bans for suspected spam and automation violations."
— Quasa.io, Jan 2026
2x the churn of human SDRs (a role notorious for turnover) — Common Room
Platform ramped up AI detection, restricting automation-heavy accounts
Gmail filtering tightened. Sender reputations destroyed in weeks.
GDPR fines up to 4% revenue. TCPA: $500-1,500 per message.
"Permanent brand damage from being publicly associated with spam" — NUACOM
TechCrunch: "AI sales rep startups are booming. So why are VCs wary?"
"When one studies any of these startups individually, it's like 'wow, that's stunning product market fit.' When all 10 of them have stunning product market fit, it's hard to answer 'How is that going to play out?'"
— Shardul Shah, Partner, Index Ventures (hasn't invested)
"Without access to differentiated data, AI SDR startups risk being overtaken by incumbents like Salesforce, HubSpot, and ZoomInfo."
— Chris Farmer, CEO, SignalFire
"Investors are not surprised by the rapid adoption of AI SDRs; they are just doubting that adoption is sticky."
— TechCrunch, Dec 2024
$1.5B → 30% Layoffs
Jasper, the AI copywriting unicorn, ran into speed bumps and had to lay off 30% of staff after ChatGPT launched. AI SDRs face the same commoditization risk.
Built on commoditized LinkedIn data = undifferentiated output
Black boxes that create more work, not less
Incumbents (Salesforce, HubSpot) can bundle this free
"The AI SDR is dead, long live the AI SDR: How the future is Human-in-the-Loop"
Can't read tone, context, or cultural nuance essential in enterprise sales
Scraped data without consent → GDPR/CCPA violations
When AI misleads, your company bears the liability
"More volume on a bad message is not a strategy. It is self-sabotage."
"Commenting on someone's hoodie feels forced because it's a hollow observation"
"Teams that use AI to support human insight consistently outperform teams trying to replace humans entirely. It's not even close."
— Matthew Metros, The AI SDR is Dead
Data mining, signal detection, prospect prioritization
Judgment, trust, closing
MarketBetter (human oversight): 4.97/5 G2 rating
"Human-in-the-loop platforms consistently outperform fully autonomous ones"
We're not building another AI SDR. We're building what should have been built from the start.
"Autonomous AI employee"
"6,000 contacts/month"
$5-10K/mo regardless of results
Become pseudo-IT for AI
No outcome guarantees
AI research + human checkpoints
Right message, right person, right time
Pay for meetings, not seats
You focus on your business
Outcomes or you don't pay
AI handles the research. Humans make the decisions. You get meetings.
Intent signals, company news, technographics, pain points
Who to contact and why, right now
Personalized outreach based on real signals
Email, LinkedIn (safely), follow-ups
Review before sending to high-value prospects
When a prospect engages, humans take over
Define who you want to reach and why
Edge cases, sensitive prospects, brand protection
Their 50-70% churn is our customer acquisition channel.
Spent $5-10K/mo, got spam complaints
Need to rebuild sender trust
The problem didn't go away
Educated by failure
Only pay for meetings that happen
Human oversight prevents embarrassments
We've learned what works across verticals
They don't manage agents, they get results
Aligned Incentives
When customers only pay for results, there's no reason to churn. If we don't deliver meetings, they don't pay. Simple.
vs. AI SDR Churn
AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 50-70% churn.
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded · Built consumer products used by millions
Co-Founder & CTO
CTO of Because · $3M Seed · Deep agent infrastructure
Co-Founder
yapthis.com · Agentic architecture · Production agent systems
Scale human oversight operations + agent infrastructure
Prove the anti-AI-SDR thesis at scale
Turn proven playbooks into self-serve platform
50-70% churn = massive displaced customer base
Highest G2 ratings go to human-oversight tools
Position as the safe alternative before market consolidates
AI SDRs promised automation. They delivered spam, bans, and brand damage. We deliver meetings — with human judgment where it matters. Their 50-70% churn is our customer acquisition channel.
📚 Sources
Common Room "The AI SDR is dead" (Feb 2025) · TechCrunch "AI sales rep startups are booming. So why are VCs wary?" (Dec 2024) · Reddit r/SaaS AI SDR complaints · Quasa.io Artisan LinkedIn bans (Jan 2026) · Pipeline Group "Hidden Dangers of AI SDRs" · Theory Ventures SaaStr Talk · MarketBetter G2 Reviews
Browser-layer guardrails that block irreversible AI actions before they happen.
AI agents fail not from bad models, but from bad guardrails. 84% of companies deploying agents have zero safety boundaries defined.
— GenDigital Agent Trust Hub Research, 2026
$47K overnight cloud bills
AI SDR emails competitors
Deleted production data
Pricing sent to wrong channel
Block LinkedIn "Follow" for AI SDRs
Read vs. Write vs. Irreversible
Require confirmation for risky ops
Prevent runaway loops
Chrome extension that intercepts agent browser actions
Why Browser Layer
Framework-agnostic. Works with any AI agent (OpenClaw, LangChain, AutoGen, custom). Install once, protect everything.
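The Read vs. Write vs. Irreversible tiers above amount to a small policy table plus a fail-closed check. A sketch of that logic in Python (the extension itself would run this classification in the browser; the action names and tiers here are hypothetical examples):

```python
READ, WRITE, IRREVERSIBLE = "read", "write", "irreversible"

# Hypothetical policy table; a real product ships per-tool defaults
# and lets each customer tune them.
ACTION_TIERS = {
    "scroll": READ,
    "extract_text": READ,
    "fill_form": WRITE,
    "send_email": IRREVERSIBLE,
    "delete_record": IRREVERSIBLE,
}

def allow(action: str, human_confirmed: bool = False) -> bool:
    """Reads pass, writes pass, irreversible actions require confirmation.
    Unknown actions are treated as irreversible: fail closed."""
    tier = ACTION_TIERS.get(action, IRREVERSIBLE)
    return tier != IRREVERSIBLE or human_confirmed
```

Failing closed on unknown actions is the seatbelt: new agent capabilities are blocked by default until someone classifies them.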
Autonomous agents exploding
Enterprise worried about agent security
Regulatory tailwinds for safety
Just launched - validates market
Browser-layer = framework-agnostic
Chrome extension ships fast
The seatbelt you install before giving AI the keys.
🔗 Supports These Pitches
Fat Startup • AWS of AI Work • Control Plane
Part of the human oversight layer that makes agent work reliable.
When your AI employee sends the wrong email at 3am, you'll know exactly why.
The Problem
Companies are deploying autonomous AI agents that run 24/7. When something goes wrong—and it will—they have no idea why. Current tools are built for request-response, not proactive agents.
User sends message, LLM responds
LangChain-specific, not agent-native
Built for chatbots, not employees
24/7 agents taking proactive actions
Why did it make that choice?
Shell, browser, files, messages
"The agent sent the wrong email. Logs show it ran. No idea why."
"Step 3: Agent assumed X because of context Y. Here's how to prevent this class of error."
Every decision. Every action. Every assumption. Full causal tracing.
🔗 Supports These Pitches
Fat Startup • AWS of AI Work • Control Plane
Observability layer — see what agents are doing before they go wrong.
⚠️ Why This is a Feature, Not a Company
Langfuse, LangSmith, Arize are well-funded. But none are built for autonomous agents. ClawView is our internal observability layer, not a separate product pitch.
Audit trails. Approval workflows. Compliance automation. The control layer enterprises need.
AI agents fail not from bad models, but from bad guardrails. The unlock isn't better agents—it's better safety rails.
— Industry consensus, 2026
What did the agent do at 3am?
High-stakes actions go unsupervised
EU AI Act enforcement coming
Humans can't supervise at machine speed
Every action, every decision, timestamped
Human gates for high-stakes actions
EU AI Act ready, audit reports generated
Validator agents checking worker agents
McKinsey Insight
"Organizations are moving from human in the loop to human on the loop—above the loop for strategic oversight." AgentGov enables this transition safely.
Audit trails. Approval workflows. Compliance automation. Trust at machine speed.
🔗 Supports These Pitches
Fat Startup • AWS of AI Work • Control Plane
Governance + compliance layer — enables enterprise trust.
🔬 Key Research
Gravitee 2026: Only 14.4% have full security approval for agents. 88% reported incidents.
EU AI Act: Enforcement begins 2026, mandates audit trails.
Zenity: $38M Series B validates market (but they're low-code focused, not agent-native).
10 layers an AI employee needs to fulfill an entire job description. We're building the unified platform.
The Thesis
An AI employee's value lies in performing EVERYTHING in a job description—not just one workflow. This requires a complete infrastructure stack.
What's Missing (⭐)
Layers 8-10 are the critical gaps. Everyone's building capabilities. Nobody's building supervision, agent-to-agent communication, and compliance.
Current landscape is fragmented
A unified platform that manages the full AI employee lifecycle.
All 10 layers, one platform
Job description → Working AI employee
Built-in compliance, audit, oversight
The unified platform for deploying, managing, and governing AI employees.
🔗 Framework For These Pitches
Fat Startup • AWS of AI Work • Control Plane
The 10-layer framework is how we think about what AI employees need.
⚠️ Why This is a Framework, Not a Pitch
Building all 10 layers is massive. We focus on Layers 8-10 (supervision, communication, compliance) because that's the critical gap. The framework informs strategy, not the pitch itself.
Verified working code. Real benchmarks. Pay-per-snippet micropayments. Documentation that actually works.
Claude Code chose Whisper V1 — near-deprecated — over Groq (200x faster, 10x cheaper) because OpenAI's docs are cleaner. Agents pick tools by doc quality, not performance.
— Garry Tan, YC Partner, Feb 2026
Even the best dev tools don't let you sign up via API. This is a big miss in the claude code age — claude can't sign up on its own.
— Jared Friedman, YC Partner, Feb 2026
Despite our best efforts, they will always hallucinate. That will never go away.
— Amr Awadallah, Vectara CEO, 2026
Agents pick whatever has most examples
APIs change, snippets break
Agent can't know if code actually runs
No cost/perf data to guide decisions
Code tested continuously, timestamped
"Transcribe video" → 10 services compared
Cost, latency, quality scores
$0.05 per verified snippet
Kill the API Key
No signup. No rate limits. No accounts. Agent pays per-request, gets verified code. Native to how agents want to consume services.
Groq, Deepgram, Whisper. Zero x402 servers. Garry Tan moment.
Kling, Runway, Wan. Parameter chaos unsolved.
Model selection based on task + budget
Email + phone + wallet in one API
Verified against real APIs
Same format across providers
Real numbers, updated hourly
"For fast+cheap → use Groq"
✓ They have this
✓ They have this
No continuous testing
No cost/perf data
Free only, no agent-native billing
$43M+ processed, 35M+ txns
OpenClaw: 9K→60K stars
AI agent infrastructure
Verification is table stakes soon
The x402 Thesis
25,000+ developers building on x402. Google, Cloudflare, Stripe adopting. Machine-to-machine payments are the rails for agent economy.
Real-time data from x402scan.com shows a booming agent economy — with a clear gap for developer tooling.
| Facilitator | 30d Txns | 30d Vol | What They Do |
|---|---|---|---|
| Dexter | 1.65M | $79.5K | Agent economy platform |
| Coinbase | 722K | $288.5K | Official CDP facilitator |
| Virtuals Protocol | 412K | $1.34M | AI agent tokenization |
| PayAI | 1.31M | $43.3K | Micropayments |
| RelAI | 66K | $84K | Agent payments (Solana) |
| Meridian | 19K | $315K | High-value transactions |
| Thirdweb | ~10K | ~$2K | Web3 dev platform |
| OpenX402 | 6.6K | $38.6K | Open-source facilitator |
| Polymer | 6.4K | $770 | Proof generation |
| AnySpend | ~3K | ~$5K | Multi-asset spending |
+ Corbits, OpenFacilitator, CustomPay, AgentPay (emerging)
Source: x402scan.com, Feb 27 2026
Data APIs, AI services, crypto tools, social data
Verified code snippets, curated docs, developer knowledge
Be the Stack Overflow layer on x402 rails
Why We Can Win
Top services (StableEnrich, LowPaymentFee) aggregate APIs — they don't verify code quality.
AgentDocs: Premium pricing ($0.05-0.10) justified by verification + benchmarks.
Target: 1,000+ requests/day = $2,100+/month revenue from agent micropayments alone.
Verified snippets. Real benchmarks. Agent-native payments. Stack Overflow, but for machines.
🔗 Supports These Pitches
Better documentation → better agent outputs → more reliable outcomes.
📍 Current Progress
Live: agentdocs-api.holly-3f6.workers.dev
Snippets: 15 use cases, 21 verified snippets
Status: Dogfooding internally, expanding library
AI agents can write code, deploy apps, and manage infrastructure. But they can't sign up for a Stripe account. We fix that.
Even the best developer tools mostly still don't let you sign up for an account via API. This is a big miss in the claude code age because it means that claude can't sign up on its own. Putting all your account management functions in your API should be table stakes now.
— Jared Friedman, YC Partner, Feb 27 2026
Hit this exact wall last week. Claude Code can scaffold an entire project, write tests, deploy to staging, but needs me to manually sign up for a third party service and paste in an API key. The last mile of developer tooling is still stuck in 2019.
— @advikjain_, replying to Jared
What developers said in response to Jared's tweet
"This is a real friction point for agentic workflows. The auth layer is always manual. Companies that figure out API-first account provisioning will eat the ones stuck in dashboard-only onboarding."
— @thebasedcapital
"I've watched AI tools fail at basic integration tasks because they hit the 'create account manually' wall. We're debating whether Claude can replace junior devs but it can't even sign up for Stripe."
— @OneManSaas
"Signup is just the tip. Billing, permissions, onboarding — everything assumes a human in a UI. Devtools that go full API-first for the entire lifecycle get a massive edge when agents pick their own stack."
— @wildpinesai (tagging @paulg)
"Bigger issue than just signup. Most SaaS still treats APIs as a feature for power users, not the primary interface. When your biggest customer is an agent, the whole product surface needs to be API-first."
— @twitter user
The Skeptics (and why they're wrong)
"Won't this enable bot spam?" — Valid concern, but x402 payments solve this. Agents pay real money per signup. Spam bots won't pay $1 per account.
"Companies don't want bot signups" — They want PAYING customers. Agent-initiated signups that convert to revenue are valuable.
We provision agent-{id}@portal.viewholly.com
Agent brings their own email (AgentMail, etc.)
Agent payments are live. Portal fits perfectly.
StableEnrich, httpay
Virtuals ACP ($163K/day)
StableSocial, TweetX402
StableEmail (314 txns)
Nobody solving this
Wide open
Jared's exact point
| Service | Complexity | Price | Est. Time | Status |
|---|---|---|---|---|
| Resend | Simple | $0.50 | 20s | MVP |
| Railway | Simple | $0.50 | 25s | MVP |
| Vercel | Email verify | $1.00 | 30s | Week 2 |
| Supabase | Email verify | $1.00 | 35s | Week 2 |
| Cloudflare | Email verify | $1.00 | 30s | Week 2 |
| Stripe | 2FA / Complex | $2.00 | 60s | Phase 2 |
Revenue Model
1,000 signups/day × $1 avg = $30K/month
Infrastructure cost: ~$500/month (Cloudflare Workers + browser worker fleet)
Gross margin: 98%
┌─────────────────────────────────────────────────────────────┐
│ AGENT │
│ (Claude Code, OpenClaw, any AI) │
└─────────────────────────────────────────────────────────────┘
│
│ POST /signup (x402 $1)
▼
┌─────────────────────────────────────────────────────────────┐
│ PORTAL API (CF Workers) │
│ Hono + @x402/hono + D1 job queue │
│ Returns job_id in <100ms │
└─────────────────────────────────────────────────────────────┘
│
│ Workers poll
▼
┌─────────────────────────────────────────────────────────────┐
│ WORKER FLEET (OpenClaw Instances) │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │Worker 1│ │Worker 2│ │Worker 3│ │Worker 4│ (4+) │
│ │Browser │ │Browser │ │Browser │ │Browser │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
└─────────────────────────────────────────────────────────────┘
│
│ Encrypted credentials
▼
┌─────────────────────────────────────────────────────────────┐
│ CREDENTIAL VAULT (KV) │
│ One-time retrieval • 5-min TTL • Encrypted │
└─────────────────────────────────────────────────────────────┘
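The vault's semantics (one-time retrieval, 5-minute TTL) can be sketched with an in-memory store. This is a simplified model, not the production code; on Cloudflare Workers KV the same guarantees would come from `put(..., { expirationTtl: 300 })` plus a delete on first read:

```typescript
// Sketch of the one-time, 5-minute-TTL credential vault from the diagram.
// A plain Map stands in for Workers KV to illustrate the semantics.

const TTL_MS = 5 * 60 * 1000; // 5-minute TTL

type Entry = { ciphertext: string; expiresAt: number };

class CredentialVault {
  private store = new Map<string, Entry>();

  put(jobId: string, ciphertext: string, now = Date.now()): void {
    this.store.set(jobId, { ciphertext, expiresAt: now + TTL_MS });
  }

  // One-time retrieval: the entry is deleted on first read,
  // and expired entries are never returned.
  take(jobId: string, now = Date.now()): string | null {
    const entry = this.store.get(jobId);
    this.store.delete(jobId);
    if (!entry || now > entry.expiresAt) return null;
    return entry.ciphertext;
  }
}
```

Delete-on-read means a leaked job_id is useless once the agent has collected its credentials; the TTL bounds exposure if the agent never collects them.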
"I can only imagine allowing full automation when there's a direct path to monetisation. Maybe when we have a more reliable API for charging agents for specific actions automatically."
— @Everlier, replying to Jared
x402 IS that reliable API. Portal is the first service to use it for signup.
Agents can do everything except onboard to services. Portal fixes the last mile of agent autonomy.
$30-40B poured into generative AI. 95% of enterprise pilots fail to deliver. We're building the missing infrastructure that makes them actually work.
Companies are pouring billions into AI agents. Almost none deliver measurable returns.
Companies are pouring $30–40 billion into generative AI, yet an MIT study finds that 95% of enterprise pilots deliver zero measurable return.
— MIT NANDA: The GenAI Divide, 2025
The pattern is consistent. It's not the models—it's the infrastructure.
Teams reinvent every agent from scratch. Same failures, different companies.
Agents run unsupervised. High-stakes errors go uncaught. Trust collapses.
Each company learns the same lessons. No accumulated knowledge.
Multi-agent systems collapse. Stanford CooperBench: 25% success rate.
Proven prompts, integrations, and sequences. Encoded from real deployments.
Smart escalation. Approval queues. Humans handle edge cases.
What breaks and how to prevent it. Compound learning across clients.
Coordinate multi-agent work. Handle failures gracefully.
The Unlock
The 5% that succeed have infrastructure. Templates. Oversight. Failure patterns. We're building that infrastructure as a service.
The most valuable infrastructure companies started by doing the work themselves.
Data Labeling → AI Infrastructure
Started labeling images for self-driving cars (2016). Now the "Data Foundry" powering OpenAI, Meta, Google. 50% gross margins from tech-enabled services.
Bookkeeping Services → Financial Infra
"AWS for SMB accounting." Started doing bookkeeping. Now processes $3B+ in transactions. Jeff Bezos led funding.
Payments API → Financial Infrastructure
Started with simple payment processing (2010). Expanded to Connect, Radar, Atlas. Infrastructure that grows as customers grow.
The Pattern
Do the work → Encode the patterns → Become the platform. Services fund the R&D. Each engagement builds the moat. Competitors starting later start from zero.
Their journey is our playbook. Same model, different layer.
Started labeling images for AV companies. Revenue from day one.
Built pre-labeling ML that made each human 10x more efficient.
Each correction improved their models. More data = better automation.
Nucleus, Validate, Launch—from labeling to full ML lifecycle.
Operating AI agent workflows for clients. Revenue from day one.
Workflow templates + orchestration that make agents reliable.
Each engagement encodes learnings. More workflows = better templates.
Guardrails, Observability, Governance—full agent lifecycle.
Scale AI is not a traditional BPO company. It is a Data Foundry. Their technology layer is their moat—human workforce augmented by proprietary software that compounds in value.
— Takafumi Endo, "Scale AI: Deconstructing the Foundry"
Each engagement encodes a playbook. Playbooks become the platform.
Figure everything out from scratch
Apply existing playbook + customize
Playbook is battle-tested
Playbooks become product
What actually works for each use case
Which models for which tasks (cost/quality)
Integrations, APIs, credentials patterns
What to block, what to escalate
Application companies fight for customers. Infrastructure companies power the ecosystem.
Race to the bottom. Easy to copy.
Each customer = new acquisition cost
Commodity software pricing
Customers can leave anytime
Mission-critical. Hard to replicate.
Templates improve → more value → more customers
Scale AI: ~18x revenue multiple ($29B valuation on $1.5B ARR). Stripe: higher.
Workflows built on your templates
Network effects are the underlying principle behind the success of companies like AWS, Stripe, and Salesforce. Higher network density means the product value increases.
— NFX: The Network Effects Manual
AI agents are the fastest-growing category in enterprise software. We're building the infrastructure layer.
If the AI agent market reaches $50B, infrastructure captures 20-30% of stack value:
Every month = more encoded knowledge
Services fund the platform
Failure patterns competitors don't have
Four layers that make AI agents reliable. We're building all four.
Browser-layer guardrails that block irreversible actions
Observability for autonomous agents. See what they do.
Governance, compliance, audit trails
Verified code snippets for agent tool use
Lead gen + qualification workflows
Synthetic data pipeline workflows
Literature review + synthesis workflows
Outbound + meeting booking workflows
Fat Startup Thesis
We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.
"A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done."
— Andrew Lee, a16z Speedrun
Prove unit economics at scale
Across 5+ verticals
Guardrails, Observability, Governance
Deploy without our team
GPT-5, Claude 4—agents can work
70-80% churn = customers seeking alternatives
No dominant player yet. First-mover wins.
EU AI Act mandates oversight, audit trails
Infrastructure that makes AI agents reliable. Workflow templates. Orchestration. Human oversight.
Every company deploying agents will need this. We're building it.
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded
Co-Founder & CTO
CTO of Because · $3M Seed
Co-Founder
yapthis.com · Shipped production agents
MIT NANDA Study: 95% AI failure rate, 171% ROI when successful
MarketsandMarkets: $7.8B → $52.6B AI agents market (2025-2030)
Scale AI (Sacra): $1.5B ARR, $29B valuation, 50% gross margins
Pilot (CNBC/TechCrunch): $1.2B valuation, Bezos-backed
11x/Artisan: 70-80% churn within months (Broadn research)
RAND Corporation: 80% AI project failure rate
Post an outcome. AI agents compete. Pay only for results. We're building the outcome marketplace for the AI economy.
This is the exact model a16z partners are calling for in 2026.
Say you need 50 qualified sales meetings. Instead of buying another AI tool, you post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents going after real business outcomes?
— Macy Mills, a16z Speedrun, "14 Big Ideas for 2026"
I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start.
— Kenan Saleh, a16z Speedrun, "14 Big Ideas for 2026"
A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.
— Andrew Lee, a16z Speedrun Partner
The freelance marketplace is $1.5T. It's about to be disrupted by AI agents.
Pay humans by the hour. Hope they deliver.
Fixed-price gigs. Still human-dependent.
Wait days. Pay premium. Quality varies.
$X per meeting, $Y per video, $Z per lead.
AI agents work 24/7. Instant scale.
More agents = better matching = better outcomes.
The Paradigm Shift
As we move to a future based on outcome-based pricing that perfectly aligns incentives between vendors and users, we'll first move away from time-based billing.
— a16z Big Ideas 2026
Bounties + Escrow + AI Agents = Outcome Marketplace
"Book qualified meeting" or "Generate product video"
Pay what the outcome is worth to you
Funds held in escrow. Pay only on delivery.
Match capabilities to opportunities
Success rate → more bounties → more revenue
Verified outcome → automatic payout
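A minimal sketch of the escrow lifecycle above, assuming a simple fraction-verified settlement rule; the partial-credit policy and amounts are illustrative placeholders, not a production payout contract:

```typescript
// Minimal state machine for a bounty with escrowed funds:
// open → claimed → settled (paid or refunded).

type BountyState = "open" | "claimed" | "paid" | "refunded";

class EscrowBounty {
  state: BountyState = "open";

  constructor(readonly outcome: string, readonly escrowed: number) {}

  claim(): void {
    if (this.state !== "open") throw new Error("bounty not open");
    this.state = "claimed";
  }

  // Verified outcome → automatic payout; partial credit for partial delivery.
  // verifiedFraction is the share of the outcome that passed verification.
  settle(verifiedFraction: number): { payout: number; refund: number } {
    if (this.state !== "claimed") throw new Error("nothing to settle");
    const payout = Math.round(this.escrowed * verifiedFraction);
    const refund = this.escrowed - payout; // unearned funds return to the buyer
    this.state = payout > 0 ? "paid" : "refunded";
    return { payout, refund };
  }
}
```

The buyer's funds sit in escrow from posting to settlement, so the agent is paid automatically on verified delivery and the buyer is refunded automatically on failure, with no invoicing step in between.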
Proven in bug bounties, open source, and ML competitions. Now it's time for AI work.
Imagine a tool where you describe your problem and get a solution built for you. Today we're introducing Bounties, a marketplace where you work with top creators and bring your software ideas to life.
— Replit, on launching Bounties
Replit proved bounties work for code. We're proving it works for any AI-deliverable outcome.
Over the past 5 years we've supported the funding of public goods. Started with bounties for open source, evolved to quadratic funding.
— GitCoin: $60M+ distributed
GitCoin proved bounties + crypto payments = massive coordination. We're applying this to AI agent work.
70% of tech value comes from network effects. Here's how we build them.
Network effects have been responsible for 70% of all the value created in technology since 1994. Founders who deeply understand how they work will be better positioned to build category-defining companies.
— NFX, "The Network Effects Bible"
Attracts more agents to the platform
Faster delivery, higher quality outcomes
Word of mouth, lower prices, faster delivery
What works, what fails, edge cases
Route bounties to best-fit agents
Compound knowledge competitors can't replicate
Metcalfe's Law
The value of a network grows in proportion to N² (nodes squared). With agents AND buyers, we get cross-side network effects that compound faster than single-sided platforms.
The missing infrastructure for AI agent marketplaces.
Who built it, what it can do, audit trail
Track record based on actual outcomes, not reviews
"This agent is 94% on sales meetings, 78% on video"
Funds released only on verified delivery
Human or AI arbitration for edge cases
Partial credit for partial delivery
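The reputation layer above can be sketched as a per-category running score over verified outcomes, where credit between 0 and 1 lets partial delivery count. Category names and the clamping rule are illustrative assumptions:

```typescript
// Per-category reputation from verified outcomes, not reviews.
// Each outcome records a credit in [0, 1]; 1 = full delivery,
// fractional values = partial credit for partial delivery.

class AgentReputation {
  private totals = new Map<string, { credit: number; count: number }>();

  record(category: string, credit: number): void {
    const t = this.totals.get(category) ?? { credit: 0, count: 0 };
    t.credit += Math.max(0, Math.min(1, credit)); // clamp to [0, 1]
    t.count += 1;
    this.totals.set(category, t);
  }

  // Success rate per category ("94% on sales meetings, 78% on video").
  rate(category: string): number | null {
    const t = this.totals.get(category);
    return t ? t.credit / t.count : null;
  }
}
```

Because every data point is a verified, paid-out outcome, the score is harder to game than star ratings, and bounty routing can read it per category rather than as one global number.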
Like Uber: start premium, then open the platform.
Quality control, learn playbooks
Customers paying for outcomes
Escrow, verification, reputation
Vetted builders, revenue share
Anyone can compete for bounties
Like Uber, Airbnb, marketplace standard
The Uber Playbook
Uber started with black cars (premium, managed) before opening to UberX (open marketplace). We start with our agents, prove economics, then open to all. Services fund the platform build.
Services → Platform is a proven path to massive outcomes.
Data labeling for ML companies
Tools, workflows, quality systems
Services funded the infrastructure
Platform economics, not services multiples
Data labeling for ML
Sales, content, research, ops...
Every white-collar task that can be AI'd
GPT-5, Claude 4 can do real work
Agents can transact autonomously
OpenClaw, MCP, agent frameworks
70% churn = buyers want outcomes
Companies spending on AI, getting nothing
No AI-native outcome marketplace yet
Emerging primitives like x402 make payment settlement programmable and reactive. Smart contracts can settle a dollar payment globally in seconds. In 2026, this becomes the rails for agent commerce.
— a16z Big Ideas 2026, Part 3
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded · Built consumer products used by millions
Co-Founder & CTO
CTO of Because · $3M Seed · Deep agent infrastructure experience
Co-Founder
yapthis.com · Agentic architecture · Shipped production agent systems
What Traction Proves
Companies pay for outcomes. 0% churn because incentives align. This is the business model for AI work.
Scale agent capacity, build marketplace infra
Prove economics before opening marketplace
Partner agents, then fully open
We run agents daily, know what breaks
ClawView, guardrails, workflows
$4K MRR proves the model
Post an outcome. AI agents compete. Pay for results. The marketplace that makes AI actually deliver.
🔧 Infrastructure We're Building
🛡️ Guardrails • 📊 ClawView • 🏛️ AgentGov
Trust layer that makes marketplace outcomes reliable.
📚 Sources
a16z: "14 Big Ideas for 2026" (Macy Mills, Andrew Lee, Kenan Saleh) • "Big Ideas 2026 Part 1-3" • NFX: "The Network Effects Bible" (70% of tech value) • Market Data: Scale AI ($13.8B), Upwork ($1.67B), GitCoin ($60M+ distributed) • Replit: Bounties marketplace launch
Everyone's building autonomous agents. We're building the layer that makes them actually work: purpose-built infrastructure for human oversight at scale.
The research is clear—and the industry is learning the hard way.
"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."
— Berkeley/DeepMind "Why Multi-Agent LLM Systems Fail", 2025
ChatDev on ProgramDev benchmark
Across autonomous agent frameworks
In uncoordinated "bag of agents"
"42% of companies abandoned most of their AI initiatives in 2024, up from 17% the previous year. The average organization scrapped 46% of AI proof-of-concepts."
— S&P Global Research, 2024
MIT Research on enterprise deployments
RAND Corporation AI project study
AI projects vs standard software
Why This Matters
The industry is betting billions on fully autonomous agents. The research says they don't work. Someone needs to build the layer that makes them work.
The largest AI research org in the world just validated our thesis.
"We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems."
— Microsoft Research, Magentic-UI (July 2025)
Lightweight intervention, massive improvement
Minimal interaction overhead
Human + agent collaborate on plan before execution
Seamless handoff between human and agent control
Human approval for high-stakes actions
Learn from past interactions to improve
Microsoft's Conclusion
"Even as tomorrow's agents become more capable and reliable, we believe that human involvement will remain essential for preserving human agency, resolving unforeseen ambiguities, and guiding agents in adapting to an ever-changing world."
Real-world data from millions of Claude Code sessions reveals how humans actually oversee agents.
Experienced users let Claude run autonomously
They intervene more often, not less
From approving everything to watching for problems
On complex tasks vs simple ones
On the most difficult tasks
They can (and should) ask for help
"Effective oversight doesn't require approving every action but being in a position to intervene when it matters... our central conclusion is that effective oversight of agents will require new forms of post-deployment monitoring infrastructure and new human-AI interaction paradigms."
— Anthropic Research, "Measuring AI Agent Autonomy in Practice" (Feb 2026)
The Deployment Overhang
Anthropic found that "the autonomy models are capable of handling exceeds what they exercise in practice." The bottleneck isn't model capability—it's the oversight infrastructure.
The analogy everyone is converging on—and what it means for product design.
"Think of agents within your multi-agent system as the airplanes. The agents have their own autonomy to act. But air traffic control provides guardrails, coordination, and human oversight for the whole system."
— Jason Bryant, AI in Pharma (Jan 2026)
Pilots make real-time decisions
Routing, conflicts, emergencies
Technology can't modify standard procedures
Incidents become new procedures
Not 1:1 human-to-agent
Nor vice versa—complementary roles
Edge cases require human judgment
ATC isn't going away
The Thesis
As AI agents proliferate, every company will need an "air traffic control" system for their agent fleet. That's the control plane we're building.
Existing tools weren't designed for the human-agent oversight problem.
Conversational, not workflow-oriented. Can't manage 100 agents. No approval queues. No batch operations. You'd need a chat window per agent.
Great for developers. Useless for ops teams. Can't approve actions in real-time. No visual understanding of agent state or intent.
Ad hoc approvals. No context. Alert fatigue. Doesn't learn from decisions. Can't see what agent plans to do next.
Read-only visibility. No intervention capability. See problems after they happen. Can't modify agent plans mid-execution.
"Only 14.4% of enterprises have full security approval for AI agents. 88% reported agent-related incidents. The interface problem is also a governance problem."
— Gravitee State of AI Agents Report, 2026
The Gap
There's no purpose-built interface for humans to oversee AI agents at scale. Not dashboards. Not chat. Not alerts. A new category needs to exist.
Distilled from Microsoft, Anthropic research, and our own deployments.
See what agent intends to do before it acts. Edit plans. Add constraints.
Define allowed domains, tools, actions. Agent can't exceed boundaries.
Start from proven patterns. Don't reinvent for every task.
See agent actions as they happen. Browser view. Code execution. API calls.
Pause any agent instantly. Take control. Hand back.
Automatic pause for high-stakes actions. Configurable thresholds.
All pending approvals across all agents in one view.
Approve/reject patterns across many agents at once.
Route different decisions to different humans by expertise.
Human approvals become future patterns. Rejections become rules.
Auto-adjust when to ask humans based on outcomes.
Workflows improve with every human intervention.
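The guard → approval queue → learning loop path above can be sketched as one small state machine. Tool names and the high-stakes list are assumptions for illustration, not a production policy:

```typescript
// Action guard with an approval queue and a learning loop:
// allowed tools define the boundary, high-stakes actions pause for a human,
// and human rejections become standing rules for next time.

type Action = { tool: string; target: string };
type Verdict = "allowed" | "blocked" | "needs_approval";

class ControlPlane {
  private queue: Action[] = [];
  private learnedBlocks = new Set<string>(); // rejections become rules

  constructor(
    private allowedTools: Set<string>,
    private highStakes: Set<string>,
  ) {}

  guard(a: Action): Verdict {
    const key = `${a.tool}:${a.target}`;
    if (!this.allowedTools.has(a.tool) || this.learnedBlocks.has(key)) {
      return "blocked"; // outside boundaries, or previously rejected
    }
    if (this.highStakes.has(a.tool)) {
      this.queue.push(a); // pause and wait for a human decision
      return "needs_approval";
    }
    return "allowed";
  }

  // Human decision on the oldest queued action.
  decide(approve: boolean): void {
    const a = this.queue.shift();
    if (a && !approve) this.learnedBlocks.add(`${a.tool}:${a.target}`);
  }
}
```

The key property is that the human is only consulted for the high-stakes slice, and each rejection shrinks that slice, which is how the oversight ratio improves over time.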
Every complex system has a control plane. AI agents need one too.
Control plane for infrastructure. See what's happening. Alert when things break. Intervene.
Control plane for containers. Orchestrate workloads. Handle failures. Scale automatically.
Control plane for identity. Who can access what. Audit trails. Compliance.
What agents are doing. Approvals & intervention. Learning & guardrails. This category doesn't exist yet.
"The control plane provides management and orchestration across an organization's environment. It's akin to air traffic control for applications."
— Vectra AI definition
The Opportunity
Infrastructure got Datadog. Containers got Kubernetes. Identity got Okta. AI agents need their control plane. We're building it.
The VC objection—and why it's wrong.
"If humans are in the loop, doesn't that kill unit economics? Isn't the whole point to remove humans?"
Human labelers + AI. Humans as oversight.
Human bookkeepers + AI. Humans as QA.
Human analysts + AI. Humans as strategists.
"Humans as OVERSIGHT, not labor. AI does the work, humans QA. The ratio improves over time."
1 human oversees 10 agents. Heavy QA.
System learns. Fewer interventions needed.
Humans handle edge cases only. Still critical.
The Avi Medical Case Study
81% automation rate. 93% cost savings. Humans handle complex cases. HITL doesn't kill unit economics—it enables them.
Everyone's zigging toward full autonomy. We're zagging toward control.
Demo well. Break in production.
Better models. More tools. Same failure modes.
17x error amplification, per DeepMind.
The dream that keeps failing.
Makes ANY agent more reliable.
Complementary strengths. Better outcomes.
Turns bag-of-agents into functional team.
Exception handling. Strategic oversight.
"I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start."
— Kenan Saleh, a16z Speedrun Partner
Our Position
We're not betting against agent capabilities improving. We're betting that oversight infrastructure will always be needed—and no one is building it well.
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded · Built consumer products used by millions
Co-Founder & CTO
CTO of Because · $3M Seed · Deep agent infrastructure experience
Co-Founder
yapthis.com · Agentic architecture · Shipped production agent systems
OpenAI Operator, Anthropic Claude Code, 1000+ agent startups
95% pilot failure is now common knowledge
Microsoft, Anthropic, DeepMind all pointing to HITL
EU AI Act mandates audit trails & oversight
Proving the thesis with real customers
Dogfooding our own control plane daily
Components of the full control plane
Purpose-built infrastructure for human oversight of AI agents at scale. Plan review. Action guards. Approval queues. Learning loops. The missing layer that makes agents actually work.
Build the full control plane product
Prove control plane scales across customers
Be "Datadog for AI agents"
No one owns "AI agent control plane" yet
Microsoft, Anthropic, DeepMind alignment
Horizontal opportunity across industries
🔧 Infrastructure We're Building
🛡️ Guardrails • 📊 ClawView • 🏛️ AgentGov • 🤖 Employee OS
The Control Plane integrates all infrastructure layers into one human-facing interface.
🔬 Research Foundation
MIT: 95% of AI pilots fail · DeepMind: 17x error amplification in multi-agent · Microsoft Magentic-UI: 71% accuracy improvement with HITL · Anthropic: "New oversight infrastructure needed" · Berkeley: "Why Do Multi-Agent Systems Fail?" · S&P Global: 42% of AI initiatives abandoned
"Vibe coding" revolutionized app development—describe what you want, AI builds it. Now apply this to business outcomes. Describe the result, AI + humans deliver it.
What started as a meme became a paradigm shift. Now it's evolving beyond code.
"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."
— Andrej Karpathy, Feb 2025 (coined the term)
Karpathy's early prediction about LLM capabilities
Cursor, Replit, Claude Code—describe → build
Research, writing, reporting, file operations, "glue work"
"What changed in early 2026 is that vibe coding is no longer confined to software development; it is spreading into research, writing, reporting, spreadsheet wrangling, file operations, and 'glue work' that usually fragments attention."
— Ken Huang, "The Vibe Shift" (Jan 2026)
The Pattern
Vibe coding showed that natural language → complex software works. Now we're applying the same pattern to natural language → business outcomes.
The next evolution: describe what you want to achieve, not what you want built.
$5-10K/month
Import lists, write sequences, set rules
Fix errors, adjust settings, babysit
70% churn in 3 months when it doesn't work
"50 qualified sales meetings with Series A fintech founders"
Research, outreach, qualification, scheduling
Review, approve, handle edge cases
$X per meeting delivered
The Thesis
Vibe coding proved that intent → artifact works for software. Vibe outcomes proves it works for business results. The "vibes" are the goal—the execution is handled by well-orchestrated HITL agent workflows.
Describe outcome → Agents execute → Humans QA → Outcome delivered
"Book 50 qualified meetings with Series A fintech founders in Q1"
Identifies prospects, signals, contact info
Drafts personalized messages
Approves messaging before send
Books the meeting when prospect replies
"Process this month's invoices and flag anomalies"
Pulls data from PDFs, emails, systems
Matches to POs, identifies discrepancies
Approves exceptions, flags fraud
Processed invoices, exception report
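The invoice workflow above can be sketched as a reconciliation pass with a human-review queue; the 1% match tolerance is an illustrative assumption:

```typescript
// Extract → match to PO → route: clean matches process automatically,
// discrepancies and unmatched invoices go to the human approval queue.

type Invoice = { id: string; po: string; amount: number };
type PO = { po: string; amount: number };

type Result = { autoProcessed: Invoice[]; needsHumanReview: Invoice[] };

function reconcile(invoices: Invoice[], pos: PO[], tolerance = 0.01): Result {
  const byPo = new Map<string, number>();
  for (const p of pos) byPo.set(p.po, p.amount);

  const result: Result = { autoProcessed: [], needsHumanReview: [] };
  for (const inv of invoices) {
    const expected = byPo.get(inv.po);
    const ok =
      expected !== undefined &&
      Math.abs(inv.amount - expected) <= expected * tolerance;
    (ok ? result.autoProcessed : result.needsHumanReview).push(inv);
  }
  return result;
}
```

The deliverable maps directly onto the output: `autoProcessed` is the processed-invoices file, `needsHumanReview` is the exception report a human approves.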
Pure AI can't deliver reliable business outcomes. The research is clear.
"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."
— Berkeley/DeepMind, 2025
AI can be confidently wrong about business-critical decisions
Business has nuance AI can't anticipate
Brand damage, legal liability, lost deals
"Hybrid AI workflows, which combine automation with human oversight, are not a fallback; they're the modern standard for reliability, trust, and scalability in 2026."
— Parseur, Dec 2025
AI does 90% of work, humans verify critical decisions
System learns when to ask, when to proceed
Microsoft found lightweight intervention = massive improvement
This is the UX for the AI-native agency, control plane, and marketplace pitches.
Conversational, not outcome-oriented. Can't manage complex multi-step workflows. No approval queues.
Read-only visibility. No intervention. See problems after they happen. Can't modify plans mid-execution.
Ad hoc. No context. Alert fatigue. Can't see what agent plans to do next.
"I need X" → system figures out how
See what's happening toward your goal
Review decisions that matter
Course-correct mid-execution
Clear metrics: delivered vs requested
🔗 This Powers Our Other Pitches
⚡ Fat Startup: Vibe outcomes is how customers interact with us
🚗 Uber for AI Work: Natural language bounty posting
🎛️ Control Plane: The human oversight layer
☁️ AWS of AI Work: Workflow templates activated by intent
The shift from "tools" to "outcomes" is creating massive new markets.
70% AI SDR churn = customers seeking alternatives
95% pilot failure = demand for what works
Want outcomes, not another tool to learn
"Per-seat is no longer the atomic unit of software. When AI can handle ticket resolution, the natural pricing metric becomes successful outcomes."
— a16z Enterprise Newsletter, Dec 2024
$X per meeting, $Y per processed invoice, $Z per video
Who else is thinking about natural language → outcomes?
Sell tools. Charge per seat. You manage agents. 70% churn.
❌ Not outcome-based
Infrastructure for developers. Build your own workflows.
❌ Not outcomes, just primitives
Workflow automation. You design the flows.
❌ Not AI-native, not outcome-based
Services + HITL → platform. "We need labeled data" → delivered.
✓ Outcome-based, HITL model
"Do my bookkeeping" → done. Humans + AI.
✓ Outcome-based, HITL model
AI support priced per successful outcome.
✓ Outcome-based pricing model
Our Differentiation
Horizontal, not vertical. Scale AI = data labeling. Pilot = bookkeeping. We're building the general-purpose vibe outcomes platform—natural language to any deliverable business result.
Proving the thesis with real customers and real outcomes.
SDR/BDR for construction, startups (50% of revenue)
ML training data pipelines (30% of revenue)
University lab literature synthesis (20% of revenue)
"When you only pay for outcomes, there's no reason to churn. We deliver meetings, they pay. We don't deliver, they don't pay. Aligned incentives = sticky customers."
vs. AI Tool Churn
AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 70% churn.
Technology, market, and cultural convergence make this the moment.
GPT-5, Claude 4 can execute real business workflows
OpenClaw, MCP, tool-use protocols
Agents can transact autonomously (a16z Big Ideas 2026)
Microsoft, Anthropic, DeepMind all pointing same direction
70% AI SDR churn. 95% pilot failure. Customers want what works.
Companies spending billions on AI, getting nothing
30%+ enterprise SaaS moving to outcome-based
Natural language → results is now understood
"2025 was widely labeled 'the year of AI agents.' In reality, it was the year we learned what agents can and cannot do. 2026 is the year we build systems that work reliably, repeatedly, and in production."
— Human-in-the-Loop Newsletter, Dec 2025
Co-Founder + Advisor
$140M exit (Mystery) · a16z funded · Built consumer products used by millions
Co-Founder & CTO
CTO of Because · $3M Seed · Deep agent infrastructure experience
Co-Founder
yapthis.com · Agentic architecture · Shipped production agent systems
Running OpenClaw infrastructure ourselves
Browser-layer guardrails
Agent observability
Playbooks that compound
$4K MRR from real deliverables
Built the infrastructure, not just the agents
Encoded in playbooks from real experience
Describe the outcome you want. AI agents + human QA deliver it. Pay only for results. The interaction layer for the AI economy.
Scale agent capacity + build the interface
Prove vibe outcomes across multiple verticals
Anyone can describe outcomes and get them
"Vibe outcomes" platform doesn't exist yet
Vibe coding is mainstream—extend it to business
AI agents + outcome-based pricing converging
📚 Research Foundation
Karpathy: Coined "vibe coding" Feb 2025 · RAND: 80% AI project failure · Microsoft Magentic-UI: 71% accuracy improvement with HITL · CooperBench: 30% lower success in multi-agent without coordination · a16z: Outcome-based pricing shift · Gartner: 30%+ enterprise SaaS with outcome pricing by 2025 · Bessemer: AI Pricing Playbook (Feb 2026)
🔗 Related Pitches
⚡ Fat Startup • 💰 Outcome-Based • 🚗 Uber for AI Work • 🎛️ Control Plane
Vibe Coding Outcomes is the UX/interaction layer that powers all of these.
Series A-B companies ($13M-$160M raised) with specific research they could implement but haven't.
NYC Tech Startups
Combined Funding
Research Opportunities
Building "Wall Street's first AI analyst" — LLMs for financial reasoning
R&D Opportunities:
Hook: "Your financial reasoning models could be 40% more accurate on tabular data with Chain-of-Table"
AI for finance — valuation models, deal analysis, Excel/PPT generation
R&D Opportunities:
Hook: "SpreadsheetLLM could cut your Excel generation errors by 30%"
GenAI for financial professionals — broker research, earnings calls, filings
R&D Opportunities:
Hook: "LongLoRA could let you process 10x longer earnings calls without quality loss"
Marketplace for curated AI-ready datasets (Insights Exchange)
R&D Opportunities:
Hook: "DataComp benchmarking could become your quality certification"
AI for cancer precision medicine — analyzes data to identify optimal treatments
R&D Opportunities:
Hook: "CancerGPT's few-shot approach could expand your drug combination predictions 5x faster"
AI + IoT for senior care — AUGi device for fall detection and patient monitoring
R&D Opportunities:
Hook: "RT-DETR could cut your fall detection latency by 40% while running entirely on-device"
AI for mental health — "Ash" chatbot simulates therapist-like conversations
R&D Opportunities:
Hook: "Constitutional AI could reduce harmful responses by 80% while maintaining therapeutic value"
Healthcare payment automation — streamlines insurance reimbursement
R&D Opportunities:
Hook: "Medical coding LLMs could auto-fill 60% of your claims forms"
AI-powered payroll platform for multi-state compliance
R&D Opportunities:
Open-source network automation platform
R&D Opportunities:
AI marketing for home service businesses
R&D Opportunities:
AI for sales personalization — integrates 100+ data sources
R&D Opportunities:
Hook: "Buyer intent prediction could 3x your users' reply rates"
AI search optimization — helps brands appear in AI-generated responses
R&D Opportunities:
Influencer commerce platform
R&D Opportunities:
AI for regulatory compliance — automates review of legal documents
R&D Opportunities:
Document AI — searches large document sets with citations
R&D Opportunities:
Hook: "Self-RAG could improve your citation accuracy by 25%"
SMB cybersecurity
Reforestation + carbon credits
Silicon anodes for EV batteries
P2P sports betting
High-protein nutrition bars
Laundry/dry-cleaning SaaS
Subject: Quick R&D idea for [Company] — [specific technique]
Hi [Name],
Congrats on [recent news/funding]. I've been researching [specific paper/technique] that could help with [their specific problem].
Quick version: [1-sentence benefit with number]
I put together a 2-page brief showing how this could work for [Company]. Want me to send it over?
The real market pain is downstream from R&D — it's about shipping AI to production.
AI projects fail to reach production (RAND)
GenAI pilots failing (MIT/Fortune 2025)
The gap isn't finding the right model. It's shipping AI to production.
From r/MLQuestions — 688 upvotes, Nov 2025
"I'll interview someone who can explain LoRA fine-tuning in detail but has never deployed anything beyond a Jupyter notebook."
— Startup co-founder hiring ML engineers
From Cleanlab's survey of 95 teams with AI in production
Teams satisfied with observability
Plan to improve observability next year
Rebuild AI stack every 3 months
Even among the 5% of companies that reach production, most remain early in maturity. They can't reliably know when their agents are right, wrong, or uncertain.
| ❌ OLD: "AI R&D Engineer" | ✅ NEW: "Production AI Engineer" | |
|---|---|---|
| Vibes | Research, experimentation | Deployment, reliability |
| Perception | Nice-to-have | Need-to-have |
| Target | Teams with resources | Teams with stuck projects |
| Job-to-be-done | "Find the best model" | "Ship to production this month" |
Aemon = the optimization engine
You = the shipping engine
Pain: "We have 3 AI features in Jira blocked for months"
Pain: "We want AI in our product but don't know where to start"
Pain: "Platform team of 5 supporting 20 feature teams — we're bottlenecked"
Pain: "Can't deploy AI without compliance sign-off"
Based on Garry Tan's YC video insight: agents pick tools based on doc quality, not actual performance. The Whisper/Groq problem.
Claude Code defaulted to Whisper V1, a near-deprecated model, because its documentation is better than Groq's, even though Groq is 200x faster and 10x cheaper.
— Garry Tan, YC Partner, Feb 2026
The Insight
Agents pick tools based on doc quality, not actual performance — and that's exactly the gap AgentDocs exploits.
| Wedge | Market | Pain | Competition | Fit | x402 | Timing | Total |
|---|---|---|---|---|---|---|---|
| 🥇 LLM / Model Routing | 5 | 5 | 3 | 5 | 4 | 5 | 27/30 |
| 🥈 Video Gen | 5 | 5 | 3 | 5 | 5 | 4 | 27/30 |
| 🥉 Audio / Transcription | 3 | 5 | 2 | 4 | 4 | 5 | 23/30 |
| Deployment / Hosting | 5 | 4 | 4 | 4 | 3 | 4 | 24/30 |
| Agent Identity (email/phone) | 4 | 5 | 4 | 3 | 3 | 5 | 24/30 |
| Databases | 5 | 4 | 3 | 3 | 2 | 4 | 21/30 |
| Image Gen | 4 | 3 | 5 | 3 | 4 | 3 | 22/30 |
What agents are ACTUALLY spending on today (x402scan.com, Feb 2026)
692 servers — dominant vertical
486 servers — led by Virtuals ACP ($163K/day)
216 servers — StableEnrich, httpay
203 servers — alpha signals
Zero servers — Garry Tan example!
42 txns — essentially nothing
Nothing
Nothing
| Wedge | x402 Now | Holly Fit | Verdict |
|---|---|---|---|
| Multi-API aggregation + capability layer | ✓ 3 players, no AgentDocs | ✓ Direct fit | Best immediate wedge |
| Agent-to-agent coordination | ✓ $163K/day (Virtuals) | ✓ Holly as orchestrator | Most validated demand |
| Social data for agents | ✓ StableSocial live | ✓ Fits Wurk agents | Niche but real |
| Transcription (Whisper/Groq) | ❌ Zero on x402 | ✓ Strong routing layer | 6–12 months early |
| Video gen | ❌ Near zero | ✓ Strong dogfood | 12–18 months early |
Key Insight
Absence of transcription/video/deployment on x402scan is opportunity signal, not rejection. StableEnrich proved the model: wrap existing APIs behind x402, get thousands of transactions immediately.
Zero servers on x402. Garry Tan moment 6 days ago. First-mover window open.
AgentDocs value: Verified schema {input, model: "groq|deepgram|whisper", output}
Groq at $0.02/min → charge $0.03/min
"Your agent would have chosen Whisper V1. Ours chose Groq."
Parameter chaos problem (Kling uses cfg_guidance, Runway uses guidance_scale). Genuinely unsolved.
AgentDocs value: Agent sends {prompt, style, duration, budget}, Holly resolves params
Already dogfood — Holly generates video
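The parameter-resolution step can be sketched as a canonical-to-provider mapping. The provider parameter names (`cfg_guidance`, `guidance_scale`) come from the deck's Kling/Runway example; the canonical keys and values are hypothetical.

```python
# Hypothetical canonical->provider parameter map for video generation.
PARAM_MAP = {
    "kling":  {"guidance": "cfg_guidance",   "length": "duration"},
    "runway": {"guidance": "guidance_scale", "length": "duration_seconds"},
}

def resolve(provider: str, canonical: dict) -> dict:
    """Translate one canonical request into provider-specific params,
    dropping anything the provider doesn't understand."""
    mapping = PARAM_MAP[provider]
    return {mapping[k]: v for k, v in canonical.items() if k in mapping}

req = {"guidance": 7.5, "length": 5}
assert resolve("kling", req) == {"cfg_guidance": 7.5, "duration": 5}
assert resolve("runway", req) == {"guidance_scale": 7.5, "duration_seconds": 5}
```

One canonical request, N provider dialects: the agent never sees the chaos.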
Agent says {task: "transcribe", latency: "fast"} → gets best provider with pricing + ready API call.
The purest AgentDocs wedge
Garry Tan: "Has anybody built Twilio for agents yet?"
Email + phone + wallet in one API call
Jared Friedman (YC): "Even the best dev tools don't let you sign up via API. This is a big miss in the claude code age — claude can't sign up on its own."
Aggregate APIs (Apollo + Firecrawl + Grok + Serper) behind one x402 endpoint.
"Throw money at endpoint, get data back"
OpenHolly becomes the first x402-native capability registry for non-crypto agent needs — the "Stack Overflow for agents" that makes every new category agent-accessible from day one.
For agents, "discovery" = machine-interpretable services, not human landing pages.
Services expose structured metadata (endpoints, pricing, chains). Facilitators crawl and index.
Layer of facilitators that index x402 services, maintain up-to-date pricing/metadata.
Coinbase agentic wallets pre-integrated with x402. Discovery APIs built-in.
"Internet of Agents" research: agents announce capabilities in machine-interpretable form.
The agent doesn't "Google" a platform; it queries its facilitator ("find a market-data API with latency <100ms and price <0.5¢/request"), receives candidates with structured metadata, picks one, then talks HTTP+402 with it.
— Perplexity Research, Feb 2026
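A facilitator query like the one above can be sketched as a filter over structured service metadata. Service names and numbers are invented; the constraint values mirror the quote's "latency <100ms and price <0.5¢/request" example.

```python
# Hypothetical facilitator index of x402 services with structured metadata.
SERVICES = [
    {"name": "marketdata-a", "kind": "market-data", "latency_ms": 80,  "price_usd": 0.004},
    {"name": "marketdata-b", "kind": "market-data", "latency_ms": 250, "price_usd": 0.001},
    {"name": "imagegen-x",   "kind": "image-gen",   "latency_ms": 50,  "price_usd": 0.02},
]

def query(kind: str, max_latency_ms: int, max_price_usd: float) -> list[str]:
    """Return candidate services meeting the agent's constraints."""
    return [s["name"] for s in SERVICES
            if s["kind"] == kind
            and s["latency_ms"] < max_latency_ms
            and s["price_usd"] < max_price_usd]

# "market-data API with latency <100ms and price <0.5¢/request"
assert query("market-data", 100, 0.005) == ["marketdata-a"]
```

The agent then speaks HTTP+402 to the winning candidate; discovery itself is just structured filtering.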
Video models can generate stunning visuals but can't follow precise instructions. The bottleneck isn't compute or architecture — it's training data with exact state trajectories.
Data scaling plateaus at 200K-400K samples. The persistent ~15% gap between in-domain and out-of-domain performance isn't solvable with more data — it requires architectural changes AND structured training data.
— VBVR Paper (Wang et al.), Feb 2026
Learned "everything moves together" — can't isolate changes
Can't represent "object A moved, B stayed" explicitly
Step 1 error → Step 2 error → reasoning chain breaks
Frame-by-frame ground truth of what changed
Same action, different contexts — curriculum learning
Genesis/Isaac Sim backends for real dynamics
The Robot Arm Analogy
You can't teach a robot to cook if it knocks over the salt every time it reaches for the pepper. Same with video models: if they can't execute precise state transitions, chaining multi-step reasoning becomes impossible. Controllability is the prerequisite.
Pre-built scene generators for:
Genesis claims:
faster than real-time simulation
Direct fine-tuning:
The "data factory" — parameterized generators + distributed workers — is the real competitive advantage. No productized version exists for vertical industries.
Network effect: Each vertical adds templates → attracts more customers → funds more verticals
Gross margins >80% (compute is cheap vs. real data collection)
Timing
Genesis open-sourced Dec 2024. NVIDIA Cosmos launched Jan 2025. π0 open-sourced Feb 2025. The infrastructure just became available — but nobody has built the vertical data factory layer yet.
| Company | Focus | Gap |
|---|---|---|
| NVIDIA Cosmos | Foundation models | Not vertical-specific data |
| Genesis AI | Physics engine | No data pipeline layer |
| Physical Intelligence | Robot foundation model | Consumes data, doesn't sell it |
| Scale AI | Data labeling | Labels real data, doesn't generate |
| Data Factory (Us) | Synthetic video data | Full vertical pipeline ✓ |
The dirty secret of robotics AI is that real-world data collection costs $100-1000/hour when you include robot time, human supervision, and failure recovery. Synthetic data at $0.01/clip changes the economics completely.
— Industry estimate
VLM-as-a-judge is expensive and non-reproducible. IntPhys shows models at chance level. Everyone's flying blind on what their world models actually understand.
Most models perform at chance levels (50%), in stark contrast to human performance, which achieves near-perfect accuracy (97%+). Current video understanding benchmarks do not capture intuitive physics.
— IntPhys 2 (Meta FAIR), Jun 2025
Expensive ($0.10-1.00/sample), non-reproducible, biased
Cherry-picked videos, no systematic testing
Academic benchmarks ≠ production reliability
Rule-based, reproducible, cheap to run
VBVR-Bench achieves ρ > 0.9 with human judgment
Robotics, driving, medical — each needs own benchmarks
The VBVR Breakthrough
VBVR-Bench demonstrates that rule-based evaluation can match human judgment (ρ > 0.9 correlation). But it's research code, not a product. Domain-specific versions don't exist.
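The ρ > 0.9 claim is a Spearman rank correlation between rule-based scores and human judgments. A toy sketch, with invented scores for five generated clips and a hand-rolled Spearman (no ties), shows the shape of that validation:

```python
def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation; assumes no tied values (toy sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n**2 - 1))

# Invented scores: rule-based checks vs human judgment on 5 clips.
rule_scores  = [0.9, 0.4, 0.7, 0.2, 0.8]
human_scores = [0.9, 0.5, 0.65, 0.1, 0.8]
rho = spearman(rule_scores, human_scores)
assert rho > 0.9  # the agreement level VBVR-Bench reports
```

If cheap, reproducible rules track human judgment this closely, the expensive VLM-as-a-judge step can be dropped from the inner loop.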
$5B+ has been raised by world-model companies with no standardized evaluation. Every company is building its own benchmarks internally. That's waste.
Comparable: ML evaluation market ~$500M (2024), growing 25%+ YoY
The gap between demo videos and production reliability is massive. Objects disappear, physics drifts, game logic is brittle over longer sessions. We need systematic evaluation, not cherry-picked demos.
— GradientFlow Analysis, 2026
IntPhys-style physics, VBVR-style controllability, basic API
Robotics suite (π0 compatible), driving suite (Wayve/Comma style)
Like HuggingFace but for world models. Attract submissions, build community
"World Model certified for X domain" — becomes industry standard
Training video/world models costs 10-100x more than training LLMs. The infrastructure layer is missing. We build the "AWS for embodied AI."
Embodied AI training requires tight integration of simulation, rendering, and ML. Current cloud offerings are designed for LLMs. The infrastructure gap is massive.
— Industry observation
H100s, A100s, optimized networking
PyTorch, JAX, distributed training
Text ingestion, tokenization, streaming
Physics engine + renderer + ML in one loop
Frame extraction, state annotation, streaming
Domain randomization, reality gap tools
Why This Matters Now
π0 just open-sourced. Genesis just launched. Cosmos is available. The building blocks exist but nobody has assembled them into a platform. Every robotics startup is duct-taping their own stack.
Managed Genesis/Isaac Sim instances
Video + state trajectory storage
Optimized for video models
| World model companies (funded) | $5B+ raised |
| Robot startups (π0 ecosystem) | 100+ companies |
| AV companies (simulation needs) | 50+ companies |
| Enterprise robotics adoption | Growing 30%+ YoY |
Conservative estimate: $2B addressable market for embodied AI infrastructure by 2028
Target: 40-60% gross margins (better than pure GPU cloud)
| Player | Sim | Data | Train | Eval | Deploy |
|---|---|---|---|---|---|
| CoreWeave/Lambda | ✗ | ✗ | ✓ | ✗ | ✗ |
| NVIDIA Omniverse | ✓ | ~ | ✗ | ✗ | ✗ |
| Genesis (OSS) | ✓ | ✗ | ✗ | ✗ | ✗ |
| Weights & Biases | ✗ | ~ | ~ | ✓ | ✗ |
| Us (Full Stack) | ✓ | ✓ | ✓ | ✓ | ✓ |
The Integration Thesis
Embodied AI requires tight coupling between simulation, data, and training. Point solutions create friction. An integrated platform captures the full workflow — and the full margin.
Deploy, coordinate, and govern fleets of Claude Code, Cursor, and Codex—so your team ships 10x faster with verification.
We fix that.
Enterprise spent $30-40B on AI pilots. Most failed—not because the models are bad, but because nobody built the infrastructure to run them safely.
MIT study: lack of context, poor verification, no adaptation. The agent isn't the problem—the harness is.
Answer.AI study: only 3/20 tasks completed. Best autonomous agent still needs orchestration.
Cursor's hidden API fees surprise users. One team spent $8,000 on a "$200" plan.
No coding agent learns from failures. Same mistakes repeat every session. Zero organizational memory.
"The bottleneck is now having multiple agents at once."
Every failed AI pilot breaks down on one of these. Mission Control solves all four.
Does the agent understand what you actually want? Intent verification, scope control, semantic diff between ask and interpretation.
How do you know it's right? Automated pipelines: tests, security, quality gates. Verified before it touches production.
Can the agent actually do this? Route tasks to best-fit agent. Claude Code for reasoning, Cursor for flow, Codex for CI/CD.
Does the system learn? Auto-generate rules from corrections. One person's fix helps the whole team.
The agent is commodity. The harness is moat. Everyone's building better agents—nobody's building the infrastructure to run 50 of them safely on a production codebase. We are.
From chaos to coordination in four steps.
Complex request → atomic, verifiable steps with dependency graph
Route each subtask to best-fit agent based on capability profiling
Multiple agents work simultaneously, conflicts prevented automatically
Automated verification, corrections become team-wide rules
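The four steps above can be sketched as a tiny scheduler: decompose into a dependency graph, route each subtask to a best-fit agent by capability, execute in dependency order. Task names, skills, and agent assignments are all hypothetical; the verification gate is elided.

```python
# Hypothetical task graph for one feature request.
TASKS = {
    "write-tests": {"deps": [], "skill": "reasoning"},
    "implement":   {"deps": ["write-tests"], "skill": "flow"},
    "wire-ci":     {"deps": ["implement"], "skill": "ci"},
}
# Routing table from the deck: Claude Code for reasoning,
# Cursor for flow, Codex for CI/CD.
AGENTS = {"reasoning": "claude-code", "flow": "cursor", "ci": "codex"}

def schedule(tasks: dict) -> list[tuple[str, str]]:
    """Topological order over the dependency graph, with each
    subtask routed to its best-fit agent."""
    done, order = set(), []
    while len(order) < len(tasks):
        for name, t in tasks.items():
            if name not in done and all(d in done for d in t["deps"]):
                order.append((name, AGENTS[t["skill"]]))
                done.add(name)
    return order

plan = schedule(TASKS)
assert plan == [("write-tests", "claude-code"),
                ("implement", "cursor"),
                ("wire-ci", "codex")]
```

Independent subtasks (same depth in the graph) are what run in parallel; conflicts are prevented because the graph, not the human, serializes the dependent ones.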
5 tabs, 5 agents, merge conflicts everywhere
45+ minutes per agent PR, AI slop review fatigue
No learning between sessions or team members
$20/mo → $400/mo actual spend
All agents in one view, automatic coordination
12-minute reviews, confidence scores, semantic diffs
Auto-generate .cursorrules, team-wide improvement
Route to cheapest capable agent, budget alerts
VS Code 1.109 enables multi-agent dev. GitHub Copilot + Claude + Codex side-by-side. No one owns orchestration.
25% of YC companies have almost entirely AI-generated codebases. They need this.
Claude Code at 72.5% SWE-bench—best in class. The gap is now coordination, not capability.
| Tool | Gap |
|---|---|
| Langfuse | Observes, doesn't orchestrate |
| LangSmith | LangChain lock-in, no self-host |
| Cursor/Devin | Single-agent, no coordination |
| Linear/Jira | Track humans, not agents |
| CrewAI | Framework-specific, not code-native |
We're agent-agnostic, code-native, verification-first.
Three things compound: (1) Routing models improve with every task—network effects. (2) CI/CD integration is sticky—once you're in, you stay. (3) Verification layer builds trust—that's years of R&D, not a feature.
The AI recruiting agency that takes roles from intake to offer — no humans in the loop until the interview.
One AI agent that owns the entire candidate journey — source to offer.
Expand to 50 agencies by end of year
Consulting is 70% intelligence work (research, analysis, modeling) and 30% judgment work. We automate the 70%—the nights and weekends work—so consultants can focus on what actually matters.
"At McKinsey, we spent over 90% of our time on manual work—reading reports, building Excel models, creating presentations." — Grasp Founders
"AI generates generic fluff, not MBB-quality output. Copilot can't do action titles. No tool understands Pyramid Principle or MECE structures."
Operand says "AI to Kill McKinsey." We say: give McKinsey analysts superpowers. The winning model is augmentation—Grasp has 200 customers and 3.5x ARR growth proving it.
The Killer Gap: Verifiability. Consultants don't trust AI outputs. Every claim needs traceable sources, explicit assumptions, confidence intervals. No one does this yet.
Multi-agent system that turns days of research into client-ready deliverables in hours. Research → Analysis → Deliverable—one unified workflow with human-in-the-loop verification.
Why Now: Multi-modal AI finally good enough. Reasoning models enable complex analysis. Consulting under cost pressure. Talent arbitrage—ex-MBB available to train AI and verify outputs.
10-50 person consulting shops. Ex-MBB founders who know what "good" looks like. Fast decision cycles, desperate for competitive advantage, can't build in-house AI.
Expansion Path: Boutiques → In-house corporate strategy teams → PE portfolio companies → Big Four individual teams → Enterprise. Grasp already serves "most of the Big Four" after starting narrow.
We're Cursor for consultants—AI that actually understands strategy, not just generates slides. Targeting $2M seed to build the verifiable, consulting-grade AI workflow that doesn't exist yet.
One-Liner: "We're building AI analysts for consulting firms. Our multi-agent system turns days of research into client-ready deliverables in hours—with every claim verifiable and humans always in control."
We're building AI agents that do the work of procurement BPOs at 1/20th the cost. Pay-per-outcome, not seats.
Enterprises spend over $180 billion annually on procurement talent, compared to roughly $10 billion on procurement software, reflecting how much work still happens manually around existing systems.
— Lio PR / Industry Analysis
When companies spend 18x more on people than software, the work is clearly being done by humans. That's our opening.
Annual spend on procurement talent
Annual spend on procurement software
The Insight
Software hasn't automated procurement — it's just digitized paperwork. The work is still done by humans. AI agents can now do that work.
SAP Ariba is the market leader. Users hate it. This is our opening.
SAP Ariba — 98 reviews, near-universal hate
"This software makes me wanna quit my job. This should not exist."
— SAP Ariba User, Trustpilot
"Logs me out 5720937 times a day... like software from 1980"
— SAP Ariba User, Trustpilot
"If you are a supplier, THIS WILL HURT YOUR BUSINESS."
— SAP Ariba User, Trustpilot
Built pre-cloud, can't rebuild without breaking everything
Revenue from supplier fees, not buyer value
6-12 month implementations, constant IT involvement
"Tell me it's not their department"
What Buyers Actually Want
The winners in this space have proven wedges and go-to-market strategies we can learn from.
Before building anything, founders DMed hundreds of procurement managers on LinkedIn — not to sell, but to learn.
Why It Worked
Built for a "boring, massive" market with "sleepy incumbents" and "low NPS"
Zero upfront cost. "You never pay for seats or shelfware." Implementation in days, not months.
Sequoia Thesis
"In the old world, SaaS sold the promise of ROI. In the new world, AI actually delivers it."
Started with "long tail" suppliers enterprises never negotiate with — low risk, high volume.
Positioned as replacing outsourcing, not software. Competes against BPOs (20x cost of software).
The four-pillars framework shows exactly where agentic AI outperforms existing tools.
| Area | Legacy Gap | Agentic Opportunity |
|---|---|---|
| Intake & Approval | Email chains, manual routing | Autonomous triage and routing |
| Negotiation | Human-only, can't scale | AI negotiation at scale (2000 suppliers simultaneously) |
| Contract Compliance | Periodic audits, 2% value leakage | Continuous monitoring, proactive alerts |
| Supplier Management | Reactive, manual tracking | Proactive risk sensing, autonomous action |
| Invoice Processing | OCR + human review | End-to-end matching and payment |
Coupa too expensive, Ariba too painful
Wide open — Magentic early stage
Tacto owns Germany, US gap
Healthcare, construction procurement
Like our SDR offering, but for procurement. We operate the agents, customers get outcomes.
LLMs can now read contracts, negotiate, and execute end-to-end. Pactum proved it at Walmart scale.
COVID, Ukraine, tariffs created urgency. "The old answers—another dashboard or SaaS tool—are spent."
58% struggle to find/retain procurement talent. BPO arbitrage is ending as AI gets cheaper than offshore labor.
"We're entering a phase in the enterprise where AI moves beyond workflow co-pilots to autonomous, multi-agent execution."
— Seema Amble, a16z
$180B of human labor waiting to be automated.
Incumbents at 1.2/5 stars. 90% of CPOs exploring agents.
✅ High intelligence-to-judgment ratio • ✅ $5.8B BPO proves willingness to pay • ✅ Measurable P&L impact • ✅ Sequoia + a16z conviction
HIGHLY ATTRACTIVE vertical for AI autopilot investment
39,000 agencies run on email and spreadsheets. We're building the AI that replaces their back office — delivering outcomes, not tools.
Getting one of these businesses insured takes ~50 steps over two weeks. The broker's actual judgment matters for maybe 5% of the process. The other 95% is pure coordination.
— Panta (YC W26)
Sequoia, Emergence, Khosla, and YC are funding AI-native brokerages. YC's 2026 RFS explicitly calls out "AI-Native Agencies."
"Agencies of the future will look more like software companies, with software margins. And they'll scale far bigger than any agencies that exist in these fragmented markets today."
— YC Request for Startups, Spring 2026
700+ clients in 18 months • Growth startups (GoPuff, Bombas, EightSleep)
5,000+ clients • Middle America SMBs (daycares, dealerships, restaurants)
Hard-to-place E&S risks • Trucking, nightclubs, construction
Pattern
All three prove the same thesis: AI can handle the 95% coordination layer, freeing brokers for the 5% that actually requires judgment.
The industry runs on copy-paste, portal juggling, and endless email chasing.
ACORD forms, questionnaires, documentation
Submit to 5-20 carriers per risk
Follow-up emails, answer questions
Quote comparison, gap analysis
COIs, endorsements, renewals
Same 50 steps, every single time
PDFs, forms, emails — all parseable
1,000+ carrier APIs via IVANS
Email/chat = agent-native
Speed = competitive advantage
We don't sell software. We deliver outcomes: placements, renewals, COIs — done.
Become pseudo-IT for AI
Tools don't deliver outcomes
Generic tools, generic results
You focus on advising clients
Pay per placement, not per seat
ACORD forms, carrier APIs, COIs
One broker: 400 clients. One AI-augmented broker: 4,000 clients.
99% placement rate vs industry ~60%. Quote turnaround: days → hours.
35,000 agencies with <$2M revenue have no AI tools. They can't afford Zywave. We're their answer.
Mid-market agencies ($1-10M revenue)
WithCoverage Playbook
"Thousands of calls, travel across dozens of states. Offer free insurance analysis showing overpayment. In 18 months, 700+ of the fastest growing companies switched to us." — Max Brenner, CEO
The playbook is proven. The market is massive. The incumbents are asleep.
We automate 50% of IT helpdesk tickets in week one. Incumbents are hated. 67% of orgs can't hire enough techs. The market is ready.
"Faster to automate a task forever than to do it manually once."
— Serval's Core Pitch ($127M raised, $1B valuation)
MSPs can't hire enough techs. Their tools are outdated. AI can finally solve L1.
The Insight
Can't hire fast enough → must automate. L1 is now automatable. The gap is who does it for SMBs.
ConnectWise and Kaseya have created massive dissatisfaction. MSPs are actively seeking alternatives.
"ConnectWise is less bad overall"
— r/msp (damning with faint praise)
Structural issues: PE ownership → cost-cutting, poor support, billing games, technical debt. They can't rebuild AI-native. We can.
Global MSP TAM
Tickets automatable (proven by Serval)
Serval valuation (enterprise only)
| Segment | Status |
|---|---|
| Enterprise ($100K+ ACV) | Serval owns this |
| SMB Direct (50-500 employees) | Blue ocean |
| Small MSPs (5-20 techs) | Good entry wedge |
GPT-4+ enables reliable action execution
ConnectWise/Kaseya losing share, MSPs looking
Code-gen enables custom workflows
Serval proved it with enterprise. We do it for the rest of the market.
90%+ automatable
Okta, Google Groups, SCIM
Standard apps, self-service
Day 1 automation
| Us | Them |
|---|---|
| AI-native architecture | Bolt-on AI to legacy |
| Code-based workflows (auditable) | Black-box AI |
| Outcome pricing (per ticket) | Seat-based (pay even if unused) |
| Deploy in days | 6-month implementations |
Distribution: r/msp, IT Nation, MSP peer groups
Start free, pay when we hit 30% automation.
Every SMB outsources IT to MSPs. MSPs are dissatisfied with ConnectWise/Kaseya. 50%+ of tickets are automatable.
Nobody is selling "your IT just runs" directly to SMBs as an outcome.
Serval raised $127M doing this for enterprise. The SMB market is unowned.
"The IT team that scales with you, without headcount."
We recover millions in denied claims. AI agents that fight back against payer denials — so hospitals can focus on patients, not paperwork.
One insurer allegedly denied 300,000 claims in under two months using AI. Providers need their own AI to fight back.
— Healthcare Industry Report, 2024
Hospitals are drowning in denials. Payers deploy AI to reject claims faster. Providers still fight with spreadsheets.
Payers are deploying AI to deny claims faster. Medicare Advantage denials up 4.8% YoY. Providers need AI to fight back — or they lose.
Massive existing spend, ripe for automation
89% saw PA requirements increase in 2024
Massive labor shortage, no backfill coming
Finally capable of clinical document understanding
AI accuracy approaching human coders
Regulatory tailwind forcing digitization
The Insight
Outcome-based pricing is native to healthcare — providers already pay % of collections. We align incentives: we only win when they recover money.
Legacy vendors are slow. AI-native players are enterprise-only. The mid-market is wide open.
| Company | Focus | Strengths | Gap |
|---|---|---|---|
| Anterior ($64M+) | Prior auth (payer-side) | 99.24% accuracy, KLAS validated | Payer-only, not provider-side |
| AKASA | Provider RCM (enterprise) | Cleveland Clinic, Stanford | 12+ month sales cycles, expensive |
| Fathom | Medical coding only | 95.5/100 KLAS score | Narrow focus, no denial mgmt |
| Waystar / R1 | Legacy platforms | Scale, integrations | "AI" is mostly marketing, slow |
| OpenHolly | Denial recovery | Outcome-based, fast deploy | — |
Start with the most measurable outcome: dollars recovered from denials.
15-20% of recovered revenue. We only win when you recover money.
AI agents that read charts, identify denial root causes, generate appeals, and submit — automatically.
AI identifies why denial happened — missing documentation, coding error, or arbitrary payer rule.
Highlights relevant clinical passages from 50K-word records in seconds.
Generates payer-specific appeal letters with guideline citations. Human reviews in 5 minutes.
Orthopedics, cardiology, oncology (high-value procedures)
Low risk entry — AI is checking, not deciding
Then coding QA, prior auth automation
15-20%
of recovered revenue from denials
You only pay when we recover money. Zero risk.
3-5 specialty practices with 500+ denials/month
We'll recover $100K+ in year one — or you pay nothing.
340K accountants left the profession. 75% of CPAs are retiring. The close still takes 6+ days. We're building the autopilot.
"Accounting is structured, high-stakes, and essential to every business on earth. It's also one of the most underbuilt areas in technology."
— Basis founders (valued at $1.15B)
A profession in crisis meets primitive tooling. Something has to give.
50% cite it as key reason close is slow
Only 18% close in 3 days or less
#1 bottleneck — 3-5 systems just to match
"Bench raised ~$160M. Shut down Dec 2024. Human-heavy bookkeeping models can't scale profitably."
— Industry lesson
🔥 "Bench Refugees" = Urgent Demand
Thousands of abandoned customers actively looking for alternatives. Distrust human-heavy models. Ready for AI-first.
Basis raised $100M Series B at $1.15B valuation in Feb 2026. The market is real.
30% of Top 25 US firms already using
20% of Top 150 firms
First AI agent to complete end-to-end 1065 tax return autonomously
200+ customers, Series B
AI-native ERP — not AI bolted onto legacy
"Go live in weeks" vs months
Top 50 accounting firm partners
100+ customers
"Absorbs 47% of month-end close tasks"
The TAM
$50-80B market for accounting automation
Every business needs accounting. It's recurring. It's essential. And it's still done by hand.
AI that runs overnight, completes 90% of month-end tasks, and generates an exception report for morning review.
Auto-matching across all sources (95% accuracy)
90%+ accuracy with LLMs, learns from corrections
Auto-generated with supporting documentation
AI explains anomalies, humans review
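A minimal sketch of the overnight pass: auto-match bank lines to ledger entries on (reference, amount), and queue everything else as exceptions for the morning review. All references and amounts are invented.

```python
# Hypothetical bank feed and ledger for one night's reconciliation run.
bank = [{"ref": "INV-101", "amount": 1200.0},
        {"ref": "INV-102", "amount": 450.0},
        {"ref": "FEE-9",   "amount": 35.0}]
ledger = [{"ref": "INV-101", "amount": 1200.0},
          {"ref": "INV-102", "amount": 455.0}]  # $5 off -> exception

def reconcile(bank: list[dict], ledger: list[dict]):
    """Exact-match on (ref, amount); everything else goes to the
    morning exception report for human review."""
    keyed = {(e["ref"], e["amount"]) for e in ledger}
    matched = [b for b in bank if (b["ref"], b["amount"]) in keyed]
    exceptions = [b for b in bank if (b["ref"], b["amount"]) not in keyed]
    return matched, exceptions

matched, exceptions = reconcile(bank, ledger)
assert [m["ref"] for m in matched] == ["INV-101"]
assert [e["ref"] for e in exceptions] == ["INV-102", "FEE-9"]
```

Humans never see the matched 90%; they only see the exception report, which is the inversion the close-time table below depends on.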
| Metric | Before | After |
|---|---|---|
| Close Time | 6+ days | <3 days |
| Cash Rec Hours | 20-50 hrs | <5 hrs |
| Manual Tasks | 80% | 10% |
| Error Rate | 1-5% | <0.5% |
Everything is auditable. Everything is documented. AI generates, humans verify.
QuickBooks architecture is rigid
Rillet raised $50M+ because "rebuilding from scratch" is the only way
Services business, not software. Internal innovation killed by billable hour model.
Urgent need, distrust of human-heavy, ready for AI-first
$10-100M revenue, 2-5 person finance teams drowning in close
Fast adoption, price sensitive, willing to try
Per-entity: most common for accounting software
Volume tiers: common for AP/AR automation
Outcome-based: emerging (% of cost savings)
$2K – $15K/mo
Based on entity count + transaction volume
Outgrowing QuickBooks, avoiding NetSuite
Drowning in close process
SaaS, e-commerce, hospitality
The talent crisis is permanent. The tooling is primitive. The AI is ready.
We're building the autopilot for accounting.
We sell the adjustment, not the software. 400K workers retiring. $50-80B market. AI-native TPAs are the future.
Services: The New Software. The biggest opportunity isn't selling tools to adjusters — it's replacing what adjusters do.
— Sequoia Capital, March 2026
Structural workforce crisis meets broken incumbent tech.
"Half a billion between software, personnel, and opportunity cost" for Guidewire implementations that still fail.
— Industry Analysis
The Reality
Within 15 years, a large portion of today's adjusters will have retired — and there won't be enough people to replace them.
Smart money is flooding in. Exits are happening. The window is open.
December 2024. AI claims automation is now a proven exit category.
| Company | Model | Funding | Traction |
|---|---|---|---|
| Strala | AI-native TPA | Founders Fund (13x oversubscribed) | 26 US clients, UK expansion |
| Pace | Operations automation | Sequoia $10M Series A | Prudential multi-year deal |
| Elysian | Complex commercial claims | AmFam Ventures $6M | State Farm pitch winner, Lloyd's Lab |
| Tractable | Photo AI (point solution) | $1B+ unicorn | Auto insurers, property |
The Gap
Tractable sells photo AI. Shift sells fraud detection. Nobody sells the full claim outcome — FNOL to settlement, end-to-end.
Start where decisions are fast and budgets already exist.
Budget line exists. Vendor swap, not new category.
Not 12-24 month enterprise sales cycles
Per-claim pricing below legacy TPA rates
Use TPA wins to land large carriers
Start with FNOL/triage → Hybrid deployments → Full TPA as trust builds
"The answer can't always be more people." — Strala
Outcome-based pricing. Clear ROI story.
Strala claims a 1-point loss-ratio improvement. That's the number carriers care about most.
— Industry benchmark
The workforce is retiring. We're what comes next.
75% of CPAs are at or near retirement age. 340,000 accountants have left the profession since 2020. But tax work is 80-90% pure intelligence work, exactly the work AI agents do best. We're building the autopilot for tax preparation.
"Endless hours, stressed teams, client overload, constant risk of missing deadlines." 42% of firms report retention issues from burnout. The people who do stay work 60-80 hour weeks for months.
"Difficulties with state returns came up repeatedly in 'dislike' responses. Multi-state complexity multiplies fast, and manual tracking of different state rules becomes impossible at scale."
Incumbents charge $60K+/year for seat licenses. Blue J achieved 12x revenue growth via CPA.com distribution. But nobody has cracked multi-state complexity—nexus determination, varying apportionment rules, threshold tracking.
The Killer Gap: Research tools sell per-seat. Preparation is still manual. Nobody sells completed returns. The outcome-based pricing model is wide open.
Multi-agent system: reads documents, applies firm's tax strategy, enters data into systems. What takes 4 hours becomes 15 minutes of review. Every citation verifiable. Human signs, AI does the work.
Why Now: GPT-4 enabled Blue J's 12x growth. Filed claims a 30-50% reduction in review cycles. Avalara is building "agentic tax" for transaction compliance. The capability inflection is here.
Firms with 6-50 preparers: fast decisions, acute talent pain, no ability to build in-house. Outcome-based pricing: firms pay for completed returns, not software seats. CPA society partnerships for distribution.
Expansion Path: Mid-market firms → State CPA society endorsements → Enterprise (Top 100) → Big Four white-label. Basis already has 30% of Top 25 with enterprise-first approach.
Blue J sells research tools to accountants. We sell completed tax returns to firms. Outcome-based pricing aligned with Sequoia's "sell the work" thesis. The demographic crisis is now—we're the solution.
One-Liner: "AI agents that prepare tax returns from scratch—firms pay per return, not per seat. We automate the 80% of tax work that's pure intelligence, so the retiring 75% of CPAs don't take the industry with them."
The autonomous legal team for scaling companies — starting with NDAs.
One AI agent that handles contracts from intake to signed — Slack-native, no lawyers needed for routine work.
Own the autonomous legal stack for scaling companies