R&D Engineer • Dependabot for AI
01

Dependabot for AI Models

Every week a new model drops. Your team manually benchmarks it. What if that happened automatically — with PRs when something's better?

52+

New models released per month

2-4 wks

Time to benchmark each one

$0

Tools that auto-PR improvements

02

The Problem

  • A new model drops every week from OpenAI, Anthropic, Google, Meta, Mistral, and open-source labs
  • Teams manually benchmark each one against their stack; it takes days per model
  • By the time you finish testing, three more have dropped
  • No one knows whether they're running the best model for their use case
03

Competitive Landscape

Company | What They Do | Gap
Portkey | AI gateway, routing, 1600+ LLMs | No auto-benchmarking against YOUR stack
Unify ($8M) | Finds best LLM for the job | Router-first, not benchmark-first
Braintrust ($36M, $150M val) | Eval-driven development | Reactive, not proactive
Us | Watch → Auto-benchmark → PR when better
04

How It Works

1. Connect

Connect your AI stack. Define your eval suite (or we help you build one).

2. Watch

We monitor every model release across all providers. Automatically.

3. PR

When something beats your current setup, you get a PR with benchmarks.

The Dependabot Pattern

Watch → Auto-benchmark → PR when better

Nobody does this for AI models. We do.
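
The watch → auto-benchmark → PR pattern reduces to a short loop. A minimal sketch, assuming a hypothetical eval-suite runner and a small noise margin; all names, scores, and the 2-point margin are illustrative, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    score: float

def run_eval_suite(model: str) -> float:
    # Stand-in: score a model against the customer's own eval suite (0-1).
    scores = {"current-model": 0.82, "new-model-v2": 0.87}
    return scores.get(model, 0.0)

def maybe_open_pr(current: Candidate, release: str, margin: float = 0.02) -> bool:
    """Benchmark a new release; propose a PR only if it beats the
    incumbent by more than `margin`, to avoid noise-driven churn."""
    challenger = Candidate(release, run_eval_suite(release))
    if challenger.score > current.score + margin:
        print(f"PR: switch {current.model} -> {challenger.model} "
              f"({current.score:.2f} -> {challenger.score:.2f})")
        return True
    return False

current = Candidate("current-model", run_eval_suite("current-model"))
maybe_open_pr(current, "new-model-v2")
```

The margin is the key design choice: without it, every release that wins by a rounding error generates a PR.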

05

ICP & Pricing

🎯 Target Customer

  • Any team with AI in production
  • 3+ AI features deployed
  • Series B+ (or well-funded Series A)
  • Engineering-led decision making

💰 Pricing

  • Starter $2K/mo — up to 5 endpoints
  • Growth $8K/mo — up to 20 endpoints
  • Enterprise $20K+/mo — unlimited
06

Why Now?

  • Model release velocity is accelerating — impossible to keep up manually
  • LMArena proved model evaluation is a $1.7B market (raised $150M)
  • Braintrust proved enterprises pay for evals ($36M Series A)
  • Nobody has combined continuous monitoring + proactive optimization
R&D Engineer • CI/CD for AI
01

CI/CD for AI

Software engineering solved "did my change break things?" 20 years ago. AI engineering still ships blind.

🔴 AI Today

Push prompt change → Hope it works → Find out in production

🟢 With Us

Push prompt change → Eval runs → PR blocked if quality drops

02

The Insight

The gap isn't that teams lack evals: Braintrust, Humanloop, and DSPy already provide those.

The Real Gap

Evals aren't integrated as blocking gates in deployment pipelines the way unit tests are.

03

What We Build

GitHub Action + CI Integration

  • Automatically runs your eval suite against every PR that touches AI code
  • Prompts, model configs, RAG pipelines — all covered
  • If eval score drops → PR is blocked
  • If new model improves score → PR is auto-generated

Think: Braintrust's eval engine + Dependabot's automation + GitHub Actions' CI/CD — fused into one opinionated product.
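
A minimal sketch of the blocking gate itself, assuming eval results and the baseline are each written as a JSON file with a top-level `score` field (the file layout is an assumption, not the product's real format). A nonzero exit code is what makes a CI step fail and the PR block:

```python
import json
import sys

def gate(results_path: str, baseline_path: str, tolerance: float = 0.0) -> int:
    """Return a process exit code: 0 = pass, 1 = block the PR."""
    with open(results_path) as f:
        score = json.load(f)["score"]
    with open(baseline_path) as f:
        baseline = json.load(f)["score"]
    if score + tolerance < baseline:
        print(f"BLOCKED: eval score {score:.3f} dropped below baseline {baseline:.3f}")
        return 1
    print(f"PASS: eval score {score:.3f} (baseline {baseline:.3f})")
    return 0

if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(gate(sys.argv[1], sys.argv[2]))
```

Wired into a CI step (e.g. `python gate.py results.json baseline.json`), this behaves exactly like a failing unit test: the pipeline stops, the merge is blocked.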

04

Aemon vs Us

Dimension | Aemon | Us
Purpose | Discover new optimal solutions | Protect existing quality + incrementally improve
Posture | Offensive R&D | Defensive Ops
Buyer | R&D Lead / ML Researcher | Engineering Manager / Platform Team
Integration | Standalone tool | Lives in your CI/CD
05

ICP & Pricing

🎯 Target Customer

  • 3+ AI features in production
  • Series B+ companies
  • Engineering-led sale
  • Already using GitHub/GitLab CI

💰 Pricing

$2K – $20K/mo

Based on eval runs & endpoints

R&D Engineer • Private LMArena
01

Private LMArena

LMArena raised $150M at $1.7B valuation on public evals. Enterprises need private evals on their own data.

$1.7B

LMArena valuation (public evals)

???

Private enterprise eval market

02

The Problem with Public Benchmarks

  • Companies have been caught gaming LMArena scores
  • Public benchmarks don't reflect YOUR use cases
  • Generic evals ≠ production performance for YOUR data
  • Enterprises need proprietary intelligence
03

What We Build

Enterprise Model Intelligence Platform

  • Define eval suites from your production data
  • Continuously benchmark every new model release
  • Test every prompt variation, RAG config automatically
  • Output: Private leaderboard + recommended actions

Hugging Face's YourBench is the open-source precursor, but it's a DIY tool that demands significant ML expertise. We productize it.
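
The core of a private leaderboard is simple: score every candidate model on your own eval cases, then rank. A hedged sketch with invented model names and scores:

```python
def private_leaderboard(results: dict[str, list[float]]) -> list[tuple[str, float]]:
    """results maps model name -> per-case scores from YOUR eval suite.
    Returns (model, mean score) pairs ranked best-first."""
    means = {model: sum(s) / len(s) for model, s in results.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

board = private_leaderboard({
    "model-a": [0.90, 0.80, 0.85],
    "model-b": [0.70, 0.95, 0.88],
    "model-c": [0.60, 0.65, 0.70],
})
for rank, (model, score) in enumerate(board, 1):
    print(f"{rank}. {model}: {score:.3f}")
```

The product work is everything around this loop: building the eval cases from production data, re-running on every release, and turning rank changes into recommended actions.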

04

Aemon vs Us

Aemon | Private LMArena
Evolves novel algorithms | Evaluates existing models/configs
Research | Intelligence
05

ICP & Pricing

🎯 Target Customer

  • 10+ AI features in production
  • $50K+/mo on AI infrastructure
  • VP of Engineering or Head of AI
  • Fintech, ad-tech, e-commerce, healthtech

💰 Pricing

$10K – $100K/mo

Enterprise contracts

R&D Engineer • AI Model FinOps
01

AI Model FinOps

Companies spend $85K+/mo on AI infrastructure. Nobody knows if they're overpaying for quality they don't need.

$85K

Avg monthly AI spend

36%

YoY growth

0

Visibility into cost-quality tradeoff

02

The Gap

Tool | What It Does | Missing
Portkey | Routing, fallbacks | No cost-quality optimization
Unify | Cheapest model that meets threshold | Not continuous, not production data
Us | Continuously optimize the cost-quality frontier across the entire AI stack
03

What We Build

FinOps + Quality Optimization Layer

An agent that sits on top of your AI gateway:

  • Continuously profiles every AI call (model, cost, latency, quality)
  • Uses your production data as the eval
  • Generates actionable recommendations:
"Switch endpoint X from GPT-4o to Claude 3.5 Sonnet — saves $8K/mo, quality improves 2%"

"Your RAG pipeline on endpoint Y is underperforming — here's an optimized config"
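
At its core the recommendation step is a cost-quality comparison over continuously profiled calls: suggest a switch only when another model is at least as good and strictly cheaper. A sketch under assumed profiling data (the costs and quality scores below are invented for illustration):

```python
def recommend(current: str, profiles: dict[str, tuple[float, float]]):
    """profiles maps model -> (cost_per_1k_calls_usd, quality_score).
    Returns the best no-worse, strictly-cheaper alternative and the
    savings, or None if the current model already sits on the frontier."""
    cur_cost, cur_q = profiles[current]
    candidates = [
        (cost, -q, m) for m, (cost, q) in profiles.items()
        if m != current and q >= cur_q and cost < cur_cost
    ]
    if not candidates:
        return None
    cost, _neg_q, model = min(candidates)  # cheapest first, quality as tiebreak
    return model, cur_cost - cost          # (model, savings per 1k calls)

profiles = {
    "gpt-4o": (30.0, 0.90),
    "claude-3-5-sonnet": (18.0, 0.92),
    "small-model": (2.0, 0.75),
}
print(recommend("gpt-4o", profiles))
```

The hard part the product solves is filling `profiles` honestly: quality scores must come from evals on the customer's production data, not public benchmarks.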
04

ICP & Pricing

🎯 Target Customer

  • $20K+/mo on LLM APIs
  • CFO / VP Eng sale
  • Any industry with AI in production

💰 Pricing

$2K – $15K/mo

Pays for itself from savings

⚡ Easiest ROI story of all these ideas

R&D Engineer • Eval-as-a-Service
01

Eval-as-a-Service

Building good evals is harder than building the AI features themselves. We build the oracle.

02

The Insight

The Bottleneck Isn't Optimization

Braintrust's thesis: "If your eval is right, every decision becomes simple."

DSPy's framework depends on having good metrics to optimize against.

The bottleneck in the entire AI development loop is knowing what "good" looks like.

03

What We Build

Eval Generation Agent

  • Takes your production AI traces
  • Analyzes failure modes
  • Interviews domain experts (async, Slack-based)
  • Generates calibrated eval suites:

✓ Datasets

✓ Scoring rubrics

✓ Automated judges

Output plugs into Braintrust, DSPy, or your own CI/CD.
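
The deliverable's shape can be sketched as a data structure: cases, a rubric, and a judge. A real suite would use an LLM judge calibrated against expert labels; the keyword judge below is a deliberately trivial stand-in so the sketch stays self-contained, and all names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    input: str
    must_mention: list[str]  # rubric distilled from domain-expert interviews

@dataclass
class EvalSuite:
    name: str
    cases: list[EvalCase] = field(default_factory=list)

    def judge(self, case: EvalCase, output: str) -> float:
        """Fraction of rubric items the output satisfies."""
        hits = sum(k.lower() in output.lower() for k in case.must_mention)
        return hits / len(case.must_mention)

    def score(self, model_fn) -> float:
        """Mean judged score of model_fn across all cases."""
        return sum(self.judge(c, model_fn(c.input)) for c in self.cases) / len(self.cases)

suite = EvalSuite("refund-policy", [
    EvalCase("Can I return this?", ["30 days", "receipt"]),
    EvalCase("Do you price match?", ["within 14 days"]),
])
print(suite.score(lambda q: "Yes: within 30 days, with a receipt."))
```

Because the suite exposes a single `score(model_fn)` entry point, the same object can back a Braintrust experiment, a DSPy metric, or a CI gate.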

04

Aemon vs Us

Aemon | Eval-as-a-Service
Assumes you have a good eval function | Creates the eval function
Optimizer | Oracle
Depends on eval quality | Is the prerequisite to everything else

If you own the eval layer, you become the foundation every optimization tool depends on.

05

ICP & Pricing

🎯 Target Customer

  • Same as Braintrust's customers
  • AI product teams at Series B+
  • Earlier in journey — before they've figured out evals

💰 Pricing

$5K – $30K/mo

Per eval suite built + maintenance

01
LEAD PITCH: Fat Startup

AI-Powered Outcomes.
Not Tools. Not Reports.

We operate fleets of AI agents that deliver results. Customers get outcomes. We get playbooks. Playbooks become platform.

Fat Startup $4K MRR 5 Customers Playbooks Compounding

A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.

— Andrew Lee, a16z Speedrun Partner

02

The Shift

Spinning up AI agents is now trivial. Managing them is the new bottleneck.

What's Easy Now

🤖

One-click agent deployment

OpenClaw, dockerized instances, cloud GPUs

🔀

Capable models

GPT-5, Claude 4, open-source alternatives

💰

Economics work

$0.01-0.10 per task, not $50/hr

What's Still Hard

⚠️

People become pseudo-IT

Babysitting agents instead of running business

⚠️

Debugging eats time

Every hour on agent issues ≠ hour on actual work

⚠️

No one wants to manage agents

They want outcomes, not infrastructure

The Insight

Founders are too busy to become AI ops engineers. We absorb that complexity so they can focus on their actual business.

03

How We Got Here

We started in sales. Then customers kept asking for more.

📧
Started: Sales
SDR automation
🎬
Then: Video
ML training data gen
🔬
Then: Research
University lab assets
💡
Pattern
We manage, they get results

The Variety We've Delivered

🏗️

SDR for construction companies

Lead gen + qualification

🎬

Video generation for ML training

Synthetic data pipelines

🔬

Research assets for universities

Literature review + synthesis

🚀

BDR for startups

Outbound + meeting booking

The Common Thread

Every customer had the same problem:

"I tried spinning up agents myself. Then I spent all my time debugging them instead of running my business."

— Pattern across customers

They didn't want to manage AI. They wanted outcomes.

04

The Market Reality

95%
AI projects fail before production
MIT Project NANDA
70%
AI SDR users churn in 3 months
Industry data
$47K
Lost from one agent runaway
TechStartups
171%
ROI when deployment succeeds
MIT NANDA

Why Tools Aren't Enough

Companies don't want to become AI operations experts. They want someone to absorb the complexity and just deliver results.

05

The Model: Managed AI Operations

We operate agent fleets. Customers get outcomes. We encode playbooks.

🎯
Customer Goal
"50 qualified meetings/month"
Our Engineers
Configure agent fleet
🤖
Agent Fleet
Research, outreach, qualify
Outcome
Meetings on calendar

DIY / SaaS Tools

🛠️

You manage the agents

Become pseudo-IT for AI

🐢

Weeks to figure out

Setup, config, debugging

Hope it works

No guarantee of outcomes

OpenHolly (Us)

We manage the agents

You focus on your business

Results in days

We've done this before (playbooks)

🎯

Outcomes guaranteed

Pay for results, not effort

06

Current Focus: GTM/Sales

Starting with sales because the outcome is measurable: meetings booked.

Why Sales First

📊

Clear success metric

Meetings booked = revenue

💔

Broken market

70% AI SDR churn = customers looking for alternatives

💰

High willingness to pay

$5-10K/month for what works

We have traction

50% of our revenue is SDR/BDR

What We Deliver

🔍

Research Agent

Deep prospect intelligence

✍️

Outreach Agent

Personalized messaging

📋

Qualification Agent

Score and prioritize leads

📅

Scheduling Agent

Book the meeting

Expansion Path

Sales → Research/Intel → Operations → Content. Each vertical = new playbook, same infrastructure.

07

The Unlock: Playbooks Compound

Every engagement encodes a playbook. Playbooks make the next engagement faster. This is how we build the moat.

🛠️
Year 1: Agency
Do the work, learn playbooks
📚
Year 2: Productize
Playbooks become templates
🏗️
Year 3: Platform
Others build on our templates

What's In A Playbook

Every engagement becomes encoded knowledge:

📝

Workflow sequences

What steps work for each use case

🎯

Prompt templates

Messaging that actually converts

⚙️

Agent configurations

Which models, tools, and sequences

🚫

Failure patterns

What breaks and how to prevent it

The Compounding Effect

1️⃣

Customer 1: 2 weeks

Figure everything out from scratch

5️⃣

Customer 5: 3 days

Apply existing playbook + customize

🔟

Customer 10: Hours

Playbook is battle-tested

🏗️

Eventually: Self-serve

Playbooks become product

The Fat Startup Advantage

We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.

08

Technical Insight

We're productizing the research consensus on what actually works.

The Research Convergence

📄

Workflow-First Architecture

Declarative orchestration beats autonomous agents (Microsoft, 2024-25 surveys)

👤

HITL as Training Signal

Human edits train intervention policies (ReHAC, EMNLP 2024)

🎯

Playbooks as Optimization Surface

Prompts + tool-use are parameters to optimize (AVATAR, NeurIPS 2024)

🛡️

Guardrails are Required

Transparency + oversight for multi-agent systems (Nature, 2026)

Our Implementation

Declarative playbooks

Versioned configs, not imperative code

Logged human checkpoints

Every edit = structured training signal

Continuous optimization

Prompts, branching, model routing improve over time

Action-layer guardrails

Can't be prompt-injected, auditable

We log trajectories, human edits, and outcomes, then update prompts, branching logic, and model routing so the same business objective is achieved more reliably over time. The playbook is the learned policy space.

— Our technical thesis
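
A hedged sketch of that thesis: the playbook as a versioned, declarative config whose model routing is re-fit from logged outcomes. The field names, log format, and win-rate update rule are all illustrative assumptions, not our actual implementation:

```python
from collections import defaultdict

playbook = {
    "version": 3,
    "steps": ["research", "outreach", "qualify", "schedule"],
    "routing": {"research": "model-a", "outreach": "model-a"},
}

# Logged trajectories: (step, model used, did the step succeed?)
logs = [
    ("outreach", "model-a", False), ("outreach", "model-a", False),
    ("outreach", "model-b", True),  ("outreach", "model-b", True),
]

def update_routing(playbook: dict, logs: list) -> dict:
    """Return a new playbook version routing each step to the model
    with the best observed success rate; steps with no data keep
    their current route."""
    stats = defaultdict(lambda: [0, 0])  # (step, model) -> [wins, total]
    for step, model, ok in logs:
        stats[(step, model)][0] += ok
        stats[(step, model)][1] += 1
    new = dict(playbook)
    new["version"] = playbook["version"] + 1
    new["routing"] = dict(playbook["routing"])
    for step in playbook["routing"]:
        rates = {m: w / t for (s, m), (w, t) in stats.items() if s == step}
        if rates:
            new["routing"][step] = max(rates, key=rates.get)
    return new

updated = update_routing(playbook, logs)
print(updated["routing"], "v", updated["version"])
```

The same pattern extends to prompts and branching logic: each is a named, versioned parameter, and logged outcomes are the training signal that selects among versions.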

09

The Compound Library

The internal system that makes agent workflows repeatable and efficient.

🔧
Verified Tools
Tested integrations
+
💬
Working Prompts
By use case + vertical
+
🧠
Model Routing
Which model where
+
🚫
Failure Patterns
What breaks + fixes
📦
New Client Workflow
Compose from proven components

Without This System

🔄

Reinvent every time

Which tools? Which prompts? Which models?

🐢

Slow iteration

Learn the same lessons repeatedly

📉

Linear scaling

More clients = more eng hours

With The Compound Library

Compose from proven

Verified, tested, reusable primitives

📈

Each engagement adds

Learnings feed back into system

🚀

Sublinear scaling

More clients = richer library = faster

The Compounding Effect

Workflow #1 takes a week. Workflow #10 takes a day. Workflow #100 takes hours. The library IS the moat.
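
The compose-from-proven-components mechanic is simple to sketch: a registry of verified primitives, and client workflows assembled by name. Component names and configs below are invented for illustration:

```python
LIBRARY = {
    "prospect-research": {"model": "model-a", "prompt": "research-v4"},
    "personalized-outreach": {"model": "model-b", "prompt": "outreach-v7"},
    "meeting-scheduling": {"model": "model-a", "prompt": "schedule-v2"},
}

def compose(client: str, component_names: list[str]) -> dict:
    """Build a client workflow from proven components; unknown names
    fail fast instead of being silently reinvented per client."""
    missing = [n for n in component_names if n not in LIBRARY]
    if missing:
        raise KeyError(f"not in library yet: {missing}")
    return {
        "client": client,
        "steps": [{"name": n, **LIBRARY[n]} for n in component_names],
    }

wf = compose("acme", ["prospect-research", "personalized-outreach"])
print([s["name"] for s in wf["steps"]])
```

The fail-fast check is the point of the design: a missing component is a signal to build and verify a new primitive, which then serves every future client.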

10

Why Us

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Unfair Advantages

🐕

We're running on OpenClaw

Dog-fooding our own infrastructure daily

📊

We've built observability

ClawView for agent monitoring

🛡️

We've built guardrails

Agent Seatbelt for safety

💵

Revenue already

$4K MRR, +$2K this week

11

Traction

$4K
MRR
5
Customers
50%
SDR/BDR
+$2K
Added This Week

What This Proves

Companies will pay for AI-powered outcomes when someone else manages the complexity. The demand is real. The model works.

12

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent fleet + engineering team

🎯

12-month goal: $1M ARR

Prove the playbooks at scale

📚

Then: Productize

Turn proven playbooks into self-serve templates

Why Now

🚀

OpenClaw + GPT-5 + Claude 4

Agents just became capable enough

💔

AI SDR market burned

70% churn = customers looking for what works

First-mover on playbooks

Every month we operate = more encoded knowledge

OpenHolly: AI-Powered Outcomes

Customers get results. We get playbooks. Playbooks become platform.

01
V1: Personal AI OS

Your Personal AI OS

An AI that knows your context, anticipates your needs, and takes action on your behalf—not a chatbot you have to prompt.

Pre-Seed $4K MRR 5 Customers Always-On AI

The Vision

Imagine an AI that actually knows you—your work, your preferences, your patterns. It doesn't wait for commands. It proactively handles tasks, flags important things, and learns from every interaction.

02

The $56B Opportunity

Personal AI assistants are about to explode.

$16B
AI Assistant Market 2024
Grand View Research
$56B
Projected by 2034
Market.us (38% CAGR)
75%
Households with AI by 2025
Gartner Forecast
72%
US Teens Use AI Companions
2025 Study

AI personal agents will arrive soon. What we do now with apps—manually, and in piecemeal fashion—will be done automatically. If a flight is cancelled, an AI agent will rebook the flight, reschedule meetings, and order food.

— Goldman Sachs, "What to Expect from AI in 2026"

03

Why Current Assistants Fail

Siri, Alexa, and Google Assistant lost the AI race. Here's why.

❌ The Problem

🧠

No Persistent Memory

Context resets after 2-3 turns. They forget everything.

⏸️

Reactive, Not Proactive

Wait for commands. Never anticipate needs.

🔒

Siloed Knowledge

Can't connect your email, calendar, work, and life.

🤖

Limited Actions

"I can't do that" is their signature phrase.

✓ Personal AI OS

🧠

128K+ Token Context

Remembers weeks of interactions. Learns your patterns.

Proactive Intelligence

Anticipates what you need before you ask.

🔗

Connected Context

Sees your whole digital life—with your permission.

🛠️

Real Actions

Browser, shell, files, messages—actual work gets done.

Microsoft's CEO called AI assistants "dumb as a rock." The truth is, they've stagnated while chatbots evolved.

— Industry Analysis, 2023-2024

04

The Hardware Graveyard

Why dedicated AI devices keep failing—and what we learned.

$699
Humane AI Pin
Flopped 2024 — WIRED "Biggest Flop"
$199
Rabbit R1
"Underwhelming, underpowered" — The Verge
$350M
Rewind/Limitless
Acquired by Meta Dec 2025
$2.7B
Character.AI
Google licensing deal 2024

The Lesson

Hardware failed because it created friction instead of removing it. The winning approach: software that works with your existing devices—phone, laptop, wearables—not another gadget to carry.

Both Rabbit R1 and Humane AI Pin missed a crucial opportunity: integrating with existing user bases. Why create a separate device when you could leverage smartphones and their vast ecosystem?

— Medium Analysis, July 2024

05

Proactive vs. Reactive

The fundamental shift in how AI should work for you.

⏸️
Reactive AI
You ask → It responds
Proactive AI
It notices → It acts
🧠
Anticipatory AI
It predicts → You approve

Reactive (Siri/ChatGPT)

"Hey Siri, add milk to my shopping list"

"ChatGPT, summarize this document"

You initiate every interaction. You remember to ask.

Proactive (Personal AI OS)

"You're almost out of milk. Added to cart—confirm?"

"Your flight changed. I rebooked + rescheduled 2 meetings."

AI monitors context. Surfaces what matters. Acts with permission.

Gartner predicts 40% of enterprise apps will embed task-specific AI agents by 2026, evolving assistants into proactive workflow partners.

— Forbes, "Agentic AI Takes Over," Dec 2025

06

Why Now?

Four converging forces make this the moment.

Technology Ready

🧠

GPT-5 / Claude 4

Models finally capable of real reasoning

📝

128K+ Context Windows

Memory across weeks of interaction

🔧

MCP + Tool Use

Agents can control apps natively

💰

Economics Work

$0.01-0.10 per task, not $50/hr

Market Ready

📈

96% Enterprise Expansion

Plan to increase agentic AI budgets

PwC May 2025 Survey
🎯

25% → 50% Adoption

Enterprise GenAI agents 2025 → 2027

Deloitte Forecast
😤

Siri Fatigue

95% frustrated with current assistants

The Manifest Survey
🔐

Privacy Tailwinds

Apple Intelligence proves local AI demand

07

What Users Actually Want

From surveys, Reddit, and academic research.

Desires

🧠

Memory That Persists

"Remember what I told you last week"

Proactive Help

"Remind me before I forget"

🎯

Deep Personalization

"Know my preferences without asking"

🔐

Privacy Control

"My data stays mine"

Evidence

93% of respondents predict agentic AI will enable more personalized, proactive, and predictive services.

— Cisco 2025 AI Study

An assistant that knows you. The future of personal assistants is when the helper learns from your data, documents, and writing style.

— AI Industry Forecast 2026

08

How It Works

Always-on AI that learns, anticipates, and acts.

👁️
Observes Context
Email, calendar, browsing, work
🧠
Learns Patterns
Preferences, routines, priorities
💡
Surfaces Insights
Proactive suggestions
Takes Action
With human approval
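
The four-step loop above, with its human approval gate, can be sketched as follows. The event names and rules are illustrative stand-ins for learned patterns, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    trigger: str
    action: str

def surface(events: list[str]) -> list[Proposal]:
    """Turn observed context into proactive proposals. Toy rules stand
    in for patterns the system would learn from your behavior."""
    rules = {
        "flight_cancelled": "rebook flight + reschedule meetings",
        "milk_low": "add milk to cart",
    }
    return [Proposal(e, rules[e]) for e in events if e in rules]

def act(proposals: list[Proposal], approve) -> list[str]:
    """Nothing executes without explicit human approval."""
    return [p.action for p in proposals if approve(p)]

done = act(surface(["flight_cancelled", "milk_low", "unknown_event"]),
           approve=lambda p: p.trigger == "flight_cancelled")
print(done)
```

The separation between `surface` and `act` is the design point: the AI may monitor and propose freely, but the approval callback is the hard boundary before anything touches the real world.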

Current Focus: SDR/BDR

🔍

Research Agent

Deep prospect intelligence

✍️

Outreach Agent

Personalized messaging

📅

Scheduling Agent

Meeting coordination

Platform Vision

📧

Email Intelligence

Triage, draft, follow-up

📊

Research & Analysis

Deep work on autopilot

🔧

Ops & Admin

The tasks you hate, automated

09

The Unique Wedge

What makes this different from Siri/Alexa/Google Assistant?

Big Tech Assistants

🏢

Built for mass market

Generic. Lowest common denominator.

📊

Data goes to them

Your context trains their models.

🔒

Walled garden

Only works in their ecosystem.

⏸️

Stagnant development

Lost the AI race years ago.

Personal AI OS

🎯

Built for power users

Deep personalization for serious work.

🔐

Your data stays yours

Local-first. You control what's shared.

🔓

Cross-platform

Works with your existing tools.

🚀

Cutting-edge models

GPT-5, Claude 4, always the best.

The Positioning

We're not competing with Siri for "set a timer." We're building the second brain for knowledge workers—people who will pay for AI that actually makes them more effective.

10

Traction & Team

$4K
MRR
5
Customers
50%
SDR/BDR
+$2K
This Week

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

11

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent infrastructure + team

🎯

12-month goal: $1M ARR

Prove the Personal AI OS at scale

📚

Then: Consumer launch

Personal AI for everyone

Why This Team

🐕

We use it daily

Dogfooding OpenClaw constantly

📊

Built observability

ClawView for agent monitoring

🛡️

Built safety

Agent Seatbelt for guardrails

💵

Already have revenue

Proving demand before pitching

OpenHolly: Your Personal AI OS

An AI that knows you, anticipates your needs, and takes action—not just another chatbot waiting for prompts.

The Thesis in One Line

The shift from reactive AI to proactive AI is a $56B market. We're building the operating system for it.

01
V2: Outcome-Based Pricing

Pay Per Meeting,
Not Per Seat

The SaaS pricing model is breaking. AI does the work now—so why pay for human logins? We deliver outcomes and charge when they happen.

$4K MRR 0% Churn Outcome-Aligned 5 Customers

AI is driving a shift toward outcome-based pricing. Per-seat is no longer the atomic unit of software. If AI can handle a sizable proportion of customer support, companies will need far fewer human agents, and therefore fewer software seats.

— a16z Enterprise Newsletter, December 2024

02

The Pricing Revolution

SaaS pricing is undergoing its biggest shift since the cloud. AI is killing the per-seat model.

61%
SaaS using usage-based pricing (2022)
OpenView
30%
Enterprise SaaS with outcome-based by 2025
Gartner
43%
Enterprise buyers prefer outcome/risk-share pricing
Industry Data
2-3x
Higher traction for outcome-priced AI products
BetterCloud 2025

Seat-based pricing may not fit when AI is doing the work. If an agent replaces a human task, customers will expect to pay based on outcomes, not log-ons.

— Bain Technology Report 2025

03

Why Seats Are Dying

The logic of per-seat pricing breaks when AI replaces the humans who need seats.

The Broken Math

📉

AI replaces 10 analysts with 1 agent

Per-seat pricing undervalues the automation

💸

$5-10K/month regardless of results

70% churn when outcomes don't follow

Soft ROI = death at renewal

2025 pilots are hitting 2026 renewals: "are we really getting value?"

The New Model

🎯

Pay for work completed

Not for access to tools

📊

ROI in their sleep

Customers calculate value instantly: $X per meeting = clear math

🤝

Aligned incentives

We only win when you win

The Bessemer Thesis

AI-native companies are abandoning seat-based SaaS pricing in favor of usage-, output-, and outcome-based models that directly align revenue with measurable results.

— Bessemer Venture Partners, "The AI Pricing and Monetization Playbook" (Feb 2026)

04

Who's Already Winning

The market leaders are proving outcome-based AI pricing works at scale.

Intercom Fin

Customer Support AI

$0.99 per resolution

65% resolution rate. Aligns every team around one outcome: resolved tickets. Now deployed on 99% of conversations.

Zendesk AI Agents

Customer Support AI

Outcome-based pricing

"First in CX industry to offer outcome-based pricing for AI agents" — August 2024 announcement.

EvenUp

Legal AI

Per demand package

AI + legal experts generate personal injury demand letters. Per output pricing, not hourly.

Decagon

Enterprise AI Support

Per-conversation + per-resolution

Hybrid model. Usage (conversations) + outcome (resolutions). Featured in a16z podcast.

Leena AI

Employee Support AI

ROI-based (tickets closed)

Shifted from consumption → outcomes. Customers gained clearer ROI, business accelerated.

Scale AI

Data Labeling → Platform

$13.8B valuation

Started as labeling services. Became infrastructure. Services → outcomes → platform.

The Pattern

Every major AI-native company is moving toward outcome-based pricing. This isn't experimentation—it's convergence.

05

Why Enterprises Love It

43% of enterprise buyers consider outcome-based pricing a significant factor in purchase decisions.

Buyer Psychology

🧮

Instant ROI Calculation

"$X per meeting booked" = CFO-ready math. No spreadsheet gymnastics.

🛡️

Zero Implementation Risk

If it doesn't work, you don't pay. Risk transferred to vendor.

📈

Scales With Value

More meetings = more spend = more value captured. Natural expansion.

🔄

No Renewal Anxiety

You're paying for results. Why churn from something that works?

What Buyers Say

"Why should we pay $X per user if we could pay $Y per outcome? Aligning price with realized value improves the ROI calculus."

— Enterprise buyer sentiment (Industry research)

"The fundamental shift is to stop charging for access and start charging for work done."

— Bain Technology Report 2025

Deloitte 2026 Prediction

"Outcome- or value-based pricing is based on the real business results that SaaS applications with AI agents produce. There will be a gradual move toward a future powered by integrated, autonomous multi-agent systems."

06

Our Model: Pay Per Meeting

We operate AI agent fleets that book qualified sales meetings. You pay only when meetings happen.

🎯
Define Outcome
"50 qualified meetings/month"
🤖
Agent Fleet Works
Research, outreach, qualify, book
📅
Meeting Booked
Verified on calendar
💰
You Pay
Only for outcomes

❌ Traditional AI SDR

$5-10K
/month regardless of results
70%
Churn in 3 months
???
ROI unclear, hard to justify

✓ OpenHolly Outcome Model

$250-500
Per qualified meeting booked
0%
Risk if agents don't perform
ROI: only pay when it works
07

Unit Economics That Work

Outcome-based pricing isn't charity—it's better economics for everyone.

Our Economics

💵

$250-500 per meeting

Customer pays on outcome

🤖

$30-80 cost to deliver

AI compute + tooling + human oversight

📈

3-7x margin

Healthy unit economics, scales with volume

🔄

Playbooks compound

Each meeting → better templates → lower cost
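
The slide's figures can be checked as arithmetic: revenue per meeting divided by cost to deliver it. This is pure arithmetic on the numbers above; how the price and cost extremes pair up is my reading, not a claim from the deck:

```python
def margin_multiple(price: float, cost: float) -> float:
    """Revenue per meeting divided by cost to deliver it."""
    return price / cost

worst = margin_multiple(250, 80)  # cheapest meeting, costliest delivery
best = margin_multiple(500, 30)   # priciest meeting, cheapest delivery
print(f"margin multiple range: {worst:.1f}x to {best:.1f}x")
# The deck's headline 3-7x sits at the conservative end of this range.
```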

Customer Economics

Meeting = $5K-50K deal potential

$250-500 per meeting is a no-brainer

Zero upfront commitment

Start small, scale with proof

Budget predictability

Cost tracks linearly with value

Easy internal approval

CFO loves outcome-based spend

The Intercom Lesson

"Intercom's $0.99 per resolution aligns every team around one outcome: resolved tickets. If Fin resolves a ticket in three messages or thirty, the customer pays the same. The risk is real—but the reward is equally real: customers know exactly what they're getting, and they can calculate ROI in their sleep."

— Bessemer, Feb 2026

08

Managing the Risks

Outcome-based pricing has real risks. Here's how we mitigate them.

The Risks

⚠️

Cost variability

Some meetings cost more than others

⚠️

Revenue unpredictability

Customer usage varies month to month

⚠️

Attribution disputes

"Did your AI really book this?"

⚠️

Abuse potential

Customers gaming the system

Our Mitigations

Minimum commitments

Base retainer + outcome fees = floor

Playbook compounding

Cost per outcome drops with scale

Clear outcome definitions

Contractually defined: what counts

Full audit trail

Every action logged, no disputes

Industry Standard Emerging

"Agreements around basic definitions for things like 'an agent,' 'a task,' 'a process,' 'an interaction,' and 'an outcome' should be clearly defined, communicated, and agreed upon contractually." — Deloitte TMT Predictions 2026

09

Traction

$4K
MRR
5
Customers
0%
Churn
+$2K
Added This Week

Why Zero Churn

🎯

Aligned incentives

They pay for results → they get results → no reason to leave

📈

Clear value

Every invoice shows exactly what they got

🔄

Natural expansion

"It's working—give me more"

Customer Mix

🏗️

50% SDR/BDR

Our wedge: sales meetings

🎬

30% Video/ML

Synthetic data pipelines

🔬

20% Research

University lab assets

When you only pay for results, there's no reason to churn. Aligned incentives = sticky customers. This is why Intercom's outcome-based Fin has 99% deployment.

10

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Unfair Advantages

🐕

Dog-fooding daily

Running on OpenClaw infrastructure

📚

Playbooks compounding

Every engagement → better templates

Why Outcome-Based Wins

💰

We absorb the risk

Customers love it → lower CAC, zero churn

🎯

We're incentivized to deliver

Better AI = more margin for us

11

The Thesis

You post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents?

— Macy Mills, a16z Speedrun Partner

Why Now

📈

Market timing

61% → 30%+ outcome-based adoption wave

💔

AI SDR burnout

70% churn = customers looking for what works

🏢

Enterprise demand

43% prefer outcome-based pricing

Comparable Outcomes

🚀

Scale AI: $13.8B

Services → outcomes → platform

📊

Pilot: $1.2B

Bookkeeping outcomes, not seats

💬

Intercom Fin

$0.99/resolution, 99% deployment

OpenHolly: Pay Per Outcome

AI agents that deliver results. You only pay when they do. The future of how work gets priced.

📚 Sources

a16z Enterprise Newsletter (Dec 2024) • Bessemer "AI Pricing Playbook" (Feb 2026) • Bain Technology Report 2025 • Deloitte TMT Predictions 2026 • OpenView SaaS Benchmarks • Gartner • EY "SaaS Transformation with GenAI" (Nov 2025) • BetterCloud "AI and SaaS Industry 2026" • Intercom Fin pricing page • Zendesk AI Agents announcement (Aug 2024)

01
V3: Anti-AI-SDR

The $500M AI SDR Market
Is Imploding. We're the Fix.

50-70% churn rates. LinkedIn bans. Domain blacklists. The "autonomous AI SDR" thesis failed. Human-in-the-loop is winning.

50-70%
AI SDR Churn Rate
Common Room, Feb 2025
$7.5K
Spent for 1 Demo
Reddit r/SaaS, Dec 2025
0
Sales from AI SDR Leads
Theory Ventures CRO
80%+
Human-in-Loop Success
MarketBetter G2: 4.97/5
02

The AI SDR Disaster: Real Data

"AI SDRs don't work—biggest bubble in tech." — LinkedIn comment with 400+ likes

💀 What's Actually Happening

"Their AI continuously hallucinated, getting things wrong about what my company does, the industry we are in, what products we sell. 1 positive reply, 1 demo, thousands of prospects touched, $7.5K down the drain."

— r/SaaS, Dec 2025

"A CRO from a publicly traded company disclosed that while an AI SDR helped generate a substantial volume of leads over a nine-month period, it did not lead to actual sales."

— Tomasz Tunguz, Theory Ventures

"Reports emerged of Artisan accounts, including those of team members and founders, facing restrictions or bans for suspected spam and automation violations."

— Quasa.io, Jan 2026

📊 The Numbers Don't Lie

📉

50-70% Annual Churn

2x the churn of human SDRs (a role notorious for turnover) — Common Room

🚫

LinkedIn Bans Spreading

Platform ramped up AI detection, restricting automation-heavy accounts

📧

Domain Blacklisting

Gmail's filtering has tightened. Sender reputations destroyed in weeks.

⚖️

Legal Exposure

GDPR fines up to 4% revenue. TCPA: $500-1,500 per message.

💔

Brand Damage

"Permanent brand damage from being publicly associated with spam" — NUACOM

03

Even VCs Are Calling It

TechCrunch: "AI sales rep startups are booming. So why are VCs wary?"

"When one studies any of these startups individually, it's like 'wow, that's stunning product market fit.' When all 10 of them have stunning product market fit, it's hard to answer 'How is that going to play out?'"

— Shardul Shah, Partner, Index Ventures (hasn't invested)

"Without access to differentiated data, AI SDR startups risk being overtaken by incumbents like Salesforce, HubSpot, and ZoomInfo."

— Chris Farmer, CEO, SignalFire

"Investors are not surprised by the rapid adoption of AI SDRs; they are just doubting that adoption is sticky."

— TechCrunch, Dec 2024

The Jasper Cautionary Tale

$1.5B → 30% Layoffs

Jasper, the AI copywriting unicorn, ran into speed bumps and had to lay off 30% of staff after ChatGPT launched. AI SDRs face the same commoditization risk.

Why Adoption Isn't Sticky

1

Garbage In, Garbage Out

Built on commoditized LinkedIn data = undifferentiated output

2

Ops Is an Afterthought

Black boxes that create more work, not less

3

Feature, Not Product

Incumbents (Salesforce, HubSpot) can bundle this for free

04

The Fundamental Flaw: Autonomous ≠ Better

"The AI SDR is dead, long live the AI SDR: How the future is Human-in-the-Loop"

❌ Why Autonomous Fails

🤖

No Emotional Intelligence

Can't read tone, context, or cultural nuance essential in enterprise sales

🎯

No Real Consent

Scraped data without consent → GDPR/CCPA violations

⚖️

No Accountability

When AI misleads, your company bears the liability

🔄

Volume Over Value

"More volume on a bad message is not a strategy. It is self-sabotage."

👻

Fake Personalization

"Commenting on someone's hoodie feels forced because it's a hollow observation"

✓ What Actually Works

"Teams that use AI to support human insight consistently outperform teams trying to replace humans entirely. It's not even close."

— Matthew Metros, The AI SDR is Dead

🔍

AI Does Research (90%)

Data mining, signal detection, prospect prioritization

👤

Humans Do Relationships (10%)

Judgment, trust, closing

Human-in-Loop = Higher Ratings

MarketBetter (human oversight): 4.97/5 G2 rating

📈

Better Outcomes

"Human-in-the-loop platforms consistently outperform fully autonomous ones"

05

OpenHolly: The Anti-AI-SDR

We're not building another AI SDR. We're building what should have been built from the start.

❌ 11x / Artisan / AiSDR

🤖

Replace human judgment

"Autonomous AI employee"

📧

Optimize for volume

"6,000 contacts/month"

💰

Per-seat pricing

$5-10K/mo regardless of results

📦

You manage the tool

Become pseudo-IT for AI

🎰

Hope it works

No outcome guarantees

✓ OpenHolly

👤

Augment human judgment

AI research + human checkpoints

🎯

Optimize for quality

Right message, right person, right time

💵

Outcome-aligned pricing

Pay for meetings, not seats

🛠️

We manage the agents

You focus on your business

Results guaranteed

Outcomes or you don't pay

06

How OpenHolly Works

AI handles the research. Humans make the decisions. You get meetings.

🎯
Your Goal
"50 qualified meetings/mo"
🔍
AI Research
Signals, intent, fit scoring
👤
Human Checkpoint
Review & approve outreach
✍️
AI Execution
Send, follow-up, schedule
📅
Meeting Booked
Qualified, on calendar

What AI Handles (90%)

🔍

Deep Prospect Research

Intent signals, company news, technographics, pain points

📊

Lead Scoring & Prioritization

Who to contact and why, right now

✍️

Draft Generation

Personalized outreach based on real signals

📧

Multi-channel Execution

Email, LinkedIn (safely), follow-ups

What Humans Handle (10%)

Approval Gates

Review before sending to high-value prospects

💬

Live Conversations

When a prospect engages, humans take over

🎯

Strategy & ICP

Define who you want to reach and why

🧠

Judgment Calls

Edge cases, sensitive prospects, brand protection
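The 90/10 split above reduces to a routing rule. A minimal sketch, assuming illustrative action names and a "high value" flag (not OpenHolly's actual logic):

```python
# Illustrative sketch of the human-checkpoint routing described above.
# Action names and the prospect-value threshold are assumptions, not product logic.

def route_action(action: str, prospect_value: str) -> str:
    """Decide whether the AI proceeds alone or a human steps in."""
    if action == "live_conversation":
        return "human"              # prospects who engage get a person
    if action == "send_outreach" and prospect_value == "high":
        return "approval_queue"     # human reviews before sending
    # The 90%: research, scoring, drafting run without intervention
    return "ai"
```

The point of the sketch: the human gates are a function of action type and stakes, not a blanket "approve everything" bottleneck.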

07

The Market Opportunity: Fix AI SDR

Their 50-70% churn is our customer acquisition channel.

$500M+
Raised by AI SDR startups
11x, Artisan, AiSDR, etc.
50-70%
Will churn this year
Common Room data
$250M+
Churned customers/year
Market opportunity
Human-in-Loop
What they'll switch to
The thesis

The Churned Customer Profile

💔

Burned by AI SDR tools

Spent $5-10K/mo, got spam complaints

📧

Domain reputation damaged

Need to rebuild sender trust

😤

Still need meetings

The problem didn't go away

🎯

Now understand quality > volume

Educated by failure

Why They'll Choose Us

Outcome-based pricing

Only pay for meetings that happen

🛡️

Brand protection

Human oversight prevents embarrassments

📊

Proven playbooks

We've learned what works across verticals

🤝

We absorb the complexity

They don't manage agents, they get results

08

Traction: The Thesis Is Working

$4K
MRR
5
Customers
0%
Churn
+$2K
Added This Week

Why Zero Churn

Aligned Incentives

When customers only pay for results, there's no reason to churn. If we don't deliver meetings, they don't pay. Simple.

vs. AI SDR Churn

AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 50-70% churn.

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure

Yasir

Co-Founder

yapthis.com · Agentic architecture · Production agent systems

09

The Ask

What We Need

💰

$[X] Pre-Seed

Scale human oversight operations + agent infrastructure

🎯

12-month goal: $1M ARR

Prove the anti-AI-SDR thesis at scale

📚

Then: Productize

Turn proven playbooks into self-serve platform

Why Now

💔

AI SDR market imploding

50-70% churn = massive displaced customer base

📈

Human-in-loop proven

Highest G2 ratings go to human-oversight tools

First-mover on "fix"

Position as the safe alternative before market consolidates

OpenHolly: The Anti-AI-SDR

AI SDRs promised automation. They delivered spam, bans, and brand damage. We deliver meetings — with human judgment where it matters. Their 50-70% churn is our customer acquisition channel.

📚 Sources

Common Room "The AI SDR is dead" (Feb 2025) · TechCrunch "AI sales rep startups are booming. So why are VCs wary?" (Dec 2024) · Reddit r/SaaS AI SDR complaints · Quasa.io Artisan LinkedIn bans (Jan 2026) · Pipeline Group "Hidden Dangers of AI SDRs" · Theory Ventures SaaStr Talk · MarketBetter G2 Reviews

01
V5: Agent Seatbelt

The Safety Layer
Before AI Gets the Keys

Browser-layer guardrails that block irreversible AI actions before they happen.

$47K
Lost in one AI runaway
84%
Have zero safety boundaries
3am
When agents go rogue
100%
Preventable with guardrails
02

The "$39K Gone in a Blink" Problem

AI agents fail not from bad models, but from bad guardrails. 84% of companies deploying agents have zero safety boundaries defined.

— GenDigital Agent Trust Hub Research, 2026

What Goes Wrong

💸

Runaway API costs

$47K overnight cloud bills

📧

Wrong recipients

AI SDR emails competitors

🗑️

Irreversible actions

Deleted production data

🔓

Credential leaks

Pricing sent to wrong channel

What We Block

Site-specific rules

Block LinkedIn "Follow" for AI SDRs

Action classification

Read vs. Write vs. Irreversible

Human approval gates

Require confirmation for risky ops

Rate limiting

Prevent runaway loops
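The first three rule layers above compose into one classification pass. A sketch, where the action vocabulary and the LinkedIn rule are illustrative assumptions, not the shipped ruleset:

```python
# Minimal sketch of the Seatbelt rule layers listed above.
IRREVERSIBLE = {"delete", "send", "pay", "publish"}
WRITE = {"create", "update", "upload", "follow"}

SITE_RULES = {
    ("linkedin.com", "follow"): "block",   # e.g. block "Follow" for AI SDRs
}

def classify_action(site: str, action: str) -> str:
    """Return 'allow', 'human' (approval gate), or 'block'."""
    verdict = SITE_RULES.get((site, action))
    if verdict is not None:
        return verdict                 # site-specific rules win
    if action in IRREVERSIBLE:
        return "human"                 # irreversible ops require confirmation
    if action in WRITE:
        return "allow"                 # writes pass, but are logged and rate-limited
    return "allow"                     # reads are safe
```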

03

How It Works

Chrome extension that intercepts agent browser actions

🤖
Agent Action
🛡️
Seatbelt Intercept
⚖️
Risk Classification
Allow / Block / Human

Why Browser Layer

Framework-agnostic. Works with any AI agent (OpenClaw, LangChain, AutoGen, custom). Install once, protect everything.
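The "prevent runaway loops" rule is commonly implemented as a token bucket; this sketch (capacity and refill rate are placeholder values) cuts an agent off once it burns through its action budget:

```python
import time

class TokenBucket:
    """Sketch of a runaway-loop guard: allow N actions, refill over time."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # agent is looping; block until tokens refill
```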

04

Market & Competitive Position

Why Now

📈

OpenClaw: 9K → 60K stars

Autonomous agents exploding

⚠️

CyberArk security concerns

Enterprise worried about agent security

📜

EU AI Act

Regulatory tailwinds for safety

Competition

🟡

GenDigital Agent Trust Hub

Just launched; validates the market

🟢

Our Angle

Browser-layer = framework-agnostic

🟢

MVP Achievable

Chrome extension ships fast

Agent Seatbelt

The seatbelt you install before giving AI the keys.

🔗 Supports These Pitches

Fat Startup · AWS of AI Work · Control Plane

Part of the human oversight layer that makes agent work reliable.

01
V6: ClawView

Datadog for
Autonomous Agents

When your AI employee sends the wrong email at 3am, you'll know exactly why.

The Problem

Companies are deploying autonomous AI agents that run 24/7. When something goes wrong—and it will—they have no idea why. Current tools are built for request-response, not proactive agents.

02

Current Tools Miss Autonomous Agents

LangSmith / Langfuse / Arize

Request-response patterns

User sends message, LLM responds

Chain tracing

LangChain-specific, not agent-native

No proactive agent support

Built for chatbots, not employees

ClawView

Autonomous operation

24/7 agents taking proactive actions

Decision tracing

Why did it make that choice?

Multi-channel + tools

Shell, browser, files, messages

03

The "Oh Shit" Demo

🤖
Agent receives task
🧠
Makes decisions
💥
Something goes wrong
🔍
ClawView shows why

Without ClawView

"The agent sent the wrong email. Logs show it ran. No idea why."

With ClawView

"Step 3: Agent assumed X because of context Y. Here's how to prevent this class of error."

ClawView: See What Your Agents Actually Do

Every decision. Every action. Every assumption. Full causal tracing.

🔗 Supports These Pitches

Fat Startup · AWS of AI Work · Control Plane

Observability layer — see what agents are doing before they go wrong.

⚠️ Why This is a Feature, Not a Company

Langfuse, LangSmith, Arize are well-funded. But none are built for autonomous agents. ClawView is our internal observability layer, not a separate product pitch.

01
V7: AgentGov

Governance for
AI Employees

Audit trails. Approval workflows. Compliance automation. The control layer enterprises need.

84%
No safety boundaries
0
Audit trails today
EU AI Act
Compliance required
2026
Enforcement begins
02

The Governance Gap

AI agents fail not from bad models, but from bad guardrails. The unlock isn't better agents—it's better safety rails.

— Industry consensus, 2026

What's Missing

No audit trails

What did the agent do at 3am?

No approval workflows

High-stakes actions go unsupervised

No compliance framework

EU AI Act enforcement coming

No agent-on-agent supervision

Humans can't supervise at machine speed

AgentGov Provides

Immutable audit trails

Every action, every decision, timestamped

Approval workflows

Human gates for high-stakes actions

Compliance automation

EU AI Act ready, audit reports generated

AI supervision layer

Validator agents checking worker agents
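One standard way to make an audit trail immutable in practice is hash chaining, where each entry commits to the previous one so any tampering breaks verification. A sketch (field names are illustrative):

```python
import hashlib
import json

class AuditLog:
    """Sketch of a tamper-evident audit trail via hash chaining."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, actor: str, action: str, detail: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"actor": actor, "action": action, "detail": detail, "prev": prev}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("actor", "action", "detail", "prev")}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```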

03

From "Human in Loop" to "Human on Loop"

👤
Human IN Loop
Approve every action
👁️
Human ON Loop
Exception handling
🏛️
Human ABOVE Loop
Strategic oversight

McKinsey Insight

"Organizations are moving from human in the loop to human on the loop—above the loop for strategic oversight." AgentGov enables this transition safely.

AgentGov: Govern AI at Scale

Audit trails. Approval workflows. Compliance automation. Trust at machine speed.

🔗 Supports These Pitches

Fat Startup · AWS of AI Work · Control Plane

Governance + compliance layer — enables enterprise trust.

🔬 Key Research

Gravitee 2026: Only 14.4% have full security approval for agents. 88% reported incidents.
EU AI Act: Enforcement begins 2026, mandates audit trails.
Zenity: $38M Series B validates market (but they're low-code focused, not agent-native).

01
V8: AI Employee OS

The Full Stack for
AI Employees

10 layers an AI employee needs to fulfill an entire job description. We're building the unified platform.

The Thesis

An AI employee's value lies in performing EVERYTHING in a job description—not just one workflow. This requires a complete infrastructure stack.

02

The 10-Layer Stack

1. Memory & Personality
2. Skills & Capabilities
3. Tools & Integrations
4. Identity & Access
5. Objectives & Goals
6. Task Management
7. Work Artifacts & KB
8. Supervision & Oversight
9. Communication (A2A)
10. QA & Compliance

What's Missing (⭐)

Layers 8-10 are the critical gaps. Everyone's building capabilities. Nobody's building supervision, agent-to-agent communication, and compliance.

03

The Integration Problem

Current landscape is fragmented

Today: Point Solutions

📦

Memory: Mem0, Zep, LangMem

📦

Tools: MCP servers

📦

Identity: Okta, 1Password

📦

Tasks: LangGraph, CrewAI

📦

Compliance: Guardrails AI, Trail

Tomorrow: AI Employee OS

A unified platform that manages the full AI employee lifecycle.

Integrated stack

All 10 layers, one platform

Turnkey deployment

Job description → Working AI employee

Enterprise governance

Built-in compliance, audit, oversight

AI Employee OS

The unified platform for deploying, managing, and governing AI employees.

🔗 Framework For These Pitches

Fat Startup · AWS of AI Work · Control Plane

The 10-layer framework is how we think about what AI employees need.

⚠️ Why This is a Framework, Not a Pitch

Building all 10 layers is massive. We focus on Layers 8-10 (supervision, communication, compliance) because that's the critical gap. The framework informs strategy, not the pitch itself.

01
V9: AgentDocs

Stack Overflow
for AI Agents

Verified working code. Real benchmarks. Pay-per-snippet micropayments. Documentation that actually works.

200x
Slower (Whisper vs Groq)
Garry Tan, YC Feb 2026
Hallucinated APIs
$2.2M
30-day x402 volume
x402scan.com
0
Verified snippet services

Claude Code chose Whisper V1 — near-deprecated — over Groq (200x faster, 10x cheaper) because OpenAI's docs are cleaner. Agents pick tools by doc quality, not performance.

— Garry Tan, YC Partner, Feb 2026

Even the best dev tools don't let you sign up via API. This is a big miss in the claude code age — claude can't sign up on its own.

— Jared Friedman, YC Partner, Feb 2026

02

The Hallucination Tax

Despite our best efforts, they will always hallucinate. That will never go away.

— Amr Awadallah, Vectara CEO, 2026

❌ The Problem

Best-documented ≠ Best solution

Agents pick whatever has most examples

Documentation gets stale

APIs change, snippets break

No verification

Agent can't know if code actually runs

No benchmarks

No cost/perf data to guide decisions

✓ AgentDocs

Agent-swarm verified

Code tested continuously, timestamped

Use-case organized

"Transcribe video" → 10 services compared

Real benchmarks

Cost, latency, quality scores

x402 micropayments

$0.05 per verified snippet

03

How It Works

🤖
Agent needs code
"Send email via API"
🔍
Query AgentDocs
Structured API
💳
HTTP 402
Pay $0.05 via x402
Verified snippet
Tested 2 hours ago

Kill the API Key

No signup. No rate limits. No accounts. Agent pays per-request, gets verified code. Native to how agents want to consume services.
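Server-side, the flow above reduces to a paywall check: respond 402 until a valid payment proof arrives. This sketch assumes an `X-PAYMENT` proof header and a pluggable verifier for illustration; the real wire format is defined by the x402 spec:

```python
# Sketch of an x402-style pay-per-snippet gate. Header name, response shape,
# and the verifier interface are assumptions, not the exact x402 wire format.

def serve_snippet(path: str, headers: dict, verify_payment) -> tuple:
    proof = headers.get("X-PAYMENT")
    if proof is None or not verify_payment(proof):
        # No valid payment: answer HTTP 402 with what we accept
        return 402, {"accepts": [{"price_usd": 0.05, "resource": path}]}
    # Paid: return the verified, recently tested snippet
    return 200, {"snippet": "...", "verified_minutes_ago": 120}
```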

Launch Order (by x402 + Pain Score)

1

🎙️ Transcription — NOW

Groq, Deepgram, Whisper. Zero x402 servers. Garry Tan moment.

2

🎬 Video Gen — Dogfood

Kling, Runway, Wan. Parameter chaos unsolved.

3

🧠 LLM Routing

Model selection based on task + budget

4

📧 Agent Identity

Email + phone + wallet in one API

What Agents Get

Working code snippet

Verified against real APIs

Normalized output schema

Same format across providers

Cost + latency benchmarks

Real numbers, updated hourly

Routing recommendation

"For fast+cheap → use Groq"

04

Market & Competition

Closest Competitor: Context7

🟡

Up-to-date docs

✓ They have this

🟡

Version-specific

✓ They have this

Verified working

No continuous testing

Benchmarks

No cost/perf data

Micropayments

Free only, no agent-native billing

Why Now

💰

x402 is production-ready

$43M+ processed, 35M+ txns

🤖

Agent adoption exploding

OpenClaw: 9K→60K stars

📈

$50B market by 2030

AI agent infrastructure

🎯

Clear wedge

Verification is table stakes soon

The x402 Thesis

25,000+ developers building on x402. Google, Cloudflare, Stripe adopting. Machine-to-machine payments are the rails for agent economy.

05

x402 Market Opportunity

Real-time data from x402scan.com shows a booming agent economy — with a clear gap for developer tooling.

$2.21M
30-Day x402 Volume
x402scan.com, Feb 2026
4.2M
Transactions (30 days)
~140K/day average
8,559
Active Buyer Agents
Coinbase facilitator alone
0
Verified Snippet Services
Gap in the market

All 14 Facilitators

Facilitator | 30d Txns | 30d Vol | What They Do
Dexter | 1.65M | $79.5K | Agent economy platform
Coinbase | 722K | $288.5K | Official CDP facilitator
Virtuals Protocol | 412K | $1.34M | AI agent tokenization
PayAI | 1.31M | $43.3K | Micropayments
RelAI | 66K | $84K | Agent payments (Solana)
Meridian | 19K | $315K | High-value transactions
Thirdweb | ~10K | ~$2K | Web3 dev platform
OpenX402 | 6.6K | $38.6K | Open-source facilitator
Polymer | 6.4K | $770 | Proof generation
AnySpend | ~3K | ~$5K | Multi-asset spending

+ Corbits, OpenFacilitator, CustomPay, AgentPay (emerging)

Source: x402scan.com, Feb 27 2026

Market Gap Analysis

🔍

What Exists

Data APIs, AI services, crypto tools, social data

What's Missing

Verified code snippets, curated docs, developer knowledge

💡

AgentDocs Opportunity

Be the Stack Overflow layer on x402 rails

Why We Can Win

Top services (StableEnrich, LowPaymentFee) aggregate APIs — they don't verify code quality.
AgentDocs: Premium pricing ($0.05-0.10) justified by verification + benchmarks.
Target: 1,000+ requests/day = $2,100+/month revenue from agent micropayments alone.

06

Revenue Model

AgentDocs: Documentation That Works

Verified snippets. Real benchmarks. Agent-native payments. Stack Overflow, but for machines.

🔗 Supports These Pitches

Fat Startup · AWS of AI Work

Better documentation → better agent outputs → more reliable outcomes.

📍 Current Progress

Live: agentdocs-api.holly-3f6.workers.dev
Snippets: 15 use cases, 21 verified snippets
Status: Dogfooding internally, expanding library

01
PORTAL

Autonomous Service
Signup for Agents

AI agents can write code, deploy apps, and manage infrastructure. But they can't sign up for a Stripe account. We fix that.

Even the best developer tools mostly still don't let you sign up for an account via API. This is a big miss in the claude code age because it means that claude can't sign up on its own. Putting all your account management functions in your API should be table stakes now.

— Jared Friedman, YC Partner, Feb 27 2026

181
Replies to Jared's tweet
1,336
Likes in 12 hours
0
Solutions today
$0.50-2
Per signup (x402)
02

The Problem: Last Mile of Agent Autonomy

✓ What Agents CAN Do

Write entire codebases

Deploy to staging

Run tests, fix bugs

Manage infrastructure

✗ What Agents CAN'T Do

Sign up for Stripe

Create a Vercel account

Get an API key from Resend

Click "Verify Email"

Hit this exact wall last week. Claude Code can scaffold an entire project, write tests, deploy to staging, but needs me to manually sign up for a third party service and paste in an API key. The last mile of developer tooling is still stuck in 2019.

— @advikjain_, replying to Jared

03

Community Validation

What developers said in response to Jared's tweet

"This is a real friction point for agentic workflows. The auth layer is always manual. Companies that figure out API-first account provisioning will eat the ones stuck in dashboard-only onboarding."

— @thebasedcapital

"I've watched AI tools fail at basic integration tasks because they hit the 'create account manually' wall. We're debating whether Claude can replace junior devs but it can't even sign up for Stripe."

— @OneManSaas

"Signup is just the tip. Billing, permissions, onboarding — everything assumes a human in a UI. Devtools that go full API-first for the entire lifecycle get a massive edge when agents pick their own stack."

— @wildpinesai (tagging @paulg)

"Bigger issue than just signup. Most SaaS still treats APIs as a feature for power users, not the primary interface. When your biggest customer is an agent, the whole product surface needs to be API-first."

— @twitter user

The Skeptics (and why they're wrong)

"Won't this enable bot spam?" — Valid concern, but x402 payments solve this. Agents pay real money per signup. Spam bots won't pay $1 per account.
"Companies don't want bot signups" — They want PAYING customers. Agent-initiated signups that convert to revenue are valuable.

04

How Portal Works

🤖
Agent Request
"I need Vercel access"
💳
x402 Payment
$1.00 USDC
🚪
Portal Queue
Job ID + poll URL
🖥️
Worker Fleet
Browser automation
🔑
Credentials
API key + password

API Flow

POST /signup
{ "service": "vercel" }

→ 201 Created
{
  "job_id": "portal_abc123",
  "poll_url": "https://...",
  "estimated_seconds": 30
}

What Agent Receives

GET /credentials/portal_abc123

{
  "api_key": "vercel_xxx",
  "email": "agent-abc@portal...",
  "password": "encrypted...",
  "account_url": "https://..."
}
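Behind those two endpoints sits a job queue that the worker fleet polls. A minimal in-memory sketch of that lifecycle (statuses and ID format are illustrative; the real queue lives in D1):

```python
import itertools

class PortalQueue:
    """Sketch of the POST /signup job lifecycle: submit -> claim -> complete."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.jobs = {}

    def submit(self, service: str) -> dict:
        job_id = f"portal_{next(self._ids):06d}"
        self.jobs[job_id] = {"service": service, "status": "queued"}
        return {"job_id": job_id, "poll_url": f"/status/{job_id}",
                "estimated_seconds": 30}

    def claim(self):
        """Worker fleet polls for the next queued signup."""
        for job_id, job in self.jobs.items():
            if job["status"] == "queued":
                job["status"] = "running"
                return job_id, job["service"]
        return None

    def complete(self, job_id: str, credentials: dict) -> None:
        self.jobs[job_id].update(status="complete", credentials=credentials)
```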
05

Email Modes

🏠 Portal-Managed

We provision agent-{id}@portal.viewholly.com

  • We handle email verification automatically
  • No email infrastructure needed from agent
  • Simplest path — just call the API
{ "email_mode": "portal_managed" }

📧 Agent-Provided

Agent brings their own email (AgentMail, etc.)

  • Agent controls the identity
  • Integrates with existing email service
  • Agent must forward verification emails
{ "email_mode": "agent_provided",
  "agent_email": "bot@agentmail.com" }
06

x402 Market Opportunity

Agent payments are live. Portal fits perfectly.

$2.2M
30-day x402 volume
x402scan.com
4.2M
Transactions (30 days)
513
Active merchants
0
Signup services

What Exists on x402

Data APIs

StableEnrich, httpay

AI Services

Virtuals ACP ($163K/day)

Social Data

StableSocial, TweetX402

Email for Agents

StableEmail (314 txns)

What's Missing

0

Account Signup Services

Nobody solving this

0

API Key Provisioning

Wide open

0

Identity + Onboarding

Jared's exact point

07

Services & Pricing

Service | Complexity | Price | Est. Time | Status
Resend | Simple | $0.50 | 20s | MVP
Railway | Simple | $0.50 | 25s | MVP
Vercel | Email verify | $1.00 | 30s | Week 2
Supabase | Email verify | $1.00 | 35s | Week 2
Cloudflare | Email verify | $1.00 | 30s | Week 2
Stripe | 2FA / Complex | $2.00 | 60s | Phase 2

Revenue Model

1,000 signups/day × $1 avg = $30K/month
Infrastructure cost: ~$500/month (workers + CF)
Gross margin: 98%
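The arithmetic behind those three bullets, using the slide's own numbers:

```python
# Unit economics from the revenue model above.
signups_per_day = 1_000
avg_price = 1.00                               # USD per signup
revenue = signups_per_day * avg_price * 30     # $30,000 / month
infra_cost = 500                               # workers + Cloudflare, per month

gross_margin = (revenue - infra_cost) / revenue
assert revenue == 30_000
assert round(gross_margin * 100) == 98         # ~98%, as claimed
```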

08

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         AGENT                                │
│              (Claude Code, OpenClaw, any AI)                │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ POST /signup (x402 $1)
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                  PORTAL API (CF Workers)                    │
│          Hono + @x402/hono + D1 job queue                   │
│              Returns job_id in <100ms                        │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ Workers poll
                            ▼
┌─────────────────────────────────────────────────────────────┐
│               WORKER FLEET (OpenClaw Instances)             │
│    ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐         │
│    │Worker 1│  │Worker 2│  │Worker 3│  │Worker 4│  (4+)   │
│    │Browser │  │Browser │  │Browser │  │Browser │         │
│    └────────┘  └────────┘  └────────┘  └────────┘         │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ Encrypted credentials
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                CREDENTIAL VAULT (KV)                        │
│         One-time retrieval • 5-min TTL • Encrypted          │
└─────────────────────────────────────────────────────────────┘
    
09

Security Model

🔐 Credential Handling

  • Passwords encrypted at rest
  • One-time retrieval (deleted after GET)
  • 5-minute TTL auto-delete
  • Full audit logging

🛡️ Why x402 Prevents Abuse

  • $0.50-2 per signup = spam is expensive
  • Wallet-based identity for accountability
  • Rate limiting per wallet
  • Abuse = burned wallet reputation
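The one-time retrieval plus 5-minute TTL above has simple semantics. A sketch with encryption and durable storage elided (in production this is a KV namespace, not a dict):

```python
import time

class CredentialVault:
    """Sketch of one-time, TTL-bounded credential retrieval."""

    TTL_SECONDS = 300   # 5-minute auto-delete

    def __init__(self):
        self._store = {}

    def put(self, job_id: str, credentials: dict) -> None:
        self._store[job_id] = (credentials, time.monotonic() + self.TTL_SECONDS)

    def get(self, job_id: str):
        # One-time retrieval: the entry is removed even before the TTL check
        credentials, expires_at = self._store.pop(job_id, (None, 0.0))
        if credentials is None or time.monotonic() > expires_at:
            return None
        return credentials
```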

"I can only imagine allowing full automation when there's a direct path to monetisation. Maybe when we have a more reliable API for charging agents for specific actions automatically."

— @Everlier, replying to Jared

x402 IS that reliable API. Portal is the first service to use it for signup.

10

Progress & Roadmap

✅ Done (Today)

  • Architecture design
  • x402 API server (Hono)
  • D1 job queue schema
  • 5 service playbooks
  • Credential vault design
  • GitHub repo ready

🔧 Week 1

  • Deploy to CF Workers
  • First worker (local OpenClaw)
  • Resend + Railway working
  • Email domain setup
  • One-time credential retrieval

🚀 Week 2-3

  • Worker fleet (4+ instances)
  • Vercel, Supabase, Cloudflare
  • WebSocket subscriptions
  • Webhook callbacks
  • x402 payment integration

🚪 Portal: The Missing Link

Agents can do everything except onboard to services. Portal fixes the last mile of agent autonomy.

Repo: github.com/moltyfromclaw/portal

01 / 12
AWS OF AI WORK

The Infrastructure Layer
for AI Agent Work

$30-40B poured into AI agents. 95% fail to deliver. We're building the missing infrastructure that makes them actually work.

$50B+
AI Agent Market by 2030
MarketsandMarkets, Grand View Research
95%
Enterprise AI Pilots Fail
MIT NANDA Study, 2025
$4K
MRR (Live)
171%
ROI When It Works
MIT NANDA
02 / 12

The $30B Problem

Companies are pouring billions into AI agents. Almost none deliver measurable returns.

95%
AI pilots deliver zero measurable return
MIT NANDA Study
80%
AI projects fail (2x normal IT)
RAND Corporation
46%
PoCs scrapped before production
WorkOS Research
70-80%
AI SDR churn within months
11x, Artisan data

Companies are pouring $30–40 billion into generative AI, yet an MIT study finds that 95% of enterprise pilots deliver zero measurable return.

— MIT NANDA: The GenAI Divide, 2025

03 / 12

Why AI Agents Fail

The pattern is consistent. It's not the models—it's the infrastructure.

❌ What Breaks

1

No workflow templates

Teams reinvent every agent from scratch. Same failures, different companies.

2

No human oversight

Agents run unsupervised. High-stakes errors go uncaught. Trust collapses.

3

No failure patterns

Each company learns the same lessons. No accumulated knowledge.

4

No orchestration

Multi-agent systems collapse. Stanford CooperBench: 25% success rate.

✓ What's Missing: Infrastructure

Battle-tested workflow templates

Proven prompts, integrations, and sequences. Encoded from real deployments.

Human-in-the-loop routing

Smart escalation. Approval queues. Humans handle edge cases.

Failure pattern library

What breaks and how to prevent it. Compound learning across clients.

Agent orchestration layer

Coordinate multi-agent work. Handle failures gracefully.

The Unlock

The 5% that succeed have infrastructure. Templates. Oversight. Failure patterns. We're building that infrastructure as a service.

04 / 12

The Playbook: Services → Platform

The most valuable infrastructure companies started by doing the work themselves.

Scale AI

Data Labeling → AI Infrastructure

Started labeling images for self-driving cars (2016). Now the "Data Foundry" powering OpenAI, Meta, Google. 50% gross margins from tech-enabled services.

$29B
Valuation (Meta investment, 2025)
Sacra, TechCrunch

Pilot

Bookkeeping Services → Financial Infra

"AWS for SMB accounting." Started doing bookkeeping. Now processes $3B+ in transactions. Jeff Bezos led funding.

$1.2B
Valuation (2021)
CNBC, TechCrunch

Stripe

Payments API → Financial Infrastructure

Started with simple payment processing (2010). Expanded to Connect, Radar, Atlas. Infrastructure that grows as customers grow.

$107B
Valuation (2024)
Wikipedia, Sacra

The Pattern

Do the work → Encode the patterns → Become the platform. Services fund the R&D. Each engagement builds the moat. Competitors starting later start from zero.

05 / 12

Scale AI: The Detailed Parallel

Their journey is our playbook. Same model, different layer.

Scale AI's Model

1

Services Entry

Started labeling images for AV companies. Revenue from day one.

2

Tech Layer

Built pre-labeling ML that made each human 10x more efficient.

3

Data Flywheel

Each correction improved their models. More data = better automation.

4

Platform Expansion

Nucleus, Validate, Launch—from labeling to full ML lifecycle.

Our Model

1

Services Entry

Operating AI agent workflows for clients. Revenue from day one.

2

Tech Layer

Workflow templates + orchestration that make agents reliable.

3

Playbook Flywheel

Each engagement encodes learnings. More workflows = better templates.

4

Platform Expansion

Guardrails, Observability, Governance—full agent lifecycle.

Scale AI is not a traditional BPO company. It is a Data Foundry. Their technology layer is their moat—human workforce augmented by proprietary software that compounds in value.

— Takafumi Endo, "Scale AI: Deconstructing the Foundry"

06 / 12

The Workflow Template Moat

Each engagement encodes a playbook. Playbooks become the platform.

🔧
Verified Prompts
By use case + vertical
+
🔗
Integration Patterns
What connects to what
+
🚫
Failure Patterns
What breaks + fixes
+
👤
Human Routing
When to escalate
📦
Workflow Template Library
Deploy new client in hours, not weeks

Compounding Effect

1️⃣

Customer 1: 2 weeks

Figure everything out from scratch

5️⃣

Customer 5: 3 days

Apply existing playbook + customize

🔟

Customer 10: Hours

Playbook is battle-tested

📦

Customer 50+: Self-serve

Playbooks become product

What's In A Template

📝

Prompt sequences

What actually works for each use case

⚙️

Model routing

Which models for which tasks (cost/quality)

🔗

Tool configurations

Integrations, APIs, credentials patterns

🛡️

Guardrail rules

What to block, what to escalate
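The four ingredients above suggest a simple template schema. A sketch, with all field names as illustrative assumptions rather than a real internal format:

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowTemplate:
    """Sketch of the template contents listed above; field names are illustrative."""
    use_case: str
    vertical: str
    prompt_sequence: list = field(default_factory=list)   # ordered, verified prompts
    model_routing: dict = field(default_factory=dict)     # task -> model tier
    tool_configs: dict = field(default_factory=dict)      # integrations, credential patterns
    guardrails: dict = field(default_factory=dict)        # action -> "block" | "escalate"

    def decision_for(self, action: str) -> str:
        return self.guardrails.get(action, "allow")
```

Deploying "customer 10 in hours" then means instantiating a battle-tested template and overriding only the client-specific fields.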

07 / 12

Why Infrastructure Wins

Application companies fight for customers. Infrastructure companies power the ecosystem.

❌ Application Layer

📊

Compete on features

Race to the bottom. Easy to copy.

🔄

Linear growth

Each customer = new acquisition cost

💰

2-5x revenue multiples

Commodity software pricing

🏃

Low switching costs

Customers can leave anytime

✓ Infrastructure Layer

🏗️

Compete on reliability

Mission-critical. Hard to replicate.

📈

Compound growth

Templates improve → more value → more customers

💎

10-25x revenue multiples

Scale AI: 18x. Stripe: higher.

🔒

High switching costs

Workflows built on your templates

Network effects are the underlying principle behind the success of companies like AWS, Stripe, and Salesforce. Higher network density means the product value increases.

— NFX: The Network Effects Manual

08 / 12

Market Size: $50-70B by 2030

AI agents are the fastest-growing category in enterprise software. We're building the infrastructure layer.

$7.8B
AI Agents Market (2025)
MarketsandMarkets
$52.6B
AI Agents Market (2030)
MarketsandMarkets
46.3%
CAGR Growth Rate
2025-2030 forecast
$183B
Bullish Forecast (2033)
Grand View Research

Our TAM Slice: Infrastructure

If AI Agents are $50B, infrastructure is 20-30% of stack value:

$10-15B
Agent Infrastructure TAM by 2030

Why We Win This Slice

🎯

First-mover on playbooks

Every month = more encoded knowledge

💰

Revenue while building

Services fund the platform

🧠

Real deployment data

Failure patterns competitors don't have

09 / 12

The Infrastructure Stack

Four layers that make AI agents reliable. We're building all four.

1
Workflow Templates
Verified prompts, sequences, integrations
2
Agent Orchestration
Multi-agent coordination, task routing
3
Human Oversight
Approval queues, escalation, feedback loops
4
Guardrails + Observability
Safety rails, monitoring, audit trails

Current Products

🛡️

Agent Seatbelt

Browser-layer guardrails that block irreversible actions

📊

ClawView

Observability for autonomous agents. See what they do.

🏛️

AgentGov

Governance, compliance, audit trails

📚

AgentDocs

Verified code snippets for agent tool use

10 / 12

Current Traction

$4K
MRR
5
Paying Clients
3
Workflow Types
SDR, Video Gen, Research
+$2K
Added This Week

What We've Delivered

🏗️

SDR for construction companies

Lead gen + qualification workflows

🎬

Video generation for ML training

Synthetic data pipeline workflows

🔬

Research for universities

Literature review + synthesis workflows

🚀

BDR for startups

Outbound + meeting booking workflows

What This Proves

Fat Startup Thesis

We're getting paid to build our moat. Every dollar of revenue = more encoded knowledge. Competitors starting later start from zero.

"A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done."

— Andrew Lee, a16z Speedrun

11 / 12

The Path Forward

🛠️
Year 1: Services
$1M ARR · 50+ playbooks
📚
Year 2: Productize
Self-serve templates
🏗️
Year 3: Platform
Others build on us

12-Month Milestones

💰

$1M ARR

Prove unit economics at scale

📚

50+ Workflow Templates

Across 5+ verticals

🔧

Infrastructure Products Live

Guardrails, Observability, Governance

📦

First Self-Serve Templates

Deploy without our team

Why Now

🚀

Models just got capable enough

GPT-5, Claude 4—agents can work

💔

AI SDR market burned

70-80% churn = customers seeking alternatives

Infrastructure window open

No dominant player yet. First-mover wins.

📜

Regulatory tailwinds

EU AI Act mandates oversight, audit trails

12 / 12

The Ask

The AWS of AI Work

Infrastructure that makes AI agents reliable. Workflow templates. Orchestration. Human oversight.

Every company deploying agents will need this. We're building it.

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed

Yasir

Co-Founder

yapthis.com · Shipped production agents

Key Sources

MIT NANDA Study: 95% AI failure rate, 171% ROI when successful

MarketsandMarkets: $7.8B → $52.6B AI agents market (2025-2030)

Scale AI (Sacra): $1.5B ARR, $29B valuation, 50% gross margins

Pilot (CNBC/TechCrunch): $1.2B valuation, Bezos-backed

11x/Artisan: 70-80% churn within months (Broadn research)

RAND Corporation: 80% AI project failure rate

01 / 12
MARKETPLACE THESIS

The Uber for AI Work

Post an outcome. AI agents compete. Pay only for results. We're building the outcome marketplace for the AI economy.

$4K
MRR Today
70%
Of tech value created by network effects
NFX Research
$13.8B
Scale AI valuation
Services → Platform
$60M+
GitCoin distributed
Bounty model works
02 / 12

The a16z Speedrun Thesis

This is the exact model a16z partners are calling for in 2026.

Say you need 50 qualified sales meetings. Instead of buying another AI tool, you post a bounty: "$500 per meeting booked." AI agents compete. Whoever performs best gets paid. We already do this with bug bounties, Kaggle, hackathons. Why not for AI agents going after real business outcomes?

— Macy Mills, a16z Speedrun, "14 Big Ideas for 2026"

I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start.

— Kenan Saleh, a16z Speedrun, "14 Big Ideas for 2026"

A fat startup ships outcomes, not features. It bundles software, data, and human ops into one integrated product that actually gets the job done.

— Andrew Lee, a16z Speedrun Partner

03 / 12

The Market Shift: Tools → Outcomes

The freelance marketplace is $1.5T. It's about to be disrupted by AI agents.

❌ Legacy Marketplaces

📝

Upwork: $1.67B market cap

Pay humans by the hour. Hope they deliver.

📝

Fiverr: ~$1B market cap

Fixed-price gigs. Still human-dependent.

🐢

Slow, expensive, variable

Wait days. Pay premium. Quality varies.

✓ AI Agent Marketplace (Us)

🎯

Pay per outcome, not effort

$X per meeting, $Y per video, $Z per lead.

Hours, not days

AI agents work 24/7. Instant scale.

📈

Network effects compound

More agents = better matching = better outcomes.

The Paradigm Shift

As we move to a future based on outcome-based pricing that perfectly aligns incentives between vendors and users, we'll first move away from time-based billing. — a16z Big Ideas 2026

04 / 12

How It Works

Bounties + Escrow + AI Agents = Outcome Marketplace

🎯
Post Bounty
"50 meetings @ $500 each"
💰
Escrow Funds
Payment locked
🤖
Agents Compete
Best performers win
Verify & Release
QA passes → pay out
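The four-step flow above can be sketched as a minimal escrow state machine. This is illustrative only, under assumed numbers; `Bounty`, `fund`, and `verify_and_release` are hypothetical names, not a real platform API:

```python
from dataclasses import dataclass

@dataclass
class Bounty:
    """Illustrative bounty: funds escrowed up front, paid out per verified outcome."""
    outcome: str           # e.g. "qualified sales meeting"
    price_per_unit: float  # e.g. 500.0
    units: int             # e.g. 50
    escrow: float = 0.0
    paid_out: float = 0.0
    delivered: int = 0

    def fund(self):
        # Buyer locks the full bounty value before any agent starts work.
        self.escrow = self.price_per_unit * self.units

    def verify_and_release(self, units_delivered: int) -> float:
        # QA passes -> pay out only for verified units; the rest stays in escrow.
        payable = min(units_delivered, self.units - self.delivered)
        amount = payable * self.price_per_unit
        self.delivered += payable
        self.escrow -= amount
        self.paid_out += amount
        return amount

b = Bounty(outcome="qualified sales meeting", price_per_unit=500.0, units=50)
b.fund()                  # $25,000 locked in escrow
b.verify_and_release(10)  # 10 meetings verified -> $5,000 released
```

The key property is that funds only move on verified delivery, which is what makes the buyer's side of the marketplace zero-risk.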

For Buyers

📝

Define the outcome

"Book qualified meeting" or "Generate product video"

💵

Set your price

Pay what the outcome is worth to you

🔒

Zero risk

Funds held in escrow. Pay only on delivery.

For Agents (Supply Side)

🎰

Pick bounties that fit

Match capabilities to opportunities

📊

Build reputation

Success rate → more bounties → more revenue

💰

Get paid instantly

Verified outcome → automatic payout

05 / 12

The Bounty Model Works

Proven in bug bounties, open source, and ML competitions. Now it's time for AI work.

$60M+
GitCoin distributed
Open source bounties
$100M+
Bug bounties/year
HackerOne + Bugcrowd
$1B+
Kaggle prize pool
ML competitions
10M+
Replit users
Bounties marketplace

Precedent: Replit Bounties

Imagine a tool where you describe your problem and get a solution built for you. Today we're introducing Bounties, a marketplace where you work with top creators and bring your software ideas to life.

— Replit, on launching Bounties

Replit proved bounties work for code. We're proving they work for any AI-deliverable outcome.

Precedent: GitCoin

Over the past 5 years we've supported the funding of public goods. Started with bounties for open source, evolved to quadratic funding.

— GitCoin: $60M+ distributed

GitCoin proved bounties + crypto payments = massive coordination. We're applying this to AI agent work.

06 / 12

Network Effects: The Moat

70% of tech value comes from network effects. Here's how we build them.

Network effects have been responsible for 70% of all the value created in technology since 1994. Founders who deeply understand how they work will be better positioned to build category-defining companies.

— NFX, "The Network Effects Bible"

Two-Sided Marketplace NFX

👤

More buyers → More bounties

Attracts more agents to the platform

🤖

More agents → Better matching

Faster delivery, higher quality outcomes

📈

Better outcomes → More buyers

Word of mouth, lower prices, faster delivery

Data Network Effects

📊

Every bounty = training data

What works, what fails, edge cases

🧠

Smarter matching over time

Route bounties to best-fit agents

🔒

Proprietary playbook library

Compound knowledge competitors can't replicate

Metcalfe's Law

The value of a network grows in proportion to N² (the square of its node count). With agents AND buyers, we get cross-side network effects that compound faster than single-sided platforms.

07 / 12

Trust Layer: How Agents Build Reputation

The missing infrastructure for AI agent marketplaces.

Agent Identity & Track Record

🆔

Verifiable agent identity

Who built it, what it can do, audit trail

📈

Per-function reputation

Track record based on actual outcomes, not reviews

🏆

Specialization scores

"This agent is 94% on sales meetings, 78% on video"

Trust Mechanics

🔒

Escrow with time-locks

Funds released only on verified delivery

⚖️

Dispute resolution

Human or AI arbitration for edge cases

📉

Sliding refund scale

Partial credit for partial delivery

🆕
New Agent
Low trust, small bounties
📊
Track Record
Outcomes verified
Trusted Agent
High-value bounties
🏅
Elite Status
Premium rates, priority
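The tier progression above can be sketched as a simple mapping from verified track record to trust level and maximum bounty size. All thresholds here are assumed for illustration, not platform policy:

```python
def agent_tier(verified_outcomes: int, success_rate: float) -> str:
    """Map an agent's verified track record to a trust tier (illustrative thresholds)."""
    if verified_outcomes >= 200 and success_rate >= 0.95:
        return "elite"      # premium rates, priority routing
    if verified_outcomes >= 50 and success_rate >= 0.85:
        return "trusted"    # eligible for high-value bounties
    if verified_outcomes >= 10:
        return "proven"     # verified track record
    return "new"            # low trust, small bounties only

# Hypothetical per-tier cap on bounty size (USD).
MAX_BOUNTY = {"new": 100, "proven": 1_000, "trusted": 10_000, "elite": 100_000}
```

Because tiers are computed from verified outcomes rather than reviews, an agent can't buy or astroturf its way up the ladder.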
08 / 12

Path: Managed → Open Marketplace

Like Uber: start premium, then open the platform.

🛠️
Phase 1: Now
We run the agents
🤝
Phase 2: Partners
Vetted agent builders
🌐
Phase 3: Open
Any agent can join

Phase 1: Managed (Now)

We operate all agents

Quality control, learn playbooks

$4K MRR validates demand

Customers paying for outcomes

Build trust infrastructure

Escrow, verification, reputation

Phase 2-3: Marketplace

🔜

Invite partner agents

Vetted builders, revenue share

🔜

Open to all agents

Anyone can compete for bounties

🔜

Platform take rate: 15-20%

Like Uber, Airbnb, marketplace standard

The Uber Playbook

Uber started with black cars (premium, managed) before opening to UberX (open marketplace). We start with our agents, prove economics, then open to all. Services fund the platform build.

09 / 12

Comparable Companies & Valuations

Services → Platform is a proven path to massive outcomes.

$13.8B
Scale AI
Data labeling services → platform
$1.67B
Upwork
Freelance marketplace (ripe for disruption)
$1.2B
Pilot
Bookkeeping: humans + AI
$50B+
Palantir
Services → Platform → Public

Scale AI: Our North Star

1️⃣

Started as services

Data labeling for ML companies

2️⃣

Built the platform

Tools, workflows, quality systems

3️⃣

$2B+ revenue (2025)

Services funded the infrastructure

4️⃣

$13.8B valuation

Platform economics, not services multiples

Why We're Bigger

📊

Scale AI: One vertical

Data labeling for ML

🌐

Us: All AI-deliverable work

Sales, content, research, ops...

📈

TAM: $1.5T+ services market

Every white-collar task that can be AI'd

10 / 12

Why Now: The Perfect Storm

GPT-5
Agents now capable
x402
Machine payments ready
a16z Big Ideas 2026
70%
AI SDR churn
Tools failing, outcomes wanted
$1B+
AI coding revenue (2025)
a16z: Agent apps thriving

Technology Inflection

🧠

Models capable enough

GPT-5, Claude 4 can do real work

💳

x402 machine payments

Agents can transact autonomously

🔧

Infrastructure exists

OpenClaw, MCP, agent frameworks

Market Readiness

💔

AI tools disappointing

70% churn = buyers want outcomes

💰

Budget exists

Companies spending on AI, getting nothing

🏃

First mover advantage

No AI-native outcome marketplace yet

Emerging primitives like x402 make payment settlement programmable and reactive. Smart contracts can settle a dollar payment globally in seconds. In 2026, this becomes the rails for agent commerce.

— a16z Big Ideas 2026, Part 3

11 / 12

Team & Traction

$4K
MRR
5
Customers
3-7x
Margin Multiple
0%
Churn (outcome-aligned)

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

What Traction Proves

Companies pay for outcomes. 0% churn because incentives align. This is the business model for AI work.

12 / 12

The Ask

What We Need

💰

$[X] Pre-Seed

Scale agent capacity, build marketplace infra

🎯

12-month goal: $1M ARR

Prove economics before opening marketplace

🌐

24-month: Open marketplace

Partner agents, then fully open

Why Us

🐕

Dog-fooding OpenClaw

We run agents daily, know what breaks

📊

Built the infrastructure

ClawView, guardrails, workflows

💵

Revenue already

$4K MRR proves the model

OpenHolly: The Uber for AI Work

Post an outcome. AI agents compete. Pay for results. The marketplace that makes AI actually deliver.

🔧 Infrastructure We're Building

🛡️ Guardrails📊 ClawView🏛️ AgentGov

Trust layer that makes marketplace outcomes reliable.

📚 Sources

a16z: "14 Big Ideas for 2026" (Macy Mills, Andrew Lee, Kenan Saleh) • "Big Ideas 2026 Part 1-3" • NFX: "The Network Effects Bible" (70% of tech value) • Market Data: Scale AI ($13.8B), Upwork ($1.67B), GitCoin ($60M+ distributed) • Replit: Bounties marketplace launch

1 / 12
CONTROL PLANE THESIS

The Control Plane for
AI Agents

Everyone's building autonomous agents. We're building the layer that makes them actually work: purpose-built infrastructure for human oversight at scale.

95%
AI pilots fail to deliver ROI
MIT Research, 2025
17x
Error amplification in "bag of agents"
DeepMind, Dec 2025
71%
Accuracy improvement with HITL
Microsoft Magentic-UI
$4.5K
MRR proving the thesis
OpenHolly, Feb 2026
2 / 12

The Inconvenient Truth: Autonomy Fails

The research is clear—and the industry is learning the hard way.

Multi-Agent Systems Break Down

"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."

— Berkeley/DeepMind "Why Multi-Agent LLM Systems Fail", 2025

📊

75% failure rate

ChatDev on ProgramDev benchmark

📊

~50% average task completion

Across autonomous agent frameworks

📊

17x error amplification

In uncoordinated "bag of agents"

Enterprise AI Projects Crater

"42% of companies abandoned most of their AI initiatives in 2024, up from 17% the previous year. The average organization scrapped 46% of AI proof-of-concepts."

— S&P Global Research, 2024

📊

95% of AI pilots fail

MIT Research on enterprise deployments

📊

80%+ never reach production

RAND Corporation AI project study

📊

2x failure rate vs traditional IT

AI projects vs standard software

Why This Matters

The industry is betting billions on fully autonomous agents. The research says they don't work. Someone needs to build the layer that makes them work.

3 / 12

Microsoft's Answer: Human-in-the-Loop

The largest AI research org in the world just validated our thesis.

"We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems."

— Microsoft Research, Magentic-UI (July 2025)

Magentic-UI Results

71%
Accuracy improvement with human-in-loop
30.3% → 51.9% on GAIA benchmark
📊

Only 10% of tasks needed human help

Lightweight intervention, massive improvement

📊

1.1 avg clarifications per help request

Minimal interaction overhead

Key Interaction Mechanisms

🤝

Co-planning

Human + agent collaborate on plan before execution

🔄

Co-tasking

Seamless handoff between human and agent control

🛡️

Action guards

Human approval for high-stakes actions

🧠

Memory

Learn from past interactions to improve

Microsoft's Conclusion

"Even as tomorrow's agents become more capable and reliable, we believe that human involvement will remain essential for preserving human agency, resolving unforeseen ambiguities, and guiding agents in adapting to an ever-changing world."

4 / 12

Anthropic's Findings: The Oversight Paradox

Real-world data from millions of Claude Code sessions reveals how humans actually oversee agents.

As Users Gain Experience...

📈

Auto-approve increases: 20% → 40%+

Experienced users let Claude run autonomously

📈

BUT interrupt rate ALSO increases: 5% → 9%

They intervene more often, not less

💡

The shift: Step-by-step → Exception-based

From approving everything to watching for problems

Agent-Initiated Stops Matter

🤖

Claude asks for clarification 2x more

On complex tasks vs simple ones

🤖

More often than humans interrupt

On the most difficult tasks

💡

Models know when they're uncertain

They can (and should) ask for help

"Effective oversight doesn't require approving every action but being in a position to intervene when it matters... our central conclusion is that effective oversight of agents will require new forms of post-deployment monitoring infrastructure and new human-AI interaction paradigms."

— Anthropic Research, "Measuring AI Agent Autonomy in Practice" (Feb 2026)

The Deployment Overhang

Anthropic found that "the autonomy models are capable of handling exceeds what they exercise in practice." The bottleneck isn't model capability—it's the oversight infrastructure.

5 / 12

Air Traffic Control for AI Agents

The analogy everyone is converging on—and what it means for product design.

"Think of agents within your multi-agent system as the airplanes. The agents have their own autonomy to act. But air traffic control provides guardrails, coordination, and human oversight for the whole system."

— Jason Bryant, AI in Pharma (Jan 2026)

Why Air Traffic Control Works

✈️

Planes are autonomous

Pilots make real-time decisions

🗼

Controllers handle coordination

Routing, conflicts, emergencies

👤

Humans handle edge cases

Situations that fall outside standard procedures

🔄

System improves over time

Incidents become new procedures

Why This Analogy Matters

📊

Scaling ratio: 1 controller : many planes

Not 1:1 human-to-agent

🛡️

Controllers can't replace pilots

Nor vice versa—complementary roles

⚠️

No full automation possible

Edge cases require human judgment

💰

Multi-billion dollar industry

ATC isn't going away

The Thesis

As AI agents proliferate, every company will need an "air traffic control" system for their agent fleet. That's the control plane we're building.

6 / 12

Why Current Interfaces Fail

Existing tools weren't designed for the human-agent oversight problem.

❌ Chat Interfaces

Conversational, not workflow-oriented. Can't manage 100 agents. No approval queues. No batch operations. You'd need a chat window per agent.

❌ Code/GitHub

Great for developers. Useless for ops teams. Can't approve actions in real-time. No visual understanding of agent state or intent.

❌ Slack/Email Alerts

Ad hoc approvals. No context. Alert fatigue. Doesn't learn from decisions. Can't see what agent plans to do next.

❌ Observability Dashboards

Read-only visibility. No intervention capability. See problems after they happen. Can't modify agent plans mid-execution.

"Only 14.4% of enterprises have full security approval for AI agents. 88% reported agent-related incidents. The interface problem is also a governance problem."

— Gravitee State of AI Agents Report, 2026

The Gap

There's no purpose-built interface for humans to oversee AI agents at scale. Not dashboards. Not chat. Not alerts. A new category needs to exist.

7 / 12

What a Control Plane Actually Needs

Distilled from Microsoft, Anthropic research, and our own deployments.

Pre-Execution

📋

Plan Review

See what agent intends to do before it acts. Edit plans. Add constraints.

🎯

Scope Boundaries

Define allowed domains, tools, actions. Agent can't exceed boundaries.

🔗

Workflow Templates

Start from proven patterns. Don't reinvent for every task.

During Execution

👁️

Real-Time Visibility

See agent actions as they happen. Browser view. Code execution. API calls.

⏸️

Interrupt & Resume

Pause any agent instantly. Take control. Hand back.

🛡️

Action Guards

Automatic pause for high-stakes actions. Configurable thresholds.

Approval Layer

📥

Unified Queue

All pending approvals across all agents in one view.

🎛️

Batch Operations

Approve/reject patterns across many agents at once.

🔀

Smart Routing

Route different decisions to different humans by expertise.

Learning Layer

🧠

Decision Memory

Human approvals become future patterns. Rejections become rules.

📈

Threshold Tuning

Auto-adjust when to ask humans based on outcomes.

📚

Playbook Evolution

Workflows improve with every human intervention.
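The approval and learning layers above can be sketched together: an action guard pauses high-stakes actions into a unified queue, and human decisions become remembered rules so the same question isn't asked twice. Names like `ControlPlane` and `HIGH_STAKES` are hypothetical, a sketch of the pattern rather than a product API:

```python
from collections import deque

# Assumed set of action types that trigger the guard.
HIGH_STAKES = {"send_email", "make_payment", "delete_record"}

class ControlPlane:
    def __init__(self):
        self.queue = deque()   # unified approval queue across all agents
        self.memory = {}       # decision memory: (action, target) -> verdict

    def request(self, agent_id: str, action: str, target: str) -> str:
        key = (action, target)
        if key in self.memory:            # learned rule applies: no human needed
            return self.memory[key]
        if action in HIGH_STAKES:         # action guard: pause and ask a human
            self.queue.append((agent_id, action, target))
            return "pending"
        return "approved"                 # low stakes: proceed autonomously

    def decide(self, verdict: str, remember: bool = False) -> str:
        agent_id, action, target = self.queue.popleft()
        if remember:                      # approval becomes a future pattern
            self.memory[(action, target)] = verdict
        return verdict
```

Each remembered decision shrinks the queue over time, which is the mechanism behind the step-by-step-to-exception-based shift Anthropic observed.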

8 / 12

The "Control Plane" Category

Every complex system has a control plane. AI agents need one too.

🐕
Datadog
$50B+ market cap

Control plane for infrastructure. See what's happening. Alert when things break. Intervene.

☸️
Kubernetes
Industry standard

Control plane for containers. Orchestrate workloads. Handle failures. Scale automatically.

🔐
Okta
$15B+ market cap

Control plane for identity. Who can access what. Audit trails. Compliance.

🎛️
???
AI Agent Control Plane

What agents are doing. Approvals & intervention. Learning & guardrails. This category doesn't exist yet.

"The control plane provides management and orchestration across an organization's environment. It's akin to air traffic control for applications."

— Vectra AI definition

The Opportunity

Infrastructure got Datadog. Containers got Kubernetes. Identity got Okta. AI agents need their control plane. We're building it.

9 / 12

Why Human-in-the-Loop Scales

The VC objection—and why it's wrong.

The Objection

"If humans are in the loop, doesn't that kill unit economics? Isn't the whole point to remove humans?"

The Response: Look at the Data

Scale AI
$13.8B valuation

Human labelers + AI. Humans as oversight.

Pilot
$1.2B valuation

Human bookkeepers + AI. Humans as QA.

Palantir
$50B+ market cap

Human analysts + AI. Humans as strategists.

The Key Distinction

"Humans as OVERSIGHT, not labor. AI does the work, humans QA. The ratio improves over time."

The Scaling Math

1️⃣

Year 1: 10:1 ratio

1 human oversees 10 agents. Heavy QA.

2️⃣

Year 2: 100:1 ratio

System learns. Fewer interventions needed.

3️⃣

Year 3+: 1000:1 ratio

Humans handle edge cases only. Still critical.
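Under assumed numbers (say a $100K fully loaded annual cost per human overseer), the ratio math above translates directly into per-agent oversight cost:

```python
OVERSEER_COST = 100_000  # assumed annual cost of one human overseer (USD)

for year, agents_per_human in [(1, 10), (2, 100), (3, 1_000)]:
    per_agent = OVERSEER_COST / agents_per_human
    print(f"Year {year}: 1:{agents_per_human} -> ${per_agent:,.0f} oversight per agent/yr")
# Year 1: $10,000 per agent; Year 2: $1,000; Year 3: $100
```

Oversight cost per agent falls by 100x across the three phases, which is why HITL is compatible with, rather than fatal to, unit economics.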

The Avi Medical Case Study

81% automation rate. 93% cost savings. Humans handle complex cases. HITL doesn't kill unit economics—it enables them.

10 / 12

The Contrarian Bet

Everyone's zigging toward full autonomy. We're zagging toward control.

What Everyone Else is Building

🤖

Fully autonomous agents

Demo well. Break in production.

🤖

More agent capabilities

Better models. More tools. Same failure modes.

🤖

"Just add more agents"

17x error amplification, per DeepMind.

🤖

Removing humans entirely

The dream that keeps failing.

What We're Building

🎛️

The oversight layer

Makes ANY agent more reliable.

🎛️

Human-agent collaboration

Complementary strengths. Better outcomes.

🎛️

Coordination infrastructure

Turns bag-of-agents into functional team.

🎛️

Humans in the right places

Exception handling. Strategic oversight.

"I'm especially excited about products that use AI to make previously expensive services cheaper and more accessible, sometimes using human-in-the-loop to start."

— Kenan Saleh, a16z Speedrun Partner

Our Position

We're not betting against agent capabilities improving. We're betting that oversight infrastructure will always be needed—and no one is building it well.

11 / 12

Why Us, Why Now

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

Why Now

📈

Agent adoption is exploding

OpenAI Operator, Anthropic Claude Code, 1000+ agent startups

💔

Failure rates are becoming visible

95% pilot failure is now common knowledge

📄

Research is converging

Microsoft, Anthropic, DeepMind all pointing to HITL

🏛️

Regulation is coming

EU AI Act mandates audit trails & oversight

What We've Built

$4.5K MRR

Proving the thesis with real customers

OpenClaw infrastructure

Dogfooding our own control plane daily

Guardrails, ClawView, AgentGov

Components of the full control plane

12 / 12

The Ask

The Human-Agent Control Plane

Purpose-built infrastructure for human oversight of AI agents at scale. Plan review. Action guards. Approval queues. Learning loops. The missing layer that makes agents actually work.

What We Need

💰

$[X] Pre-Seed

Build the full control plane product

🎯

12-month goal: $1M ARR

Prove control plane scales across customers

📚

Then: Category definition

Be "Datadog for AI agents"

The Opportunity

📈

New category creation

No one owns "AI agent control plane" yet

📈

Research-backed thesis

Microsoft, Anthropic, DeepMind alignment

📈

Every agent deployment needs this

Horizontal opportunity across industries

🔧 Infrastructure We're Building

🛡️ Guardrails📊 ClawView🏛️ AgentGov🤖 Employee OS

The Control Plane integrates all infrastructure layers into one human-facing interface.

🔬 Research Foundation

MIT: 95% of AI pilots fail · DeepMind: 17x error amplification in multi-agent · Microsoft Magentic-UI: 71% accuracy improvement with HITL · Anthropic: "New oversight infrastructure needed" · Berkeley: "Why Do Multi-Agent Systems Fail?" · S&P Global: 42% of AI initiatives abandoned

1 / 12
VIBE CODING OUTCOMES

Vibe Code Your Business

"Vibe coding" revolutionized app development—describe what you want, AI builds it. Now apply this to business outcomes. Describe the result, AI + humans deliver it.

Feb 2025
Karpathy coins "vibe coding"
X/Twitter
2026
"Vibe productivity" emerges
Beyond just coding
71%
Accuracy boost with HITL
Microsoft Magentic-UI
$4K
MRR proving the thesis
2 / 12

The Vibe Coding Revolution

What started as a meme became a paradigm shift. Now it's evolving beyond code.

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."

— Andrej Karpathy, Feb 2025 (coined the term)

Origins & Evolution

2023

"The hottest new programming language is English"

Karpathy's early prediction about LLM capabilities

2025

Vibe coding goes mainstream

Cursor, Replit, Claude Code—describe → build

2026

Beyond coding: "Vibe Productivity"

Research, writing, reporting, file operations, "glue work"

Where It's Going

"What changed in early 2026 is that vibe coding is no longer confined to software development; it is spreading into research, writing, reporting, spreadsheet wrangling, file operations, and 'glue work' that usually fragments attention."

— Ken Huang, "The Vibe Shift" (Jan 2026)

The Pattern

Vibe coding showed that natural language → complex software works. Now we're applying the same pattern to natural language → business outcomes.

3 / 12

From Apps to Outcomes

The next evolution: describe what you want to achieve, not what you want built.

💻
Vibe Coding
"Build me an app that..."
Vibe Outcomes
"Get me 50 sales meetings"
🎯
Result
Meetings on your calendar

❌ Current Reality: Use Tools

1

Subscribe to AI SDR tool

$5-10K/month

2

Configure the tool

Import lists, write sequences, set rules

3

Monitor the tool

Fix errors, adjust settings, babysit

4

Hope for outcomes

70% churn in 3 months when it doesn't work

✓ Vibe Outcomes: Describe Results

1

Describe what you want

"50 qualified sales meetings with Series A fintech founders"

2

AI agents execute

Research, outreach, qualification, scheduling

3

Humans QA

Review, approve, handle edge cases

4

Pay for outcomes

$X per meeting delivered

The Thesis

Vibe coding proved that intent → artifact works for software. Vibe outcomes proves it works for business results. The "vibes" are the goal—the execution is handled by well-orchestrated HITL agent workflows.

4 / 12

How It Works

Describe outcome → Agents execute → Humans QA → Outcome delivered

💬
Natural Language
"I need..."
📋
Workflow Generation
Map to playbook
🤖
Agent Execution
Multi-agent work
👤
Human QA
Review & approve
Outcome
Delivered

Example: "50 Sales Meetings"

1

Input

"Book 50 qualified meetings with Series A fintech founders in Q1"

2

Research Agent

Identifies prospects, signals, contact info

3

Outreach Agent

Drafts personalized messages

4

Human Review

Approves messaging before send

5

Scheduling Agent

Books the meeting when prospect replies

Example: "Process These Invoices"

1

Input

"Process this month's invoices and flag anomalies"

2

Extraction Agent

Pulls data from PDFs, emails, systems

3

Matching Agent

Matches to POs, identifies discrepancies

4

Human Review

Approves exceptions, flags fraud

5

Output

Processed invoices, exception report
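Both worked examples follow the same shape: agent steps run in sequence, with a human QA gate on the steps that matter. A minimal sketch of that pipeline, with illustrative step names (not a real workflow engine):

```python
def run_outcome(request: str, steps, human_review):
    """Run agent steps in order; pause at any step flagged for human QA."""
    artifact = request
    for name, step_fn, needs_review in steps:
        artifact = step_fn(artifact)
        if needs_review and not human_review(name, artifact):
            return ("halted", name, artifact)  # human rejected: stop, don't ship
    return ("delivered", None, artifact)

# Hypothetical steps for the invoice example above.
steps = [
    ("extract", lambda a: a + " -> extracted line items", False),
    ("match",   lambda a: a + " -> matched to POs",       False),
    ("flag",    lambda a: a + " -> 2 anomalies flagged",  True),  # human QA gate
]
status, _, result = run_outcome(
    "Process this month's invoices", steps,
    human_review=lambda name, artifact: True,  # stand-in for a real approval UI
)
```

Note that only the anomaly-flagging step carries the QA gate: the AI does the bulk of the work, and the human verifies the one decision with real downside, which is the "human as QA layer, not labor" point above.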

5 / 12

Why Vibe Outcomes Need Human-in-the-Loop

Pure AI can't deliver reliable business outcomes. The research is clear.

95%
AI pilots fail to deliver ROI
MIT NANDA Study
30%
Lower success when agents collaborate
CooperBench, 2026
38.9%
Cite accuracy as #1 AI challenge
Industry analysts, 2025
71%
Accuracy improvement with HITL
Microsoft Magentic-UI

Why Pure AI Fails

"Multi-agent architectures, despite their promise, can fall short on efficiency, reliability, and even accuracy... performance often degrades as coordination complexity increases."

— Berkeley/DeepMind, 2025

⚠️

Hallucinations occur even with high confidence

AI can be confidently wrong about business-critical decisions

⚠️

Edge cases are infinite

Business has nuance AI can't anticipate

⚠️

Stakes are high

Brand damage, legal liability, lost deals

Why HITL Fixes It

"Hybrid AI workflows, which combine automation with human oversight, are not a fallback; they're the modern standard for reliability, trust, and scalability in 2026."

— Parseur, Dec 2025

Human as QA layer, not labor

AI does 90% of work, humans verify critical decisions

Trust calibration over time

System learns when to ask, when to proceed

Only 10% of tasks need human help

Microsoft found lightweight intervention = massive improvement

6 / 12

The Interaction Layer

This is the UX for the AI-native agency, control plane, and marketplace pitches.

Why Current Interfaces Fail

❌ Chat Interfaces

Conversational, not outcome-oriented. Can't manage complex multi-step workflows. No approval queues.

❌ Dashboards

Read-only visibility. No intervention. See problems after they happen. Can't modify plans mid-execution.

❌ Slack/Email Alerts

Ad hoc. No context. Alert fatigue. Can't see what agent plans to do next.

The Vibe Outcomes Interface

💬

Natural language input

"I need X" → system figures out how

📋

Progress visibility

See what's happening toward your goal

🎛️

Approval queues

Review decisions that matter

⏸️

Interrupt & adjust

Course-correct mid-execution

📊

Outcome tracking

Clear metrics: delivered vs requested

🔗 This Powers Our Other Pitches

⚡ Fat Startup: Vibe outcomes is how customers interact with us
🚗 Uber for AI Work: Natural language bounty posting
🎛️ Control Plane: The human oversight layer
☁️ AWS of AI Work: Workflow templates activated by intent

7 / 12

Market Opportunity

The shift from "tools" to "outcomes" is creating massive new markets.

$52.6B
AI Agents Market by 2030
MarketsandMarkets
30%+
Enterprise SaaS with outcome-based pricing
Gartner 2025 Projection
61%
CFOs changing how they evaluate AI ROI
Industry Survey, 2025
$1.5T
Global professional services (TAM)
Work that can be "vibe coded"

Who Wants This

🏢

SMBs frustrated with AI tools

70% AI SDR churn = customers seeking alternatives

🏢

Enterprises with AI fatigue

95% pilot failure = demand for what works

🏢

Founders too busy to manage AI

Want outcomes, not another tool to learn

The Pricing Shift

"Per-seat is no longer the atomic unit of software. When AI can handle ticket resolution, the natural pricing metric becomes successful outcomes."

— a16z Enterprise Newsletter, Dec 2024

💰

Outcome-aligned pricing

$X per meeting, $Y per processed invoice, $Z per video

8 / 12

Competitive Landscape

Who else is thinking about natural language → outcomes?

Tools (Not Outcomes)

AI SDRs (11x, Artisan, AiSDR)

Sell tools. Charge per seat. You manage agents. 70% churn.

❌ Not outcome-based

Agent Platforms (LangChain, CrewAI)

Infrastructure for developers. Build your own workflows.

❌ Not outcomes, just primitives

Automation (Zapier, Make)

Workflow automation. You design the flows.

❌ Not AI-native, not outcome-based

Closest Parallels

Scale AI ($13.8B)

Services + HITL → platform. "We need labeled data" → delivered.

✓ Outcome-based, HITL model

Pilot ($1.2B)

"Do my bookkeeping" → done. Humans + AI.

✓ Outcome-based, HITL model

Intercom Fin ($0.99/resolution)

AI support priced per successful outcome.

✓ Outcome-based pricing model

Our Differentiation

Horizontal, not vertical. Scale AI = data labeling. Pilot = bookkeeping. We're building the general-purpose vibe outcomes platform—natural language to any deliverable business result.

9 / 12

Current Traction

Proving the thesis with real customers and real outcomes.

$4K
MRR
5
Customers
0%
Churn
3
Outcome Types

Outcomes We've Delivered

📅

"Get me sales meetings"

SDR/BDR for construction, startups (50% of revenue)

🎬

"Generate training videos"

ML training data pipelines (30% of revenue)

📚

"Research these topics"

University lab literature synthesis (20% of revenue)

Why Zero Churn

"When you only pay for outcomes, there's no reason to churn. We deliver meetings, they pay. We don't deliver, they don't pay. Aligned incentives = sticky customers."

vs. AI Tool Churn

AI SDRs charge $5-10K/mo whether or not they work. When they don't deliver, customers leave. Misaligned incentives = 70% churn.

10 / 12

Why Now: 2026 Is the Year

Technology, market, and cultural convergence make this the moment.

Technology Ready

🧠

Models finally capable enough

GPT-5, Claude 4 can execute real business workflows

🔧

Agent infrastructure exists

OpenClaw, MCP, tool-use protocols

💳

x402 machine payments

Agents can transact autonomously (a16z Big Ideas 2026)

📊

HITL research converging

Microsoft, Anthropic, DeepMind all pointing same direction

Market Ready

💔

AI tool fatigue

70% AI SDR churn. 95% pilot failure. Customers want what works.

💰

Budget exists

Companies spending billions on AI, getting nothing

📈

Pricing shift happening

30%+ enterprise SaaS moving to outcome-based

🎯

"Vibe coding" cultural moment

Natural language → results is now understood

"2025 was widely labeled 'the year of AI agents.' In reality, it was the year we learned what agents can and cannot do. 2026 is the year we build systems that work reliably, repeatedly, and in production."

— Human-in-the-Loop Newsletter, Dec 2025

11 / 12

Team

Keith Schacht

Co-Founder + Advisor

$140M exit (Mystery) · a16z funded · Built consumer products used by millions

[CTO]

Co-Founder & CTO

CTO of Because · $3M Seed · Deep agent infrastructure experience

Yasir

Co-Founder

yapthis.com · Agentic architecture · Shipped production agent systems

What We've Built

🐕

Dog-fooding daily

Running OpenClaw infrastructure ourselves

🛡️

Agent Seatbelt

Browser-layer guardrails

📊

ClawView

Agent observability

📚

Workflow templates

Playbooks that compound

Why Us

We've shipped outcomes

$4K MRR from real deliverables

We understand HITL

Built the infrastructure, not just the agents

We know the failure modes

Encoded in playbooks from real experience

12 / 12

The Ask

Vibe Code Your Business

Describe the outcome you want. AI agents + human QA deliver it. Pay only for results. The interaction layer for the AI economy.

What We Need

💰

$[X] Pre-Seed

Scale agent capacity + build the interface

🎯

12-month goal: $1M ARR

Prove vibe outcomes across multiple verticals

📦

Then: Self-serve platform

Anyone can describe outcomes and get them

The Opportunity

📈

New category creation

"Vibe outcomes" platform doesn't exist yet

📈

Cultural moment

Vibe coding is mainstream—extend it to business

📈

$52.6B market by 2030

AI agents + outcome-based pricing converging

📚 Research Foundation

Karpathy: Coined "vibe coding" Feb 2025 · RAND: 80% AI project failure · Microsoft Magentic-UI: 71% accuracy improvement with HITL · CooperBench: 30% lower success in multi-agent without coordination · a16z: Outcome-based pricing shift · Gartner: 30%+ enterprise SaaS with outcome pricing by 2025 · Bessemer: AI Pricing Playbook (Feb 2026)

🔗 Related Pitches

⚡ Fat Startup · 💰 Outcome-Based · 🚗 Uber for AI Work · 🎛️ Control Plane

Vibe Coding Outcomes is the UX/interaction layer that powers all of these.

Research • NYC Target Companies
01

25 NYC Startups: R&D Opportunities

Series A-B companies ($13M-$160M raised) with specific research they could implement but haven't.

25

NYC Tech Startups

$850M+

Combined Funding

75+

Research Opportunities

02

🏦 Fintech / Finance AI

Rogo — $75M (Series B)

Building "Wall Street's first AI analyst" — LLMs for financial reasoning

R&D Opportunities:

Hook: "Your financial reasoning models could be 40% more accurate on tabular data with Chain-of-Table"

Farsight — $16M (Series A)

AI for finance — valuation models, deal analysis, Excel/PPT generation

R&D Opportunities:

  • SpreadsheetLLM — Microsoft's approach to better spreadsheet understanding
  • DocPrompting — Generate accurate documents with citations
  • Table-GPT — Unified table understanding and generation

Hook: "SpreadsheetLLM could cut your Excel generation errors by 30%"

Aiera — $25M (Series B)

GenAI for financial professionals — broker research, earnings calls, filings

R&D Opportunities:

  • LongLoRA — Process 10x longer earnings calls without quality loss
  • RAG-Fusion — Multiple query generation for better retrieval
  • Time-LLM — Repurpose LLMs for time series forecasting

Hook: "LongLoRA could let you process 10x longer earnings calls without quality loss"

Carbon Arc — $56M (Series A)

Marketplace for curated AI-ready datasets (Insights Exchange)

R&D Opportunities:

Hook: "DataComp benchmarking could become your quality certification"

03

🏥 HealthTech / BioTech

Ataraxis — $20M (Series A)

AI for cancer precision medicine — analyzes data to identify optimal treatments

R&D Opportunities:

  • CancerGPT — Few-shot learning for drug pair synergy prediction
  • DrugCLIP — Contrastive learning for drug-target interaction
  • Med-PaLM 2 — Google's medical LLM achieving expert-level performance

Hook: "CancerGPT's few-shot approach could expand your drug combination predictions 5x faster"

Inspiren — $35M (Series A)

AI + IoT for senior care — AUGi device for fall detection and patient monitoring

R&D Opportunities:

Hook: "RT-DETR could cut your fall detection latency by 40% while running entirely on-device"

Slingshot AI — $40M (Series A)

AI for mental health — "Ash" chatbot simulates therapist-like conversations

R&D Opportunities:

Hook: "Constitutional AI could reduce harmful responses by 80% while maintaining therapeutic value"

Camber — $30M (Series B)

Healthcare payment automation — streamlines insurance reimbursement

R&D Opportunities:

Hook: "Medical coding LLMs could auto-fill 60% of your claims forms"

04

🛠️ Dev Tools / Infrastructure

Warp — $18M (Series A)

AI-powered payroll platform for multi-state compliance

R&D Opportunities:

NetBox Labs — $35M (Series B)

Open-source network automation platform

R&D Opportunities:

Topline Pro — $27M (Series B)

AI marketing for home service businesses

R&D Opportunities:

05

💼 Sales / Marketing AI

Clay — $40M (Series B, $1.25B valuation)

AI for sales personalization — integrates 100+ data sources

R&D Opportunities:

Hook: "Buyer intent prediction could 3x your users' reply rates"

Profound — $35M (Series B) ⭐ Existing Client

AI search optimization — helps brands appear in AI-generated responses

R&D Opportunities:

ShopMy — $77.5M (Series B)

Influencer commerce platform

R&D Opportunities:

06

⚖️ Compliance / Legal AI

Norm AI — $48M (Series B)

AI for regulatory compliance — automates review of legal documents

R&D Opportunities:

Hebbia — $130M (Series B, $700M valuation)

Document AI — searches large document sets with citations

R&D Opportunities:

Hook: "Self-RAG could improve your citation accuracy by 25%"

07

🔒 Cybersecurity / 🌱 Climate / 🛒 Consumer

Zip Security — $13.5M

SMB cybersecurity

  • LLM threat intelligence
  • Automated SOC analyst
  • LLM phishing detection (+40% accuracy)

Chestnut Carbon — $160M

Reforestation + carbon credits

  • Satellite carbon estimation
  • Biodiversity monitoring (audio/visual)
  • ML credit verification

GDI — $20M+

Silicon anodes for EV batteries

  • Battery degradation prediction
  • Materials discovery with ML
  • CV defect detection (-40% QC cost)

Novig — $18M

P2P sports betting

  • LLM odds modeling
  • Market making algorithms
  • Fraud detection

David — $75M

High-protein nutrition bars

  • AI food formulation
  • Demand forecasting
  • Consumer preference modeling

Cents — $40M

Laundry/dry-cleaning SaaS

  • Demand forecasting
  • Route optimization
  • Image garment classification
08

🎯 Best Targets by Category

🔥 Highest Urgency (AI-Native)

  • Rogo — Financial reasoning is hard, need every edge
  • Hebbia — Document AI is competitive, Self-RAG matters
  • Aiera — Long context + time series = big opportunities
  • Slingshot AI — Safety is existential for mental health AI

💰 Big Companies With Resources

  • Clay ($1.25B val) — Can afford to experiment
  • Hebbia ($700M val) — Research-forward culture
  • Chestnut Carbon ($160M) — ML for verification is huge

🎯 Underserved Markets

  • Inspiren — Elder care + CV is niche
  • Cents — Laundry tech has zero AI competition
  • Topline Pro — Home services AI is wide open

⭐ Existing Relationship

  • Profound — Already a client, easy expansion

Outreach Template

Subject: Quick R&D idea for [Company] — [specific technique]

Hi [Name],

Congrats on [recent news/funding]. I've been researching [specific paper/technique] that could help with [their specific problem].

Quick version: [1-sentence benefit with number]

I put together a 2-page brief showing how this could work for [Company]. Want me to send it over?

Research • Positioning Analysis
01

R&D ≠ The Pain Point

The real market pain is downstream from R&D — it's about shipping AI to production.

80%

AI projects fail to reach production (RAND)

95%

GenAI pilots failing (MIT/Fortune 2025)

The gap isn't finding the right model. It's shipping AI to production.

02

The Skills Gap (Reddit Gold)

From r/MLQuestions — 688 upvotes, Nov 2025

What Candidates Know

  • Transformer architectures, attention mechanisms
  • Papers they've implemented (diffusion, GANs, LLMs)
  • Kaggle competitions, theoretical deep learning

What Companies Need

  • Deploy a model behind an API that doesn't fall over
  • Write data pipelines that process reliably
  • Debug why the model is slow/expensive in production
  • Build evals to know if the model is working

"I'll interview someone who can explain LoRA fine-tuning in detail but has never deployed anything beyond a Jupyter notebook."

— Startup co-founder hiring ML engineers

03

The Observability Gap (Your Opportunity)

From Cleanlab's survey of 95 teams with AI in production

<1/3

Teams satisfied with observability

63%

Plan to improve observability next year

70%

Rebuild AI stack every 3 months

Key Insight

Even among the 5% of companies that reach production, most remain early in maturity. They can't reliably know when their agents are right, wrong, or uncertain.

04

Reframing The Pitch

❌ OLD: "AI R&D Engineer" → ✅ NEW: "Production AI Engineer"

Vibes: Research, experimentation → Deployment, reliability
Perception: Nice-to-have → Need-to-have
Target: Teams with resources → Teams with stuck projects
Job-to-be-done: "Find the best model" → "Ship to production this month"

The Positioning Gap

Aemon = the optimization engine

You = the shipping engine

05

Target Customers (Not Research Teams)

🚀 Series A-C Startups with AI Features

  • Have small ML teams, can't hire fast enough
  • ML engineers cost $200-400k and are hard to find
  • Need someone who can actually deploy, not just research

Pain: "We have 3 AI features in Jira blocked for months"

🏢 Product Companies Adding AI

  • Non-ML companies adding AI features
  • Don't have ML expertise internally

Pain: "We want AI in our product but don't know where to start"

⚙️ Enterprise AI Platform Teams

  • Drowning in stack churn (rebuilding every 3 months)
  • Coordination overhead killing velocity

Pain: "Platform team of 5 supporting 20 feature teams — we're bottlenecked"

🏛️ Regulated Industries

  • 42% plan to add oversight features (vs 16% unregulated)
  • Need governance + observability

Pain: "Can't deploy AI without compliance sign-off"

06

Better Pitch Angles

1. "Your AI Projects Are Stuck. We Ship Them."

  • Target: Companies with AI projects "in progress" for months
  • Proof: Show deployment timelines (weeks vs months)
  • Wedge: Audit → identify stuck projects → ship one fast

2. "AI Observability + Ops as a Service"

  • Target: Companies with AI in production but no visibility
  • Pain: "We don't know when our AI is wrong"
  • Proof: Catch regressions, reduce incidents

3. "The AI Platform Team You Can't Hire"

  • Target: Scaling startups without MLOps expertise
  • Pain: ML engineers cost $400k and don't want to do ops
  • Proof: Infrastructure setup in days, not months

4. "CI/CD for AI" (existing pitch)

  • Still good, but position as production not research
  • Focus on deployment gates, not model selection
  • "Every AI PR tested against your evals before merge"

Action Items

  • Rewrite pitches with "production" and "ship" language
  • Target stuck projects — companies with AI features in backlog
  • Lead with observability — 63% want better visibility
  • Offer quick wins — "Ship one AI feature in 2 weeks"
  • Avoid research teams — they don't have budget urgency
Research • AgentDocs Wedges
01

AgentDocs Wedges
& Approaches

Based on Garry Tan's YC video insight: agents pick tools based on doc quality, not actual performance. The Whisper/Groq problem.

Claude Code defaulted to Whisper V1 — a near-deprecated model — because it has better documentation than Groq, even though Groq is 200x faster and 10x cheaper.

— Garry Tan, YC Partner, Feb 2026

The Insight

Agents pick tools based on doc quality, not actual performance — and that's exactly the gap AgentDocs exploits.

02

Wedge Scoring (6 Dimensions)

Wedge | Mkt | Pain | Comp | Fit | x402 | Time | Total
🥇 LLM / Model Routing | 5 | 5 | 3 | 5 | 4 | 5 | 27/30
🥈 Video Gen | 5 | 5 | 3 | 5 | 5 | 4 | 27/30
🥉 Audio / Transcription | 3 | 5 | 2 | 4 | 4 | 5 | 23/30
Deployment / Hosting | 5 | 4 | 4 | 4 | 3 | 4 | 24/30
Agent Identity (email/phone) | 4 | 5 | 4 | 3 | 3 | 5 | 24/30
Databases | 5 | 4 | 3 | 3 | 2 | 4 | 21/30
Image Gen | 4 | 3 | 5 | 3 | 4 | 3 | 22/30
03

x402 Market Reality Check

What agents are ACTUALLY spending on today (x402scan.com, Feb 2026)

$101K
24h Volume
513
Active Merchants
692
Crypto/Onchain Servers
0
Transcription Services

What Exists (Validated)

Crypto/Onchain

692 servers — dominant vertical

AI Servers

486 servers — led by Virtuals ACP ($163K/day)

Search/Data APIs

216 servers — StableEnrich, httpay

Trading Intelligence

203 servers — alpha signals

What's Missing (Opportunity)

0

Transcription

Zero servers — Garry Tan example!

~1

Video/Image Gen

42 txns — essentially nothing

0

Deployment/Hosting

Nothing

0

Databases

Nothing

04

Re-scored: x402 Demand vs Fit

Wedge | x402 Now | Holly Fit | Verdict
Multi-API aggregation + capability layer | ✓ 3 players, no AgentDocs | ✓ Direct fit | Best immediate wedge
Agent-to-agent coordination | ✓ $163K/day (Virtuals) | ✓ Holly as orchestrator | Most validated demand
Social data for agents | ✓ StableSocial live | ✓ Fits Wurk agents | Niche but real
Transcription (Whisper/Groq) | ❌ Zero on x402 | ✓ Strong routing layer | 6–12 months early
Video gen | ❌ Near zero | ✓ Strong dogfood | 12–18 months early

Key Insight

Absence of transcription/video/deployment on x402scan is opportunity signal, not rejection. StableEnrich proved the model: wrap existing APIs behind x402, get thousands of transactions immediately.

05

Recommended Launch Order

🎙️ 1. Transcription — NOW

Zero servers on x402. Garry Tan moment 6 days ago. First-mover window open.

AgentDocs value: Verified schema {input, model: "groq|deepgram|whisper", output}

Groq at $0.02/min → charge $0.03/min

"Your agent would have chosen Whisper V1. Ours chose Groq."
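The verified-schema idea can be sketched in a few lines. This is a hypothetical shape for an AgentDocs capability contract, not a published spec; the field names and the validator are illustrative:

```python
# Hypothetical AgentDocs capability contract for a transcription service.
# Field names and pricing are illustrative, not a published spec.
TRANSCRIBE_CONTRACT = {
    "capability": "transcription",
    "input": {"audio_url": "string", "language": "string?"},  # "?" = optional
    "model": ["groq", "deepgram", "whisper"],
    "output": {"text": "string", "confidence": "float"},
    "pricing": {"unit": "minute", "usd": 0.03},
}

def validate_request(contract: dict, request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    for field, spec in contract["input"].items():
        optional = spec.endswith("?")
        if field not in request and not optional:
            problems.append(f"missing required field: {field}")
    if request.get("model") not in contract["model"]:
        problems.append(f"unknown model: {request.get('model')}")
    return problems
```

The point of the contract is that an agent can check it mechanically before spending money, instead of parsing prose documentation.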

🎬 2. Video Gen — Dogfood Now

Parameter chaos problem (Kling uses cfg_guidance, Runway uses guidance_scale). Genuinely unsolved.

AgentDocs value: Agent sends {prompt, style, duration, budget}, Holly resolves params

Already dogfood — Holly generates video

🧠 3. LLM Routing — Big Vision

Agent says {task: "transcribe", latency: "fast"} → gets best provider with pricing + ready API call.

The purest AgentDocs wedge

📧 4. Agent Identity — HOT

Garry Tan: "Has anybody built Twilio for agents yet?"

Email + phone + wallet in one API call

Jared Friedman (YC): "Even the best dev tools don't let you sign up via API. This is a big miss in the claude code age — claude can't sign up on its own."

06

The Gap: Verified Snippet Services

3
Multi-API aggregators live
StableEnrich, httpay, LowPaymentFee
4.7K
Txns for StableEnrich
0
With capability contracts
0
With AgentDocs semantics

What They Do

Aggregate APIs (Apollo + Firecrawl + Grok + Serper) behind one x402 endpoint.

"Throw money at endpoint, get data back"

What They Don't Do

Structured capability contracts

Machine-readable reasoning

Verification + benchmarks

The Unified Pitch

OpenHolly becomes the first x402-native capability registry for non-crypto agent needs — the "Stack Overflow for agents" that makes every new category agent-accessible from day one.

07

How Agents Discover Agent-First Platforms

For agents, "discovery" = machine-interpretable services, not human landing pages.

1. Protocol-level Discovery (x402 v2)

Services expose structured metadata (endpoints, pricing, chains). Facilitators crawl and index.

2. Facilitator/Registry Indexing

Layer of facilitators that index x402 services, maintain up-to-date pricing/metadata.

3. Agent-Centric Wallets

Coinbase agentic wallets pre-integrated with x402. Discovery APIs built-in.

4. Semantic Capability Registries

"Internet of Agents" research: agents announce capabilities in machine-interpretable form.

The agent doesn't "Google" a platform; it queries its facilitator ("find a market-data API with latency <100ms and price <0.5¢/request"), receives candidates with structured metadata, picks one, then talks HTTP+402 with it.

— Perplexity Research, Feb 2026

01
WORLD MODELS

Synthetic Data for
Controllable Video Models

Video models can generate stunning visuals but can't follow precise instructions. The bottleneck isn't compute or architecture — it's training data with exact state trajectories.

Data scaling plateaus at 200K-400K samples. The persistent ~15% gap between in-domain and out-of-domain performance isn't solvable with more data — it requires architectural changes AND structured training data.

— VBVR Paper (Wang et al.), Feb 2026

$5B+
Raised in world model space (2025-26)
20M hrs
NVIDIA Cosmos training data
50%
Model accuracy on physics (chance level)
97%
Human accuracy on same tasks
02

The Core Insight: Controllability Before Reasoning

Why Video Models Fail at Reasoning

1

Trained on natural video

Learned "everything moves together" — can't isolate changes

2

No state tracking

Can't represent "object A moved, B stayed" explicitly

3

Errors compound

Step 1 error → Step 2 error → reasoning chain breaks

What Data Factory Provides

Exact state trajectories

Frame-by-frame ground truth of what changed

Parameterized variations

Same action, different contexts — curriculum learning

Physics-accurate simulation

Genesis/Isaac Sim backends for real dynamics

The Robot Arm Analogy

You can't teach a robot to cook if it knocks over the salt every time it reaches for the pepper. Same with video models: if they can't execute precise state transitions, chaining multi-step reasoning becomes impossible. Controllability is the prerequisite.

03

The "Data Factory" Architecture

📋
Domain Spec
"warehouse picking"
⚙️
Scene Generator
Parameterized templates
🎮
Physics Sim
Genesis / Isaac Sim
🎬
Video + Labels
State trajectories
📦
Training Dataset
LoRA-ready format

🏭 Vertical Templates

Pre-built scene generators for:

  • Warehouse robotics
  • Surgical verification
  • Manufacturing QA
  • Autonomous driving

⚡ Scale Economics

Genesis claims:

430,000x

faster than real-time simulation

🎯 LoRA-Ready Output

Direct fine-tuning:

  • Wan2.1 / Wan2.2 compatible
  • Rank 32 = startup compute
  • ~$5K for domain model
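The pipeline above can be sketched end to end. This is a toy stand-in: a real factory would call a physics backend (Genesis or Isaac Sim) instead of fabricating the trajectory, and the template fields are illustrative:

```python
import random

# Toy "data factory" loop: a parameterized scene template expanded into
# many labeled variations with frame-by-frame ground-truth state.
def generate_sample(template: dict, rng: random.Random) -> dict:
    params = {k: rng.choice(v) for k, v in template["params"].items()}
    # Frame-by-frame ground truth: which object moved, and where it is.
    trajectory = [
        {"frame": t, "object": params["object"], "position": t * params["speed"]}
        for t in range(template["frames"])
    ]
    return {"params": params, "trajectory": trajectory}

WAREHOUSE_PICK = {
    "params": {"object": ["box", "tote", "pallet"], "speed": [1, 2, 3]},
    "frames": 4,
}

# Same action, many parameterized variations — the curriculum-learning input.
dataset = [generate_sample(WAREHOUSE_PICK, random.Random(i)) for i in range(1000)]
```

The exact state labels are the point: natural video never tells the model "object A moved, B stayed," but a generated trajectory does.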
04

Why This Wins

📊 The Moat

The "data factory" — parameterized generators + distributed workers — is the real competitive advantage. No productized version exists for vertical industries.

Network effect: Each vertical adds templates → attracts more customers → funds more verticals

💰 Business Model

  • Per-video pricing: $0.01-0.10 per synthetic clip
  • Dataset packages: $5K-50K per domain
  • Enterprise: Dedicated capacity + custom templates

Gross margins >80% (compute is cheap vs. real data collection)

Timing

Genesis open-sourced Dec 2024. NVIDIA Cosmos launched Jan 2025. π0 open-sourced Feb 2025. The infrastructure just became available — but nobody has built the vertical data factory layer yet.

05

Competitive Landscape

Company | Focus | Gap
NVIDIA Cosmos | Foundation models | Not vertical-specific data
Genesis AI | Physics engine | No data pipeline layer
Physical Intelligence | Robot foundation model | Consumes data, doesn't sell it
Scale AI | Data labeling | Labels real data, doesn't generate
Data Factory (Us) | Synthetic video data | Full vertical pipeline ✓

The dirty secret of robotics AI is that real-world data collection costs $100-1000/hour when you include robot time, human supervision, and failure recovery. Synthetic data at $0.01/clip changes the economics completely.

— Industry estimate

01
WORLD MODELS

Benchmarking Infrastructure
for World Models

VLM-as-a-judge is expensive and non-reproducible. IntPhys shows models at chance level. Everyone's flying blind on what their world models actually understand.

Most models perform at chance levels (50%), in stark contrast to human performance, which achieves near-perfect accuracy (97%+). Current video understanding benchmarks do not capture intuitive physics.

— IntPhys 2 (Meta FAIR), Jun 2025

50%
Best model accuracy (chance level)
97%
Human accuracy (same test)
47 pts
Gap to close
$5B+
Raised without rigorous eval
02

The Evaluation Crisis

Current State: Flying Blind

1

VLM-as-a-judge

Expensive ($0.10-1.00/sample), non-reproducible, biased

2

Demo-driven claims

Cherry-picked videos, no systematic testing

3

Benchmarks don't transfer

Academic benchmarks ≠ production reliability

What's Needed

Deterministic scoring

Rule-based, reproducible, cheap to run

Human-correlated metrics

VBVR-Bench achieves ρ > 0.9 with human judgment

Domain-specific suites

Robotics, driving, medical — each needs own benchmarks

The VBVR Breakthrough

VBVR-Bench demonstrates that rule-based evaluation can match human judgment (ρ > 0.9 correlation). But it's research code, not a product. Domain-specific versions don't exist.

03

Product: Eval-as-a-Service

🎬
Upload Video
Model output
🔬
Benchmark Suite
Physics / Control / Reasoning
📊
Detailed Report
Scores + failure modes
📈
Leaderboard
Public or private

🧪 Benchmark Suites

  • Physics: Object permanence, gravity, collisions
  • Control: Instruction following, state isolation
  • Reasoning: Multi-step causal chains
  • Domain: Robotics, driving, medical

💰 Pricing

  • API: $0.01/video (basic)
  • Full suite: $0.10/video
  • Enterprise: Unlimited + custom
  • Leaderboard: Free tier for visibility

🎯 Target Customers

  • Runway, World Labs, DeepMind
  • Physical Intelligence, Wayve
  • Robot startups building on π0
  • Enterprise adopting video AI
04

Why This Works

📊 Market Dynamics

$5B+ has been raised in world models with no standardized evaluation. Every company is building their own benchmarks internally. That's waste.

Comparable: ML evaluation market ~$500M (2024), growing 25%+ YoY

🔄 Network Effects

  • Leaderboard: Models compete → drives adoption
  • Benchmark contributions: Companies add domain tests
  • Data flywheel: More evals → better calibration

The gap between demo videos and production reliability is massive. Objects disappear, physics drifts, game logic is brittle over longer sessions. We need systematic evaluation, not cherry-picked demos.

— GradientFlow Analysis, 2026

05

Roadmap

Q2

Launch Core Benchmarks

IntPhys-style physics, VBVR-style controllability, basic API

Q3

Domain Expansion

Robotics suite (π0 compatible), driving suite (Wayve/Comma style)

Q4

Public Leaderboard

Like HuggingFace but for world models. Attract submissions, build community

2027

Enterprise + Certification

"World Model certified for X domain" — becomes industry standard

01
WORLD MODELS

Training Platform for
Robot Foundation Models

Training video/world models costs 10-100x more than LLMs. The infrastructure layer is missing. We build the "AWS for embodied AI."

Embodied AI training requires tight integration of simulation, rendering, and ML. Current cloud offerings are designed for LLMs. The infrastructure gap is massive.

— Industry observation

$1B
World Labs raise (Feb 2026)
$600M
Physical Intelligence raise
$315M
Runway raise (Feb 2026)
10-100x
Video vs LLM compute cost
02

The Infrastructure Gap

What LLM Infra Provides

GPU clusters

H100s, A100s, optimized networking

Training frameworks

PyTorch, JAX, distributed training

Data pipelines

Text ingestion, tokenization, streaming

What Embodied AI Needs (Missing)

Integrated simulation

Physics engine + renderer + ML in one loop

Video data pipelines

Frame extraction, state annotation, streaming

Sim-to-real transfer

Domain randomization, reality gap tools

Why This Matters Now

π0 just open-sourced. Genesis just launched. Cosmos is available. The building blocks exist but nobody has assembled them into a platform. Every robotics startup is duct-taping their own stack.

03

Platform Architecture

🎮 Simulation Layer

Managed Genesis/Isaac Sim instances

  • One-click deployment
  • Auto-scaling workers
  • Pre-built environments
  • 430,000x faster than real-time

📊 Data Layer

Video + state trajectory storage

  • Frame-level annotations
  • Streaming to training
  • Version control for datasets
  • Curriculum management

🧠 Training Layer

Optimized for video models

  • Pre-configured for π0, Cosmos
  • LoRA fine-tuning pipelines
  • Distributed video training
  • Eval integration built-in
🎯
Define Task
🎮
Simulate
📦
Generate Data
🧠
Train Model
📏
Evaluate
🤖
Deploy
04

Market Opportunity

📈 TAM Analysis

World model companies (funded): $5B+ raised
Robot startups (π0 ecosystem): 100+ companies
AV companies (simulation needs): 50+ companies
Enterprise robotics adoption: growing 30%+ YoY

Conservative estimate: $2B addressable market for embodied AI infrastructure by 2028

💰 Business Model

  • Compute: GPU-hours (sim + training)
  • Storage: Video dataset hosting
  • Platform: Monthly SaaS for orchestration
  • Enterprise: Dedicated clusters + support

Target: 40-60% gross margins (better than pure GPU cloud)

05

Competitive Position

Player Sim Data Train Eval Deploy
CoreWeave/Lambda
NVIDIA Omniverse ~
Genesis (OSS)
Weights & Biases ~ ~
Us (Full Stack)

The Integration Thesis

Embodied AI requires tight coupling between simulation, data, and training. Point solutions create friction. An integrated platform captures the full workflow — and the full margin.

01
Mission Control

Kubernetes for
AI Coding Agents

Deploy, coordinate, and govern fleets of Claude Code, Cursor, and Codex—so your team ships 10x faster with verification.

95%
of enterprise AI pilots fail to deliver ROI

We fix that.

02

The $40B Problem

Enterprise spent $30-40B on AI pilots. Most failed—not because the models are bad, but because nobody built the infrastructure to run them safely.

95%

AI Pilots Fail

MIT study: lack of context, poor verification, no adaptation. The agent isn't the problem—the harness is.

70%

Devin Task Failure

Answer.AI study: only 3/20 tasks completed. Best autonomous agent still needs orchestration.

$200-400

Real Cost of "$20" Plans

Cursor's hidden API fees surprise users. One team spent $8,000 on a "$200" plan.

0

Cross-Session Learning

No coding agent learns from failures. Same mistakes repeat every session. Zero organizational memory.

"The bottleneck is now having multiple agents at once."

— r/GithubCopilot, 2026
03

Four Pillars of Agent Fleet Management

Every failed AI pilot breaks down on one of these. Mission Control solves all four.

🎯

Task Alignment

Does the agent understand what you actually want? Intent verification, scope control, semantic diff between ask and interpretation.

Verifiability

How do you know it's right? Automated pipelines: tests, security, quality gates. Verified before it touches production.

Ability

Can the agent actually do this? Route tasks to best-fit agent. Claude Code for reasoning, Cursor for flow, Codex for CI/CD.

🧠

Adaptability

Does the system learn? Auto-generate rules from corrections. One person's fix helps the whole team.

The Insight

The agent is commodity. The harness is moat. Everyone's building better agents—nobody's building the infrastructure to run 50 of them safely on a production codebase. We are.

04

How Mission Control Works

From chaos to coordination in four steps.

1

Task Decomposition

Complex request → atomic, verifiable steps with dependency graph

2

Smart Routing

Route each subtask to best-fit agent based on capability profiling

3

Parallel Execution

Multiple agents work simultaneously, conflicts prevented automatically

4

Verify & Learn

Automated verification, corrections become team-wide rules
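Steps 1 through 3 can be sketched with a dependency graph and a capability lookup. The task names, agents, and capability tags below are illustrative, not Mission Control's actual model:

```python
from graphlib import TopologicalSorter

# Illustrative agent fleet, tagged by what each is best at.
AGENTS = {"claude-code": "reasoning", "cursor": "editing", "codex": "ci"}

# A decomposed request: each subtask lists its dependencies and the
# capability it needs. All names here are made up for the sketch.
TASKS = {
    "design-schema":   {"needs": [], "capability": "reasoning"},
    "write-migration": {"needs": ["design-schema"], "capability": "editing"},
    "update-api":      {"needs": ["design-schema"], "capability": "editing"},
    "add-ci-check":    {"needs": ["write-migration", "update-api"], "capability": "ci"},
}

def plan(tasks: dict) -> list[list[tuple[str, str]]]:
    """Return batches of (task, agent); tasks in one batch can run in parallel."""
    by_cap = {cap: agent for agent, cap in AGENTS.items()}
    ts = TopologicalSorter({name: t["needs"] for name, t in tasks.items()})
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = list(ts.get_ready())
        batches.append([(name, by_cap[tasks[name]["capability"]]) for name in ready])
        ts.done(*ready)
    return batches

batches = plan(TASKS)
# Schema design runs first; the two edits run in parallel; CI runs last.
```

Batching by topological order is what prevents the merge-conflict chaos of five agents editing the same files at once.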

Without Mission Control

  • Manual agent juggling

    5 tabs, 5 agents, merge conflicts everywhere

  • Review everything

    45+ minutes per agent PR, AI slop review fatigue

  • Same mistakes repeat

    No learning between sessions or team members

  • Surprise bills

    $20/mo → $400/mo actual spend

With Mission Control

  • Unified dashboard

    All agents in one view, automatic coordination

  • Pre-verified PRs

    12-minute reviews, confidence scores, semantic diffs

  • Continuous learning

    Auto-generate .cursorrules, team-wide improvement

  • Predictable costs

    Route to cheapest capable agent, budget alerts

05

Why Now, Why Us

Market Timing

  • 🚀
    Multi-agent is inevitable

    VS Code 1.109 enables multi-agent dev. GitHub Copilot + Claude + Codex side-by-side. No one owns orchestration.

  • 📈
    YC is 50% AI agents

    25% of YC companies have almost entirely AI-generated codebases. They need this.

  • 🏆
    Models peaked, harness didn't

    Claude Code at 72.5% SWE-bench—best in class. The gap is now coordination, not capability.

Competitive Landscape

Tool Gap
Langfuse Observes, doesn't orchestrate
LangSmith LangChain lock-in, no self-host
Cursor/Devin Single-agent, no coordination
Linear/Jira Track humans, not agents
CrewAI Framework-specific, not code-native

We're agent-agnostic, code-native, verification-first.

The Moat

Three things compound: (1) Routing models improve with every task—network effects. (2) CI/CD integration is sticky—once you're in, you stay. (3) Verification layer builds trust—that's years of R&D, not a feature.

01
RECRUITMENT

10,000 Roles Go Unfilled Every Day.
We Fill Them While You Sleep.

The AI recruiting agency that takes roles from intake to offer — no humans in the loop until the interview.

$200B+
Global Staffing TAM
80%
Applications Never Get a Response
12 days
Avg Time-to-Fill → We Do 3
150%
Annual Hourly Worker Turnover
02

The Problem

"Every time I log into LinkedIn Recruiter I feel like I'm being mugged by Microsoft with a smile. The damn thing is $10k+ a year and for what?"
— r/recruiting
💸
$50-100K/year across 3-4 tools
LinkedIn Recruiter, Greenhouse, Gem, SeekOut — and still doing manual work
🕳️
80% of candidates hit a black hole
"Greenhouse is a resume black hole I never hear back from"
📉
InMail response rates at all-time lows
"1 in 5 responds if lucky. Most devs auto-ignore us."
🔍
Search is fundamentally broken
"You want a backend engineer with Python? Here are 300 customer success managers and a dentist."
👻
44% of hires already in your ATS
Silver medalists rot while you pay to source new candidates
⏱️
70-80% of recruiter time is automatable
10-15 hours per hire spent on tasks AI can do today
03

The Opportunity

Why Now? LLMs crossed the threshold. GPT-4+ can do natural language search, personalized outreach, and conversational screening that doesn't feel robotic. Meanwhile, LinkedIn fatigue is peaking — 43% of recruiters actively seeking alternatives (up from 26% in 2024).

Market Size

📊
$200B+ TAM — Global staffing market
🏭
80% is hourly roles — Warehouse, retail, logistics
📈
100-150% annual turnover — Constant hiring demand

Competitive Landscape

🦄
Mercor ($10B) — Proved model, pivoted to AI training
🎯
Juicebox ($200M) — PLG sourcing, not full-funnel
🏢
Paradox ($1.5B) — Enterprise only ($25K+)
The Gap: No one owns full-funnel automation for the SMB/mid-market. Enterprise tools are too expensive. Point solutions don't talk to each other. The hourly hiring market is wide open.
04

The Solution

One AI agent that owns the entire candidate journey — source to offer.

1️⃣
INTAKE (5 mins)
Natural language role description → AI generates ideal candidate profile + search plan
2️⃣
SOURCE (Autonomous)
AI searches 30+ sources, scores/ranks candidates, resurfaces silver medalists from ATS
3️⃣
OUTREACH (Autonomous)
Multi-channel sequences (SMS, email, WhatsApp), handles replies and FAQs automatically
4️⃣
SCREEN (Semi-Autonomous)
AI conducts phone/video screens overnight, scores against rubric, surfaces interview-ready candidates
5️⃣
SCHEDULE (Autonomous)
Coordinates across calendars, handles reschedules and no-shows, sends prep materials
6️⃣
CLOSE (AI-Assisted)
Comp benchmarks, offer letter generation, acceptance tracking
"We screen 50 candidates overnight. Recruiters wake up to 8 interview-ready."
05

GTM & Traction

Phase 1: Vertical Wedge

🎯
Target: Regional staffing agencies (10-50 recruiters) in warehouse/logistics
💰
Budget: $25K-50K/year in tooling spend
🤝
Offer: "Fill 20% of roles in 60 days or free"

Pricing Model

📋
$199/mo base — Platform access
$500/successful hire — Aligned incentives

Target Metrics (Demo Day)

3 days
Time-to-fill (vs 12 baseline)
60%
Lower cost-per-hire
3x
Higher response rate than InMail
100 reqs
Per recruiter (vs 15)
Traction Narrative: "5 staffing agencies. 47 roles filled. Average time-to-fill: 4 days (baseline was 14). They're paying $500/hire and saving $2K vs previous process. $100K ARR run rate."
06

The Ask

$2M Seed

Expand to 50 agencies by end of year

Use of Funds

👥
Hiring: 3 engineers, 2 sales
🔧
Product: AI phone screen, multi-channel outreach
📈
GTM: ASA events, content, case studies

12-Month Milestones

🎯
50 agency customers
💵
$1M ARR
📊
10,000+ roles filled
🏆
Series A ready at $150-200M
"We're the AI agency for hourly hiring. $200B market, no full-stack autopilot exists. We built it."
01
CONSULTING

The 70% Problem

Consulting is 70% intelligence work (research, analysis, modeling) and 30% judgment work. We automate the 70%—the nights and weekends work—so consultants can focus on what actually matters.

$300B+
Global Consulting TAM
60-90%
Time Spent on Manual Research
27%
Work Immediately Automatable
$116B
AI Consulting Market by 2035
02
PROBLEM

McKinsey Analysts Shouldn't Be PowerPoint Jockeys

"At McKinsey, we spent over 90% of our time on manual work—reading reports, building Excel models, creating presentations." — Grasp Founders

10+
Tools Per Project (copy-paste hell)
50%
Slide Time Spent on Formatting
$50K+
AlphaSense Cost Per Seat

"AI generates generic fluff, not MBB-quality output. Copilot can't do action titles. No tool understands Pyramid Principle or MECE structures."

03
INSIGHT

Sell TO Consultants, Not Replace Them

Operand says "AI to Kill McKinsey." We say: give McKinsey analysts superpowers. The winning model is augmentation—Grasp has 200 customers and 3.5x ARR growth proving it.

Grasp
$9M raised, 200 customers, augmentation
Perceptis
"Superpowers, not replacement"
2-3x
More Proposals, 40-70% Win Lift

The Killer Gap: Verifiability. Consultants don't trust AI outputs. Every claim needs traceable sources, explicit assumptions, confidence intervals. No one does this yet.

04
SOLUTION

AI Analysts for Consulting Firms

Multi-agent system that turns days of research into client-ready deliverables in hours. Research → Analysis → Deliverable—one unified workflow with human-in-the-loop verification.

Research Agent
20+ sources, synthesized, every claim cited
Analysis Agent
Models, scenarios, visualizations
Deliverable Agent
MBB-quality decks, Pyramid Principle
Verification Layer
Source links, confidence scores, audit trail
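The verifiability requirement above can be made concrete with a small sketch: every claim a research agent emits carries a source and a confidence score, and the verification layer refuses to pass unsourced or low-confidence claims downstream. All names here are illustrative, not a real API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Claim:
    text: str
    source: Optional[str]   # citation; None means unsourced
    confidence: float

def verify(claims: List[Claim], min_confidence: float = 0.7) -> List[Claim]:
    """Verification layer: pass only cited, high-confidence claims downstream."""
    return [c for c in claims if c.source is not None and c.confidence >= min_confidence]

draft = [
    Claim("Market grew 12% YoY", source="industry-report-2025", confidence=0.9),
    Claim("Competitor X is struggling", source=None, confidence=0.8),      # uncited
    Claim("Churn is trending down", source="crm-export", confidence=0.4),  # low confidence
]
deliverable = verify(draft)   # only the cited, high-confidence claim survives
```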

Why Now: Multi-modal AI finally good enough. Reasoning models enable complex analysis. Consulting under cost pressure. Talent arbitrage—ex-MBB available to train AI and verify outputs.

05
GTM

Start With Boutiques, Not Big Four

10-50 person consulting shops. Ex-MBB founders who know what "good" looks like. Fast decision cycles, desperate for competitive advantage, can't build in-house AI.

10-50
Person Firms (Sweet Spot)
$2-50M
Revenue Range
$200-500
Per Seat/Month Target
SOC-2
Early (Enterprise Must-Have)

Expansion Path: Boutiques → In-house corporate strategy teams → PE portfolio companies → Big Four individual teams → Enterprise. Grasp already serves "most of the Big Four" after starting narrow.

06
ASK

Building the AI Operating System for Consulting

We're Cursor for consultants—AI that actually understands strategy, not just generates slides. Targeting $2M seed to build the verifiable, consulting-grade AI workflow that doesn't exist yet.

80%
Research Time Reduction Target
3.5x
ARR Growth (Grasp Benchmark)
10x
Cheaper Than AlphaSense
$100B+
US Gov Consulting Spend (YC RFS)

One-Liner: "We're building AI analysts for consulting firms. Our multi-agent system turns days of research into client-ready deliverables in hours—with every claim verifiable and humans always in control."

01
Vertical Deep Dive • Supply Chain & Procurement

Autonomous Procurement Agents
That Replace $180B in Human Labor

We're building AI agents that do the work of procurement BPOs at 1/20th the cost. Pay-per-outcome, not seats.

18:1 Labor vs Software Spend 90% CPOs Exploring Agents Incumbents at 1.2/5 ⭐

Enterprises spend over $180 billion annually on procurement talent, compared to roughly $10 billion on procurement software, reflecting how much work still happens manually around existing systems.

— Lio PR / Industry Analysis

02

The 18:1 Arbitrage

When companies spend 18x more on people than software, the work is clearly being done by humans. That's our opening.

$180B

Annual spend on procurement talent

Human labor doing manual work
$10B

Annual spend on procurement software

Tools that don't actually do the work

The Insight

Software hasn't automated procurement — it's just digitized paperwork. The work is still done by humans. AI agents can now do that work.

90%
CPOs considering AI agents
ProcureCon 2025
$5.8B
Procurement BPO market
Already outsourced = proven spend
2%
Spend leaks annually
McKinsey — unfulfilled obligations
58%
Struggle to find talent
CIPS 2025
03

The Incumbent Disaster

SAP Ariba is the market leader. Users hate it. This is our opening.

⭐ 1.2 / 5 on Trustpilot

SAP Ariba — 98 reviews, near-universal hate

"This software makes me wanna quit my job. This should not exist."

— SAP Ariba User, Trustpilot

"Logs me out 5720937 times a day... like software from 1980"

— SAP Ariba User, Trustpilot

"If you are a supplier, THIS WILL HURT YOUR BUSINESS."

— SAP Ariba User, Trustpilot

Why Incumbents Can't Fix This

🐌

Legacy architecture

Built pre-cloud, can't rebuild without breaking everything

💸

Misaligned incentives

Revenue from supplier fees, not buyer value

🔧

IT-heavy configuration

6-12 month implementations, constant IT involvement

🤷

Support is non-existent

"Tell me it's not their department"

What Buyers Actually Want

  • Fast implementation (days, not months)
  • Consumer-grade UX
  • Pay for outcomes, not seats
  • Measurable ROI from day one
04

GTM Playbooks That Work

The winners in this space have proven wedges and go-to-market strategies we can learn from.

🎯 Zip — The Cold DM Strategy

Before building anything, founders DMed hundreds of procurement managers on LinkedIn — not to sell, but to learn.

  • Avoided selling to anyone they knew (tested true PMF with strangers)
  • Customers "ready to buy before the demo was even complete"
  • 4 customers in 6 months of product development

Why It Worked

Built for a "boring, massive" market with "sleepy incumbents" and "low NPS"

💰 Magentic — Pay-Per-Cure

Zero upfront cost. "You never pay for seats or shelfware." Implementation in days, not months.

  • If no savings, no payment — complete risk reversal
  • Found 25% of supplier docs had errors impacting P&L
  • $10-20M P&L impact per customer

Sequoia Thesis

"In the old world, SaaS sold the promise of ROI. In the new world, AI actually delivers it."

🤖 Pactum — Tail Spend Wedge

Started with "long tail" suppliers enterprises never negotiate with — low risk, high volume.

  • Walmart pilot: 89 suppliers, 5 buyers, 3 months
  • $25K proof-of-concept pricing
  • 68% supplier acceptance, 3% avg savings
  • 75% preferred negotiating with bot over human

👥 Lio — BPO Displacement

Positioned as replacing outsourcing, not software. Competes against BPOs (20x cost of software).

  • 75% of outsourced work automated in 6 months
  • 95%+ adoption rate, 100% retention
  • 10 FTE freed per customer
  • German enterprise entry (Munich Re, Schaeffler)
05

Where We Win

The four-pillars framework shows exactly where agentic AI outperforms existing tools.

Area | Legacy Gap | Agentic Opportunity
Intake & Approval | Email chains, manual routing | Autonomous triage and routing
Negotiation | Human-only, can't scale | AI negotiation at scale (2,000 suppliers simultaneously)
Contract Compliance | Periodic audits, 2% value leakage | Continuous monitoring, proactive alerts
Supplier Management | Reactive, manual tracking | Proactive risk sensing, autonomous action
Invoice Processing | OCR + human review | End-to-end matching and payment

White Space We Target

🏭

Mid-market

Coupa too expensive, Ariba too painful

📄

Supplier compliance

Wide open — Magentic early stage

🇺🇸

US industrial

Tacto owns Germany, US gap

🏥

Vertical-specific

Healthcare, construction procurement

Our Entry Wedge

Managed Procurement Ops

Like our SDR offering, but for procurement. We operate the agents, customers get outcomes.

  • Start with supplier compliance (Magentic model)
  • Pay-per-savings pricing
  • Expand to negotiation, sourcing
06

Why Now

🤖 Capability Threshold

LLMs can now read contracts, negotiate, and execute end-to-end. Pactum proved it at Walmart scale.

🌍 Supply Chain Polycrisis

COVID, Ukraine, tariffs created urgency. "The old answers—another dashboard or SaaS tool—are spent."

👥 Talent Shortage

58% struggle to find/retain procurement talent. BPO arbitrage is ending as AI gets cheaper than offshore labor.

"We're entering a phase in the enterprise where AI moves beyond workflow co-pilots to autonomous, multi-agent execution."

— Seema Amble, a16z

Investment Thesis

$180B of human labor waiting to be automated.

Incumbents at 1.2/5 stars. 90% of CPOs exploring agents.

✅ High intelligence-to-judgment ratio  •  ✅ $5.8B BPO proves willingness to pay  •  ✅ Measurable P&L impact  •  ✅ Sequoia + a16z conviction

HIGHLY ATTRACTIVE vertical for AI autopilot investment

01
Vertical • Insurance Brokerage

Insurance Is 95% Coordination,
5% Judgment.

39,000 agencies run on email and spreadsheets. We're building the AI that replaces their back office — delivering outcomes, not tools.

$140-200B TAM YC RFS Vertical 3 Unicorns Emerging

Getting one of these businesses insured takes ~50 steps over two weeks. The broker's actual judgment matters for maybe 5% of the process. The other 95% is pure coordination.

— Panta (YC W26)

02

YC & VCs Are All-In on This Vertical

Sequoia, Emergence, Khosla, and YC are funding AI-native brokerages. YC's 2026 RFS explicitly calls out "AI-Native Agencies."

"Agencies of the future will look more like software companies, with software margins. And they'll scale far bigger than any agencies that exist in these fragmented markets today."

— YC Request for Startups, Spring 2026

Recent Funding

WithCoverage

700+ clients in 18 months • Growth startups (GoPuff, Bombas, EightSleep)

$42M Series B
Sequoia, Khosla, 8VC

Harper

5,000+ clients • Middle America SMBs (daycares, dealerships, restaurants)

$47M Series A
YC W25 • Emergence, Peak XV

Panta

Hard-to-place E&S risks • Trucking, nightclubs, construction

YC W26
In production Dec 2024

Pattern

All three prove the same thesis: AI can handle the 95% coordination layer, freeing brokers for the 5% that actually requires judgment.

03

Brokers Are Drowning in Coordination

The industry runs on copy-paste, portal juggling, and endless email chasing.

67%
Access 5+ carrier portals weekly
Glia 2024 Agent Report
76%
Can't find/access markets to quote
IVANS 2025
2+ hrs
Average underwriter response time
Glia 2024
79%
In carrier portal when needing help
Glia 2024

The 50-Step Process

📝

Intake & Forms

ACORD forms, questionnaires, documentation

📤

Submission to Carriers

Submit to 5-20 carriers per risk

🔄

Chase Underwriters

Follow-up emails, answer questions

📊

Compare & Present

Quote comparison, gap analysis

Bind & Service

COIs, endorsements, renewals

Why It's Ripe for AI

🤖

Structured & Repetitive

Same 50 steps, every single time

📋

Document-Heavy

PDFs, forms, emails — all parseable

🔗

API Access Exists

1,000+ carrier APIs via IVANS

💬

Communication-Centric

Email/chat = agent-native

Time-Sensitive

Speed = competitive advantage

04

Managed AI Ops for Insurance Brokers

We don't sell software. We deliver outcomes: placements, renewals, COIs — done.

🎯
Broker Goal
"Place 40 accounts/month"
Our Agent Fleet
Submissions, follow-up, quotes
🤖
Autopilot Execution
50 steps automated
Bound Policies
10x capacity, same staff

AI SDR Tools (Clay, Apollo, etc.)

🛠️

You configure and manage

Become pseudo-IT for AI

📉

70% churn in 3 months

Tools don't deliver outcomes

No domain expertise

Generic tools, generic results

OpenHolly Managed Ops

We run the agents

You focus on advising clients

🎯

Outcome-based pricing

Pay per placement, not per seat

🏢

Insurance-native playbooks

ACORD forms, carrier APIs, COIs

The Panta Insight

One broker: 400 clients. One AI-augmented broker: 4,000 clients.

99% placement rate vs industry ~60%. Quote turnaround: days → hours.

05

GTM: Start with Mid-Market Agencies

35,000 agencies with <$2M revenue have no AI tools. They can't afford Zywave. We're their answer.

🎯 Target Segment

Mid-market agencies ($1-10M revenue)

  • Sophisticated enough to understand value
  • Small enough to decide quickly
  • Large enough to pay real money
  • Hungry for competitive edge

🚪 Entry Points

  • Free hook: Insurance portfolio analysis
  • Partner: Aggregators (Smart Choice, Keystone)
  • Direct: State association chapters (Big "I")
  • Referral: Accountants serving SMBs

💰 Unit Economics

  • $5-50K avg commercial premium
  • 10-15% commission per policy
  • $500-5K revenue per placement
  • 70%+ gross margin (vs ~30% traditional)
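A quick back-of-envelope check of the unit economics above. The helper function is illustrative; the premium and commission figures are the deck's own stated ranges.

```python
def revenue_per_placement(premium: float, commission_rate: float) -> float:
    """Broker revenue on one bound policy: premium x commission rate."""
    return round(premium * commission_rate, 2)

# Low and high ends of the stated ranges ($5-50K premium, 10% commission)
low = revenue_per_placement(5_000, 0.10)
high = revenue_per_placement(50_000, 0.10)
# low/high span the $500-5K revenue-per-placement range above
```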

WithCoverage Playbook

"Thousands of calls, travel across dozens of states. Offer free insurance analysis showing overpayment. In 18 months, 700+ of the fastest growing companies switched to us." — Max Brenner, CEO

06

Why Now: Perfect Storm

🤖 AI Capability Inflection

  • Multimodal models read PDFs, emails, forms
  • Agent frameworks enable 50-step automation
  • 1,000+ carrier APIs finally available (IVANS)

📉 Fragmented Market Ready

  • 39K agencies → 38K (consolidating)
  • 1/3 expect ownership changes in 5 years
  • Most run on "email and spreadsheets"

🐢 Incumbents Are Slow

  • Zywave just launched AI (Dec 2025)
  • Applied Systems: "augmentation not replacement"
  • PE-owned = margin focus, not innovation

✅ Proven Demand

  • WithCoverage: $42M, 700+ clients
  • Harper: $47M, 5,000+ clients
  • Panta: YC W26, in production

The Opportunity

$140-200B
Broker TAM (Sequoia)
39K
Agencies Underserved
95%
Work That's Automatable

The playbook is proven. The market is massive. The incumbents are asleep.

01
Vertical: IT Managed Services (MSP)

AI IT Support That Resolves Tickets,
Not Just Routes Them

We automate 50% of IT helpdesk tickets in week one. Incumbents are hated. 67% of orgs can't hire enough techs. The market is ready.

$100B+ TAM 90% L1 Automatable ConnectWise/Kaseya Hated

"Faster to automate a task forever than to do it manually once."

— Serval's Core Pitch ($127M raised, $1B valuation)

02

The Problem: Chronic Technician Shortage

MSPs can't hire enough techs. Their tools are outdated. AI can finally solve L1.

67%
Organizations understaffed on IT/security
52%
MSPs say hiring is primary struggle
3.5M
Global cybersecurity workforce gap
90%
L1 tickets ServiceNow resolves autonomously (on its own internal helpdesk)

The Insight

Can't hire fast enough → must automate. L1 is now automatable. The gap is who does it for SMBs.

03

Incumbents Are Hated

ConnectWise and Kaseya have created massive dissatisfaction. MSPs are actively seeking alternatives.

🔴 ConnectWise

  • "Archaic UX" — looks like 2010 software
  • "Little to zero updates for 10+ years"
  • "Reporting is essentially garbage"
  • SSO security debacle damaged trust
  • Market share: 26.8% → 24.3% (losing)

🔴 Kaseya / Datto

  • "Chronically overcharge clients for thousands of dollars per year"
  • Billing errors are systemic, not one-off
  • CEO minimizes problems, lies about fixes
  • Acquired good companies → quality dropped
  • "Dollar store ConnectWise"

"ConnectWise is less bad overall"

— r/msp (damning with faint praise)

Structural issues: PE ownership → cost-cutting, poor support, billing games, technical debt. They can't rebuild AI-native. We can.

04

$100B+ Market, Nobody Owns It

$100B+

Global MSP TAM

50%+

Tickets automatable (proven by Serval)

$1B

Serval valuation (enterprise only)

Where We Play

Segment | Status
Enterprise ($100K+ ACV) | Serval owns this
SMB Direct (50-500 employees) | Blue ocean
Small MSPs (5-20 techs) | Good entry wedge

Why Now

🤖

L1 is now automatable

GPT-4+ enables reliable action execution

📉

Incumbent lock-in weakening

ConnectWise/Kaseya losing share, MSPs looking

🏗️

AI-native architectures possible

Code-gen enables custom workflows

05

The Playbook: Start AI-Native, Own the SMB

Serval proved it with enterprise. We do it for the rest of the market.

🚀
Phase 1
Small MSPs (5-20 techs)
💰
Phase 2
SMBs direct ($30-80/employee)
🎯
Endgame
IT-as-a-service for SMBs

What We Automate (Week 1)

🔑

Password resets (15-25% of tickets)

90%+ automatable

🔐

Access provisioning (15-20%)

Okta, Google Groups, SCIM

📦

Software install/config (10-15%)

Standard apps, self-service

👋

Onboarding/offboarding (10-15%)

Day 1 automation

Competitive Advantage

Us | Them
AI-native architecture | Bolt-on AI to legacy
Code-based workflows (auditable) | Black-box AI
Outcome pricing (per ticket) | Seat-based (pay even if unused)
Deploy in days | 6-month implementations
06

The Ask

🎯 Target Design Partners

  • Small MSPs (5-20 technicians)
  • Pain with ConnectWise/Kaseya billing
  • Drowning in L1 tickets
  • Open to AI (not defensive)

Distribution: r/msp, IT Nation, MSP peer groups

💰 Pricing Model

  • Entry: Per ticket resolved — prove ROI fast
  • Scale: Per technician — $149-209/tech/mo
  • SMB Direct: Per employee — $30-80/mo

Start free, pay when we hit 30% automation.

The Thesis

Every SMB outsources IT to MSPs. MSPs are dissatisfied with ConnectWise/Kaseya. 50%+ of tickets are automatable.
Nobody is selling "your IT just runs" directly to SMBs as an outcome.

Serval raised $127M doing this for enterprise. The SMB market is unowned.

"The IT team that scales with you, without headcount."

01
Vertical • Healthcare RCM

Autopilot for Medical Billing

We recover millions in denied claims. AI agents that fight back against payer denials — so hospitals can focus on patients, not paperwork.

Healthcare RCM $50-80B TAM 11.8% Denial Rate Payers Using AI to Deny

One insurer allegedly denied 300,000 claims in under two months using AI. Providers need their own AI to fight back.

— Healthcare Industry Report, 2024

02

The Problem: Payers Are Winning

Hospitals are drowning in denials. Payers deploy AI to reject claims faster. Providers still fight with spreadsheets.

11.8%
Initial claim denial rate (up from 10.2%)
2024 Industry Data
$25-181
Cost to rework each denied claim
HFMA
60%
Providers can't hire RCM staff
Becker's Healthcare
300K
Claims denied by one payer AI in 2 months
Industry Report

⚠️ The Arms Race

Payers are deploying AI to deny claims faster. Medicare Advantage denials up 4.8% YoY. Providers need AI to fight back — or they lose.

03

Why Now: $50-80B Opportunity

Market Size

💰

$50-80B outsourced RCM

Massive existing spend, ripe for automation

📈

Growing denial complexity

89% saw PA requirements increase in 2024

👴

40%+ coders retiring

Massive labor shortage, no backfill coming

Why AI Wins Now

🧠

LLMs can read 50K-word records

Finally capable of clinical document understanding

📋

150K+ ICD-10 codes now tractable

AI accuracy approaching human coders

⚖️

CMS mandating electronic PA

Regulatory tailwind forcing digitization

The Insight

Outcome-based pricing is native to healthcare — providers already pay % of collections. We align incentives: we only win when they recover money.

04

Competitive Landscape

Legacy vendors are slow. AI-native players are enterprise-only. The mid-market is wide open.

Company | Focus | Strengths | Gap
Anterior ($64M+) | Prior auth (payer-side) | 99.24% accuracy, KLAS validated | Payer-only, not provider-side
AKASA | Provider RCM (enterprise) | Cleveland Clinic, Stanford | 12+ month sales cycles, expensive
Fathom | Medical coding only | 95.5/100 KLAS score | Narrow focus, no denial mgmt
Waystar / R1 | Legacy platforms | Scale, integrations | "AI" is mostly marketing, slow
OpenHolly | Denial recovery | Outcome-based, fast deploy |

Our Wedge: Denial Management

Start with the most measurable outcome: dollars recovered from denials.

15-20% of recovered revenue. We only win when you recover money.

05

How It Works: Agentic Denial Recovery

AI agents that read charts, identify denial root causes, generate appeals, and submit — automatically.

📥
Denial Arrives
Payer rejects claim
🤖
AI Reads Chart
50K words in 8 seconds
📝
Appeal Generated
Clinical evidence cited
$ Recovered
You keep 80-85%

🔍 Root Cause Analysis

AI identifies why denial happened — missing documentation, coding error, or arbitrary payer rule.

📋 Evidence Extraction

Highlights relevant clinical passages from 50K-word records in seconds.

⚡ Auto-Appeal

Generates payer-specific appeal letters with guideline citations. Human reviews in 5 minutes.
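The denial-recovery flow above can be sketched as a two-step pipeline. The step functions and the denial record are hypothetical stand-ins for the agents described, not a real system.

```python
def root_cause(denial: dict) -> str:
    """Classify why the payer denied the claim (illustrative rules)."""
    if not denial["documentation_complete"]:
        return "missing_documentation"
    if denial["code_mismatch"]:
        return "coding_error"
    return "payer_rule"

def draft_appeal(denial: dict, cause: str) -> dict:
    """Assemble a payer-specific appeal with cited clinical evidence."""
    return {
        "claim_id": denial["claim_id"],
        "cause": cause,
        "evidence": denial["chart_excerpts"][:3],  # top cited passages
        "status": "ready_for_human_review",        # human signs off in minutes
    }

denial = {
    "claim_id": "CLM-1042",
    "documentation_complete": True,
    "code_mismatch": True,
    "chart_excerpts": ["op note p.2", "imaging report", "discharge summary"],
}
appeal = draft_appeal(denial, root_cause(denial))
```

The key design point is the last field: the agent drafts, but nothing is submitted until a human reviews.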

06

Go-to-Market & The Ask

Entry Strategy

🎯

Start: Mid-size specialty practices

Orthopedics, cardiology, oncology (high-value procedures)

🔑

Wedge: Denial recovery audit

Low risk entry — AI is checking, not deciding

📈

Expand: Full denial management

Then coding QA, prior auth automation

Pricing Model

💰 Outcome-Based

15-20%

of recovered revenue from denials

You only pay when we recover money. Zero risk.

Looking For: Design Partners

3-5 specialty practices with 500+ denials/month

We'll recover $100K+ in year one — or you pay nothing.

Outcome-Based Zero Risk Trial HIPAA Compliant
01
Vertical • Accounting & Audit

AI That Closes the Books
While Accountants Sleep

340K accountants have left the profession. 75% of CPAs are nearing retirement. The close still takes 6+ days. We're building the autopilot.

$50-80B TAM 94% Still Use Excel Basis: $1.15B Validation

"Accounting is structured, high-stakes, and essential to every business on earth. It's also one of the most underbuilt areas in technology."

— Basis founders (valued at $1.15B)

02

The Perfect Storm

A profession in crisis meets primitive tooling. Something has to give.

340K
Accountants left since 2020
Bureau of Labor Statistics
75%
CPAs nearing retirement
AICPA/NASBA 2025
-30%
CPA exam candidates since 2016
Industry data
120K+
Open jobs per year
Ramp 2026

The Tooling Problem

📊

94% still use Excel for close

50% cite it as key reason close is slow

⏱️

50% take 6+ business days to close

Only 18% close in 3 days or less

💸

20-50 hrs/month on cash reconciliation

#1 bottleneck — 3-5 systems just to match

Bench Collapse Proves the Thesis

"Bench raised ~$160M. Shut down Dec 2024. Human-heavy bookkeeping models can't scale profitably."

— Industry lesson

🔥 "Bench Refugees" = Urgent Demand

Thousands of abandoned customers actively looking for alternatives. Distrust human-heavy models. Ready for AI-first.

03

Market Validation: $1.15B Proves the Opportunity

Basis raised $100M Series B at $1.15B valuation in Feb 2026. The market is real.

🦄 Basis ($1.15B)

30% of Top 25 US firms already using

20% of Top 150 firms

First AI agent to complete end-to-end 1065 tax return autonomously

$100M Series B, Feb 2026

📈 Rillet (a16z + ICONIQ)

200+ customers, Series B

AI-native ERP — not AI bolted onto legacy

"Go live in weeks" vs months

Doubled ARR quarter-over-quarter

🔍 Truewind ($17M)

Top 50 accounting firm partners

100+ customers

"Absorbs 47% of month-end close tasks"

Series A, Dec 2024

The TAM

$50-80B market for accounting automation

Every business needs accounting. It's recurring. It's essential. And it's still done by hand.

04

The Autonomous Close Agent

AI that runs overnight, completes 90% of month-end tasks, and generates an exception report for morning review.

🌙
6 PM
Run Autonomous Close
🤖
Agent Fleet
Works through 30+ tasks
☀️
8 AM
Exception report ready
Review & Sign
5-10 items vs 100+
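The overnight run above can be sketched as a loop that attempts each close task and surfaces only the failures in the morning exception report. Task names, amounts, and the exact-match rule are illustrative assumptions.

```python
def run_close(tasks: list) -> dict:
    """Attempt each close task; return what completed and what needs review."""
    completed, exceptions = [], []
    for task in tasks:
        # A task auto-completes when ledger and bank amounts reconcile exactly;
        # anything else becomes an exception for the accountant to review.
        if task["ledger"] == task["bank"]:
            completed.append(task["name"])
        else:
            exceptions.append(task["name"])
    return {"completed": completed, "exceptions": exceptions}

report = run_close([
    {"name": "rec_operating_account", "ledger": 120_400, "bank": 120_400},
    {"name": "rec_payroll_account",   "ledger": 88_210,  "bank": 88_150},
    {"name": "rec_stripe_payouts",    "ledger": 45_000,  "bank": 45_000},
])
# report["exceptions"] is the short morning list; everything else closed itself
```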

What the Agent Does

🏦

Bank Reconciliation

Auto-matching across all sources (95% accuracy)

📑

Transaction Classification

90%+ accuracy with LLMs, learns from corrections

📊

Accrual Workpapers

Auto-generated with supporting documentation

📈

Variance Analysis

AI explains anomalies, humans review

The Outcome

Metric | Before | After
Close Time | 6+ days | <3 days
Cash Rec Hours | 20-50 hrs | <5 hrs
Manual Tasks | 80% | 10%
Error Rate | 1-5% | <0.5%

Your Accountant Reviews the Work Instead of Doing the Work

Everything is auditable. Everything is documented. AI generates, humans verify.

05

Why Now

🧠 AI Capability Inflection

  • LLMs now achieve >90% accuracy on transaction classification
  • Basis demonstrated first autonomous 1065 tax return (Feb 2026)
  • Multi-hour autonomous agent workflows proven possible

📉 Structural Talent Crisis

  • 340K+ accountants left since 2020
  • 75% of CPAs nearing retirement
  • Projected enrollment decline of 15% (2025-2029)

Why Incumbents Can't Catch Up

🏚️

NetSuite is 25 years old

QuickBooks architecture is rigid

🤖

Adding chatbots, not agents

Rillet raised $50M+ because "rebuilding from scratch" is the only way

💼

Big Four won't build this

Services business, not software. Internal innovation killed by billable hour model.

Our Entry Point

🔥

Bench Refugees

Urgent need, distrust of human-heavy, ready for AI-first

📈

Mid-Market Outgrowers

$10-100M revenue, 2-5 person finance teams drowning in close

🏢

Small CPA Firms (100-500 clients)

Fast adoption, price sensitive, willing to try

06

The Opportunity

$50-80B
Total Addressable Market
$1.15B
Basis valuation (market validation)
340K
Accountants gone = demand for AI

Pricing Model

Per-Entity: Most common for accounting software

Volume Tiers: Common for AP/AR automation

Outcome-Based: Emerging (% of cost savings)

$2K – $15K/mo

Based on entity count + transaction volume

ICP: Design Partners

🏢

$10-100M revenue

Outgrowing QuickBooks, avoiding NetSuite

👥

2-5 person finance team

Drowning in close process

🏗️

Multi-entity structures

SaaS, e-commerce, hospitality

AI That Closes the Books While Accountants Sleep

The talent crisis is permanent. The tooling is primitive. The AI is ready.

We're building the autopilot for accounting.

01
Vertical Pitch • Claims Adjusting

Autopilot for Insurance Claims

We sell the adjustment, not the software. 400K workers retiring. $50-80B market. AI-native TPAs are the future.

$50-80B TAM $730M Exit 400K Retiring Sequoia Top Pick

Services: The New Software. The biggest opportunity isn't selling tools to adjusters — it's replacing what adjusters do.

— Sequoia Capital, March 2026

02

The Perfect Storm

Structural workforce crisis meets broken incumbent tech.

400K
Insurance workers retiring by 2026
25% of workforce is 55+
70%
AI SDR users churn in 3 months
Industry data
18-24
Months to implement Guidewire
Industry standard
$730M
EvolutionIQ acquired (Dec 2024)
CCC acquisition

"Half a billion between software, personnel, and opportunity cost" for Guidewire implementations that still fail.

— Industry Analysis

The Reality

Within 15 years, a large portion of today's adjusters will have retired — and there won't be enough people to replace them.

03

Why Now: Market Validation

Smart money is flooding in. Exits are happening. The window is open.

💰

EvolutionIQ → CCC: $730M

December 2024. AI claims automation is now a proven exit category.

Company | Model | Funding | Traction
Strala | AI-native TPA | Founders Fund (13x oversubscribed) | 26 US clients, UK expansion
Pace | Operations automation | Sequoia $10M Series A | Prudential multi-year deal
Elysian | Complex commercial claims | AmFam Ventures $6M | State Farm pitch winner, Lloyd's Lab
Tractable | Photo AI (point solution) | $1B+ unicorn | Auto insurers, property

The Gap

Tractable sells photo AI. Shift sells fraud detection. Nobody sells the full claim outcome — FNOL to settlement, end-to-end.

04

Entry Strategy: TPAs & MGAs

Start where decisions are fast and budgets already exist.

Why TPAs/MGAs First

Already outsourced

Budget line exists. Vendor swap, not new category.

🏃

Faster decisions

Not 12-24 month enterprise sales cycles

💰

Undercut by 30-50%

Per-claim pricing below legacy TPA rates

📈

Expand upward

Use TPA wins to land large carriers

What We Deliver

📞 FNOL Intake Agent
📋 Coverage Verification
🔍 Investigation + Fraud
💵 Estimate + Settlement
Claim Closed

The Strala Playbook

Start with FNOL/triage → Hybrid deployments → Full TPA as trust builds

"The answer can't always be more people." — Strala

05

The Numbers That Matter

Outcome-based pricing. Clear ROI story.

⏱️ Cycle Time

Industry avg 30+ days
Top performers 10 days
AI Target 3-5 days

💰 Cost per Claim

Legacy TPA $250-1,500
Our pricing 30-50% less
+ Outcome bonus Per day saved

📊 Claims Volume

Human adjuster 50-100/month
AI agent fleet 500+/month
Loss ratio impact 1-2 pt improvement

Pricing Model

  • Per Claim: 30-50% below legacy TPA rates
  • Outcome Bonus: +$X per day under cycle target
  • Pilot: Free/discounted on subset → expand

Loss Ratio Impact

Strala claims 1 point loss ratio improvement. That's the number carriers care about most.

— Industry benchmark

06

The Opportunity

🎯 Why This, Why Now

  • $50-80B TAM in claims adjusting spend
  • 400K workers retiring, pipeline empty
  • $730M exit validates category
  • Sequoia + Founders Fund backing competitors
  • Guidewire "half billion" disasters create opening

🚀 Our Edge

  • Fat startup model — outcomes, not tools
  • Playbooks compound — each customer teaches us
  • Per-claim pricing — undercut legacy by 30-50%
  • TPA wedge — fast sales, expand upward
"Autopilot for insurance claims — we sell the adjustment, not the software."

The workforce is retiring. We're what comes next.

$50-80B Market Validated by $730M Exit Perfect Timing
01
TAX ADVISORY

The CPA Extinction Event

75% of CPAs are reaching retirement. 340,000 accountants left the profession since 2020. But tax work is 80-90% pure intelligence work—the exact work AI agents do best. We're building autopilot for tax preparation.

$30-35B
US Tax Preparation TAM
80-90%
Intelligence Work (Automatable)
340K
CPAs Left Profession (2020-22)
75%
CPAs Nearing Retirement
02
PROBLEM

Tax Season Is Breaking People

"Endless hours, stressed teams, client overload, constant risk of missing deadlines." 42% of firms report retention issues from burnout. The people who do stay work 60-80 hour weeks for months.

42%
Firms Report Retention Issues
60-80
Hours/Week During Tax Season
61.5%
Say Price Is #1 Complaint

"Difficulties with state returns came up repeatedly in 'dislike' responses. Multi-state complexity multiplies fast, and manual tracking of different state rules becomes impossible at scale."

03
INSIGHT

Multi-State Is the Unsolved Problem

Incumbents charge $60K+/year for seat licenses. Blue J achieved 12x revenue growth via CPA.com distribution. But nobody has cracked multi-state complexity—nexus determination, varying apportionment rules, threshold tracking.

Blue J
12x Revenue via CPA.com Partnership
$60K+
UltraTax Annual Cost
50
States, Each With Different Rules

The Killer Gap: Research tools sell per-seat. Preparation is still manual. Nobody sells completed returns. The outcome-based pricing model is wide open.

04
SOLUTION

AI Agents That Prepare Returns

Multi-agent system: reads documents, applies firm's tax strategy, enters data into systems. What takes 4 hours becomes 15 minutes of review. Every citation verifiable. Human signs, AI does the work.

Document Agent
Extracts from K-1s, statements, invoices
Research Agent
IRC citations, state rules, confidence scores
Prep Agent
Drafts returns, flags items for review
SALT Agent
Multi-state nexus, apportionment, thresholds
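The four-agent split above can be sketched as a routing table that assigns each work item on a return to its responsible agent. The item types and routing keys are illustrative assumptions, not the real system.

```python
# Hypothetical routing table: work-item type -> responsible agent
AGENTS = {
    "k1": "document_agent",
    "statement": "document_agent",
    "irc_question": "research_agent",
    "draft_return": "prep_agent",
    "nexus_check": "salt_agent",
    "apportionment": "salt_agent",
}

def route(work_items: list) -> dict:
    """Group a return's work items by the agent that handles them."""
    plan = {}
    for item in work_items:
        plan.setdefault(AGENTS[item], []).append(item)
    return plan

# One multi-state return's work items, routed across the agent fleet
plan = route(["k1", "nexus_check", "draft_return", "apportionment"])
```

Note that both multi-state items land on the SALT agent, which matches the deck's claim that state complexity deserves a dedicated agent.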

Why Now: GPT-4 enabled Blue J's 12x growth. Filed claims 30-50% review cycle reduction. Avalara building "agentic tax" for transaction compliance. The capability inflection is here.

05
GTM

Start Mid-Market, Pay Per Return

6-50 preparer firms: fast decisions, acute talent pain, can't build in-house. Outcome-based pricing—firms pay for completed returns, not software seats. CPA society partnerships for distribution.

6-50
Preparer Sweet Spot
Per-Return
Pricing (vs $1,500/Seat)
400K+
AICPA Members (Distribution)
Weeks
Not Months to Close

Expansion Path: Mid-market firms → State CPA society endorsements → Enterprise (Top 100) → Big Four white-label. Basis already has 30% of Top 25 with enterprise-first approach.

06
ASK

Autopilot for Tax Preparation

Blue J sells research tools to accountants. We sell completed tax returns to firms. Outcome-based pricing aligned with Sequoia's "sell the work" thesis. The demographic crisis is now—we're the solution.

70%
Prep Time Reduction Target
7d → 1d
Turnaround (Filed Benchmark)
30-50%
Faster Review Cycles
12x
Blue J Revenue Growth (Comp)

One-Liner: "AI agents that prepare tax returns from scratch—firms pay per return, not per seat. We automate the 80% of tax work that's pure intelligence, so the retiring 75% of CPAs don't take the industry with them."