AI Inference Costs Are Crushing SaaS Gross Margins — Here's What to Do About It
21 April 2026


SaaS Metrics School


Is your AI SaaS company skating on thin ice because of exploding compute costs you're not tracking?


In episode #365, Ben Murray tackles one of the most pressing financial challenges facing AI-first SaaS companies: the structural margin compression caused by LLM inference costs. Traditional SaaS was built on near-zero marginal cost per customer — that era is over. If you're building on top of AI, every prompt, query, and agentic workflow is a hard COGS line that scales with revenue, and if you're not managing it, it will quietly destroy your unit economics.

In this episode, you'll learn:

    Why AI-first SaaS companies are running 50–60% gross margins (vs. 70–80% for legacy SaaS) — and what Bessemer data shows about AI supernovas with margins as low as 25%
    How inference and compute costs differ fundamentally from traditional SaaS COGS — and why they won't scale down the way hosting costs did
    Why token costs vary wildly (from $1–2 per million tokens to $30–180+ per million for frontier models) and how that variability makes feature-level economics a CFO priority
    5 tactical ways to reduce LLM spend: model routing, prompt caching, context compaction, semantic caching, and batch processing
    How to set up your GL accounts and COGS tracking to allocate inference costs by feature — so you actually understand the economics of what you've built
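To make the feature-level economics concrete, here is a minimal sketch of how per-call token usage can be rolled up into a feature-level COGS view. The model names and per-million-token prices are illustrative assumptions, not figures from the episode — substitute your provider's actual rate card.

```python
# Hypothetical sketch: allocating LLM inference spend to product features.
from collections import defaultdict

# Assumed (input, output) price per 1M tokens in USD — illustrative only;
# frontier-model pricing changes frequently, so pull from your rate card.
PRICE_PER_M = {
    "small-model": (0.50, 1.50),
    "frontier-model": (15.00, 60.00),
}

def call_cost(model, input_tokens, output_tokens):
    """Cost in USD of a single LLM call at the assumed rates."""
    in_rate, out_rate = PRICE_PER_M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

def allocate_by_feature(usage_log):
    """Roll per-call costs up into a per-feature COGS total."""
    totals = defaultdict(float)
    for rec in usage_log:
        totals[rec["feature"]] += call_cost(
            rec["model"], rec["input_tokens"], rec["output_tokens"]
        )
    return dict(totals)

# Example usage log — two features hitting different models.
usage = [
    {"feature": "summarize", "model": "small-model",
     "input_tokens": 2_000, "output_tokens": 500},
    {"feature": "agent_workflow", "model": "frontier-model",
     "input_tokens": 50_000, "output_tokens": 10_000},
]
print(allocate_by_feature(usage))
```

The point of the exercise: a single frontier-model agentic workflow here costs roughly 770x the small-model summarization call, which is exactly why cost has to be tracked per feature rather than as one blended "AI spend" line.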

Tune in before your next board meeting — because if you're not tracking AI inference costs at the feature level, you're flying blind on your most important unit economics.
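Of the five cost-reduction tactics listed above, model routing is the most direct lever. A minimal sketch, assuming a simple length/complexity heuristic and hypothetical model names — production routers typically use a trained classifier or per-feature policy instead:

```python
# Hypothetical model-routing sketch: send cheap, simple requests to a small
# model and reserve the expensive frontier model for hard ones.
CHEAP_MODEL = "small-model"
EXPENSIVE_MODEL = "frontier-model"

def route(prompt, requires_reasoning=False, max_cheap_len=500):
    # Naive heuristic: long prompts or flagged reasoning tasks go to the
    # frontier model; everything else stays on the cheap tier.
    if requires_reasoning or len(prompt) > max_cheap_len:
        return EXPENSIVE_MODEL
    return CHEAP_MODEL

print(route("Summarize this ticket in one line."))  # routes to the cheap model
print(route("Plan a multi-step refund workflow", requires_reasoning=True))
```

Even a crude router like this can shift the bulk of request volume onto models that cost an order of magnitude less per token, which is why it pairs naturally with the feature-level cost tracking the episode recommends.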


Resources Mentioned



    The SaaS CFO: https://www.thesaascfo.com/
    Ray Rike — AI to ROI Newsletter: https://ai2roi.substack.com/
    Tomas Tunguz: https://tomtunguz.com/
    Fungies.io — 5 Ways to Save on LLM Costs: https://fungies.io