
AI Inference Costs Are Crushing SaaS Gross Margins — Here's What to Do About It
SaaS Metrics School
Is your AI SaaS company skating on thin ice because of exploding compute costs you're not tracking?
In episode #365, Ben Murray tackles one of the most pressing financial challenges facing AI-first SaaS companies: the structural margin compression caused by LLM inference costs. Traditional SaaS was built on near-zero marginal cost per customer — that era is over. If you're building on top of AI, every prompt, query, and agentic workflow is a hard COGS line that scales with revenue, and if you're not managing it, it will quietly destroy your unit economics.
- Why AI-first SaaS companies are running 50–60% gross margins (vs. 70–80% for legacy SaaS), and what Bessemer data shows about AI supernovas with margins as low as 25%
- How inference and compute costs differ fundamentally from traditional SaaS COGS, and why they won't scale down the way hosting costs did
- Why token costs vary wildly (from $1–2 per million tokens to $30–180+ per million for frontier models) and how that variability makes feature-level economics a CFO priority
- 5 tactical ways to reduce LLM spend: model routing, prompt caching, context compaction, semantic caching, and batch processing
- How to set up your GL accounts and COGS tracking to allocate inference costs by feature, so you actually understand the economics of what you've built
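To make the feature-level economics concrete, here is a minimal sketch of how inference spend rolls up into a per-feature COGS view. The model names, prices, and call-log fields are illustrative assumptions, not real provider pricing.

```python
# Hypothetical per-million-token prices (input vs. output tokens are usually
# priced differently). Real prices vary widely by provider and model tier.
PRICE_PER_M_TOKENS = {
    "small-model": {"input": 0.50, "output": 1.50},
    "frontier-model": {"input": 30.00, "output": 60.00},
}

def inference_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one LLM call, given per-million-token prices."""
    p = PRICE_PER_M_TOKENS[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def cost_by_feature(call_log: list[dict]) -> dict[str, float]:
    """Aggregate call-level costs into a per-feature COGS line."""
    totals: dict[str, float] = {}
    for call in call_log:
        c = inference_cost(call["model"], call["input_tokens"], call["output_tokens"])
        totals[call["feature"]] = totals.get(call["feature"], 0.0) + c
    return totals

# Illustrative call log: a cheap search feature vs. a frontier-model agent.
calls = [
    {"feature": "search", "model": "small-model", "input_tokens": 2_000, "output_tokens": 500},
    {"feature": "agent", "model": "frontier-model", "input_tokens": 10_000, "output_tokens": 3_000},
]
print(cost_by_feature(calls))  # → {'search': 0.00175, 'agent': 0.48}
```

Note the spread: the agent call costs roughly 275x the search call, which is exactly why feature-level tracking, not a single blended COGS number, is what surfaces the margin problem.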
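Two of the five tactics above, model routing and semantic caching, can be sketched in a few lines. Everything here is a toy stand-in: production routers classify query difficulty with rules or a classifier model, and real semantic caches match on embedding similarity rather than normalized text.

```python
# Toy sketch of two LLM cost tactics: model routing and semantic caching.
# Model names, the routing heuristic, and the cache key are illustrative
# assumptions, not any real provider's API.

def choose_model(prompt: str) -> str:
    """Route cheap-to-serve prompts to a small model, the rest to a frontier model."""
    # Toy heuristic: short prompts without explicit reasoning cues go cheap.
    if len(prompt) < 200 and "step by step" not in prompt.lower():
        return "small-model"
    return "frontier-model"

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_model) -> tuple[str, bool]:
    """Return (answer, cache_hit). A cache hit costs zero inference spend."""
    # Toy "semantic" key: normalizing case and whitespace stands in for the
    # embedding-similarity lookup a real semantic cache would perform.
    key = " ".join(prompt.lower().split())
    if key in _cache:
        return _cache[key], True
    answer = call_model(choose_model(prompt), prompt)
    _cache[key] = answer
    return answer, False
```

The point of the sketch is the shape of the savings: routing cuts the price per call, while caching cuts the number of billable calls, and the two compound.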
Tune in before your next board meeting — because if you're not tracking AI inference costs at the feature level, you're flying blind on your most important unit economics.
Resources Mentioned
The SaaS CFO: https://www.thesaascfo.com/
Ray Rike — AI to ROI Newsletter: https://ai2roi.substack.com/
Tomas Tunguz: https://tomtunguz.com/
Fungies.io — 5 Ways to Save on LLM Costs: https://fungies.io