What is a realistic AI ROI percentage to expect?

For well-targeted workflows with disciplined measurement: a 15 to 35% improvement on the primary metric (time, cost, or throughput) within six months. Higher is possible for specific workflows (automated lead scoring, document drafting, support classification), but the durable, measurable wins concentrate in that range. Be skeptical of any vendor case study claiming more than 60% that does not show its baseline methodology.

How do we measure ROI when the AI feature is qualitative?

Decompose the qualitative claim into a measurable proxy. 'Better customer experience' becomes CSAT, response time, and resolution rate. 'Higher-quality leads' becomes conversion-to-meeting, meeting-to-opportunity, and opportunity-to-close rates against a labeled baseline. If the claim cannot be decomposed into a proxy, the feature is not yet ready to be deployed against an ROI argument. It is an exploration, and should be labeled as one.
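To make the decomposition concrete, here is a minimal Python sketch that turns the 'higher-quality leads' claim into stage-by-stage conversion rates and compares them against a labeled baseline. The `Funnel` type, the stage names, and the counts are hypothetical placeholders for illustration, not data from a real deployment.

```python
from dataclasses import dataclass


@dataclass
class Funnel:
    # Hypothetical stage counts for one measurement period.
    leads: int
    meetings: int
    opportunities: int
    closed: int


def stage_rates(f: Funnel) -> dict[str, float]:
    """Conversion rate at each funnel stage (0.0 when the upstream stage is empty)."""
    def rate(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "lead_to_meeting": rate(f.meetings, f.leads),
        "meeting_to_opportunity": rate(f.opportunities, f.meetings),
        "opportunity_to_close": rate(f.closed, f.opportunities),
    }


def compare_to_baseline(baseline: Funnel, current: Funnel) -> dict[str, float]:
    """Absolute change in each stage rate (current minus baseline)."""
    before, after = stage_rates(baseline), stage_rates(current)
    return {stage: after[stage] - before[stage] for stage in before}


# Illustrative numbers only.
baseline = Funnel(leads=1000, meetings=120, opportunities=48, closed=12)
current = Funnel(leads=1000, meetings=150, opportunities=54, closed=12)
for stage, delta in compare_to_baseline(baseline, current).items():
    print(f"{stage}: {delta:+.3f}")
```

With these made-up counts, lead-to-meeting improves by three points while both later stages get worse, which is exactly the pattern a single top-of-funnel number would hide.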

When should we expect to see ROI from an AI deployment?

Month one shows direction. Month three shows whether the early signal holds. Month twelve shows whether the feature ages well. Most deployments that fail do so between month four and month nine. The launch enthusiasm fades, drift starts, and nobody is measuring closely. If your monthly review cadence is set up to catch that window, you usually save the feature.

What do we do when the AI ROI is negative?

Roll it back. We have rolled features back at SDEN. It is not common, and it is not a failure; it is the discipline working. The alternative is letting an underperforming feature consume support burden and erode team trust in the AI portfolio more broadly. A clean rollback with documented learnings funds the next attempt; a quiet underperformance does not.

How much does ROI measurement cost relative to the deployment?

Five to ten percent of the engagement, typically. Baseline capture is a week. The eval harness and dashboard are built once and amortized across future deployments. Monthly review is half a day for a senior engineer. The cost is small; the alternative, deploying features you cannot govern, is more expensive than the measurement.

AI ROI for founders: measuring what AI is actually worth

The premise

AI return on investment is the measurable change in business outcomes (time per case, cost per case, throughput, conversion rate, or quality) that can be attributed to an AI deployment, net of the cost of building and running it. The number is defensible when there is a baseline, a measurement cadence, and an explicit attribution model. Without those three, it is a story.
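The definition reduces to a small calculation. The sketch below assumes the benefit is a per-case cost saving scaled by case volume, and that an `attribution_share` (the fraction of the observed delta credited to the AI, for example from a holdout comparison) comes from your attribution model. All function names and figures are illustrative assumptions.

```python
def attributed_benefit(
    baseline_cost_per_case: float,
    current_cost_per_case: float,
    cases: int,
    attribution_share: float,
) -> float:
    """Savings credited to the AI feature over a period.

    current_cost_per_case excludes the AI's own running cost, which is
    counted once in run_cost (see net_roi) to avoid double counting.
    """
    if not 0.0 <= attribution_share <= 1.0:
        raise ValueError("attribution_share must be between 0 and 1")
    return (baseline_cost_per_case - current_cost_per_case) * cases * attribution_share


def net_roi(benefit: float, build_cost: float, run_cost: float) -> float:
    """Net ROI as a fraction: (benefit - total cost) / total cost."""
    total_cost = build_cost + run_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (benefit - total_cost) / total_cost


# Hypothetical year: 20,000 cases, cost per case fell from 40 to 30,
# and the attribution model credits 60% of that drop to the AI feature.
benefit = attributed_benefit(40.0, 30.0, 20_000, attribution_share=0.6)
print(f"net ROI: {net_roi(benefit, build_cost=60_000, run_cost=20_000):.0%}")
```

Crediting the whole delta (`attribution_share=1.0`) on the same data would report 150% instead of 50%, which is the 'whole delta' attribution pattern this article warns against.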

Most AI ROI numbers we see in board decks are stories. The pattern is consistent: the team picks the metric that moved, attributes the entire delta to the AI feature, ignores the seasonal and product-mix effects, and reports a percentage that is large enough to justify the next investment. The conversation then moves on. Three quarters later, when the next AI investment also needs to be justified, the original feature's actual impact has quietly stopped being measured.

This piece sets out the framework SDEN uses to make AI ROI measurable: the four metrics that count, the baseline discipline that makes them defensible, the attribution failure modes that quietly destroy the case, and what good looks like at month one, month three, and month twelve.

The baseline discipline

If you do not measure before, you cannot measure after

The single biggest reason AI ROI numbers are not defensible is that nobody captured the before.

An AI deployment without a documented pre-deployment baseline is not measurable. It does not matter how sophisticated the post-deployment dashboards are. Without a number from before, every comparison is to a remembered impression of how slow or expensive the old process was, and human memory of operational metrics is unreliable. We have audited deployments where the team was certain the AI feature cut time per case by 40%; the actual number, against the recovered baseline, was 12%. We have also seen the reverse: a team that felt the AI feature was disappointing, while the recovered baseline showed a real 25% improvement that nobody had credited because the new process felt the same.

The baseline is not difficult to capture. For most operational workflows, it is four measurements: time per case (median and p95), cost per case (fully loaded with human time), throughput (cases handled per person per week), and quality (a sampled audit of correctness, usually 30 to 50 cases). It takes a week, sometimes two if the data is scattered across tools, and it is the single highest-leverage step in any AI engagement.
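A minimal sketch of the four baseline measurements in Python, assuming each case record carries its end-to-end handling time, the person who handled it, and an audit verdict for the sampled subset. The `Case` fields, the loaded hourly rate, and the use of distinct handlers as headcount are illustrative assumptions; adapt them to wherever your data actually lives.

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    minutes: float                          # end-to-end handling time
    handler: str                            # person who handled the case
    audited_correct: Optional[bool] = None  # None if not in the audit sample


def capture_baseline(
    cases: list[Case], loaded_hourly_rate: float, weeks: float
) -> dict[str, float]:
    """Time, cost, throughput, and quality for the pre-AI process."""
    times = [c.minutes for c in cases]
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(times, n=20, method="inclusive")[-1]
    audited = [c.audited_correct for c in cases if c.audited_correct is not None]
    headcount = len({c.handler for c in cases})  # assumption: handlers ~ headcount
    return {
        "median_minutes": statistics.median(times),
        "p95_minutes": p95,
        # Fully loaded human cost only; there is no AI cost before deployment.
        "cost_per_case": statistics.fmean(times) / 60 * loaded_hourly_rate,
        "cases_per_person_week": len(cases) / (headcount * weeks),
        "audit_accuracy": sum(audited) / len(audited) if audited else float("nan"),
    }


# Tiny hypothetical sample; a real baseline covers weeks of cases.
sample = [
    Case(12.0, "ana", True), Case(15.0, "ana", True), Case(9.0, "ben", False),
    Case(20.0, "ben", True), Case(14.0, "ana"),
]
print(capture_baseline(sample, loaded_hourly_rate=60.0, weeks=1))
```

In practice the audit sample is the 30 to 50 cases described above, and the same function is rerun after launch so both sides of the comparison use identical definitions.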

We refuse to ship an AI feature without a captured baseline. Not because we want to look good. Because without it, the feature has no governance path. Nobody can roll it back when it stops working, because nobody can prove it ever started working.

Fig. · If you do not measure before, you cannot measure after

The four metrics that count

Time, cost, throughput, quality, and the trap of the fifth

AI deployments move four metrics. Time per case is the most visible: how long it takes to handle one instance of the workflow, end to end. Cost per case is the fully loaded version: time per case multiplied by the cost of the people doing it, plus the cost of the AI itself. Throughput is the team-level view: how many cases the team handles in a week, holding headcount constant. Quality is the discipline against optimization theatre: whether cases are handled correctly, sampled against the same audit as before.

Most teams report on one of these and call it ROI. The honest version reports on all four, because optimizing one without the others is usually how AI deployments quietly fail. The classic pattern: the AI feature cuts time per case by 50%, the team handles 80% more cases per week, leadership reports a productivity win. Six months later, the quality audit shows that error rates have doubled; the team rushed, the model missed edge cases, and the cost of the errors landed downstream as customer churn or refund obligations. The actual ROI was negative; nobody measured it.
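One way to enforce the all-four rule is a scorecard that reports every metric together and refuses to call a speed-up a win when audited quality has slipped. This is a sketch, not SDEN tooling; the `Snapshot` type and the two-point quality tolerance are assumptions chosen for illustration. The example numbers mirror the classic pattern above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    minutes_per_case: float
    cost_per_case: float
    cases_per_person_week: float
    audit_accuracy: float  # share of sampled cases judged correct, 0..1


def scorecard(
    baseline: Snapshot, current: Snapshot, max_quality_drop: float = 0.02
) -> dict[str, object]:
    """Relative change on time, cost, and throughput, absolute change on
    quality, and a verdict that puts quality first."""
    def rel(before: float, after: float) -> float:
        return (after - before) / before

    time = rel(baseline.minutes_per_case, current.minutes_per_case)
    cost = rel(baseline.cost_per_case, current.cost_per_case)
    throughput = rel(baseline.cases_per_person_week, current.cases_per_person_week)
    quality = current.audit_accuracy - baseline.audit_accuracy  # absolute points

    if quality < -max_quality_drop:
        verdict = "quality regression: do not report as a win"
    elif time < 0 and cost < 0 and throughput > 0:
        verdict = "efficiency win, quality held"
    else:
        verdict = "mixed: report every metric"
    return {"time": time, "cost": cost, "throughput": throughput,
            "quality": quality, "verdict": verdict}


# Hypothetical: time halves, throughput +80%, error rate doubles (4% -> 8%).
before = Snapshot(minutes_per_case=30, cost_per_case=25,
                  cases_per_person_week=40, audit_accuracy=0.96)
after = Snapshot(minutes_per_case=15, cost_per_case=14,
                 cases_per_person_week=72, audit_accuracy=0.92)
print(scorecard(before, after))
```

Time per case halves and throughput rises 80%, but the verdict is a quality regression because the audited error rate doubled.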

The fifth metric (the trap) is 'team satisfaction' or 'time saved' as reported in a survey. These are useful signals; they are not ROI metrics. People consistently overestimate the time AI tools save them, by factors of two to three in studies we trust. Use survey data for product feedback. Do not use it to justify the next AI investment.

Fig. · Time, cost, throughput, quality, and the trap of the fifth

Attribution failure modes

Three ways the ROI number lies

The first failure mode is unattributed concurrent changes. The AI feature shipped in the same quarter as a UX redesign, a new training program, and a market-mix shift. The metric moved; the AI feature gets credit for the whole delta. The corrective is a holdout group, an A/B test, or at minimum an explicit list of concurrent changes documented in the ROI memo. We default to a small holdout group on every deployment unless the workflow makes it impossible.
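A minimal sketch of a holdout in Python, assuming cases have stable string IDs. Hashing the ID with a per-feature salt gives a deterministic, roughly uniform split, so the same case always lands in the same group; the 10% share and the salt value are placeholders. Because the concurrent changes (redesign, training, market mix) hit both groups, the gap between them is what can be credited to the AI feature.

```python
import hashlib
import statistics


def in_holdout(case_id: str, holdout_share: float = 0.1,
               salt: str = "ai-feature-v1") -> bool:
    """Deterministically assign a case to the holdout (no-AI) group."""
    digest = hashlib.sha256(f"{salt}:{case_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < holdout_share * 10_000


def estimated_lift(treated_minutes: list[float], holdout_minutes: list[float]) -> float:
    """Relative change in mean time per case, AI group vs holdout group."""
    treated = statistics.fmean(treated_minutes)
    holdout = statistics.fmean(holdout_minutes)
    return (treated - holdout) / holdout


# Hypothetical usage: route each incoming case, then compare the groups.
for case_id in ["case-1", "case-2", "case-3"]:
    group = "holdout" if in_holdout(case_id) else "ai"
    print(case_id, group)
print(f"lift: {estimated_lift([8.0, 10.0], [12.0, 12.0]):+.0%}")
```

This point estimate carries no confidence interval; with a small holdout, add a bootstrap or a standard two-sample test before the number goes into an ROI memo.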

The second failure mode is the seasonality glitch. The baseline was captured in a quiet quarter; the post-deployment measurement is from a peak quarter. The improvement looks real and is partly seasonal. Corrective: compare year-over-year if the cycle is annual, or use a rolling four-week baseline that controls for short-term variance.
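Both correctives are short calculations. The sketch below shows a trailing four-week baseline and a year-over-year adjustment that subtracts last year's change over the same calendar weeks (a simple difference-in-differences). Function names and figures are illustrative assumptions.

```python
from statistics import fmean
from typing import Optional


def trailing_baseline(weekly: list[float], window: int = 4) -> list[Optional[float]]:
    """Mean of the previous `window` weeks for each week (None until enough history)."""
    return [fmean(weekly[i - window:i]) if i >= window else None
            for i in range(len(weekly))]


def seasonally_adjusted_change(
    pre_this_year: list[float], post_this_year: list[float],
    pre_last_year: list[float], post_last_year: list[float],
) -> float:
    """Relative change across the launch, minus the relative change over the
    same calendar weeks a year earlier (when there was no AI feature)."""
    def rel(pre: list[float], post: list[float]) -> float:
        return (fmean(post) - fmean(pre)) / fmean(pre)

    return rel(pre_this_year, post_this_year) - rel(pre_last_year, post_last_year)


# Hypothetical weekly throughput: up 30% across the launch quarter,
# but the same weeks rose 20% last year without any AI feature.
print(f"{seasonally_adjusted_change([100.0], [130.0], [100.0], [120.0]):+.0%}")
```

In this made-up example, only about ten of the thirty points survive the seasonal adjustment.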

The third failure mode is the silent quality drift. The model performs well at launch, performance erodes slowly over six months, nobody resets the baseline, and the reported ROI keeps using the launch-quarter quality number. The deployment looks healthy on the dashboard while customers are noticing the degradation. Corrective: quality is measured at the same cadence as cost and time, and the dashboard surfaces drift explicitly.
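Surfacing drift can be as simple as comparing each month's audited quality against the pre-launch baseline rather than against last month. The sketch assumes a monthly audit accuracy series; the two-point tolerance and the two-consecutive-months rule are illustrative defaults meant to avoid alerting on one noisy audit sample, not recommendations.

```python
def drift_alerts(
    monthly_accuracy: list[float],
    baseline_accuracy: float,
    tolerance: float = 0.02,
    consecutive: int = 2,
) -> list[int]:
    """Months (1-indexed) at which audited accuracy has stayed below
    baseline - tolerance for at least `consecutive` months in a row."""
    alerts: list[int] = []
    run = 0
    for month, accuracy in enumerate(monthly_accuracy, start=1):
        run = run + 1 if accuracy < baseline_accuracy - tolerance else 0
        if run >= consecutive:
            alerts.append(month)
    return alerts


# Hypothetical slow erosion after a 95% launch baseline.
print(drift_alerts([0.95, 0.945, 0.935, 0.92, 0.91, 0.90], baseline_accuracy=0.95))
```

With a 30 to 50 case audit, month-to-month sampling noise can exceed two points, so the tolerance should be set with the sample size in mind.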

How SDEN runs ROI

Three commitments on AI measurement

We do not ship an AI feature without all three. They are the bar for the engagement, not optional add-ons.

Baseline captured before launch

We do not ship until time, cost, throughput, and quality are measured for the pre-AI process. Without the four numbers, the deployment cannot be governed afterwards.

Holdout or documented confounders

Default to a holdout group. When that is not possible, the ROI memo names every concurrent change in the same quarter, and the attribution model that handles them.

Monthly review, twelve-month horizon

The same four metrics are reviewed monthly, dashboarded continuously, and re-baselined annually. The honest test is whether the feature is still working at month twelve, not at month one.

What good looks like

An AI portfolio with defensible numbers

Twelve months in, the leadership team can defend every AI investment with a number that survives a board challenge.

The companies that get AI ROI right are not the ones with the biggest numbers. They are the ones whose numbers survive scrutiny. The CFO can trace each percentage point of impact to a measurement methodology. The CEO can explain in a board meeting which AI investments worked and which did not, and what the company learned from the failures. The head of engineering can roll back an AI feature when the numbers stop moving, and has actually done so, at least once, without political cost.

The wider effect is that AI stops being a special category of investment and becomes a normal one. Each new use case lands with a baseline, ships with an eval, gets reviewed monthly, and is killed when it stops paying back. It is the same discipline the company applies to ad spend or pricing experiments, applied to AI. That is what a mature AI portfolio looks like.

The numbers are also smaller. Companies with rigorous ROI measurement report 15 to 35% improvements on the workflows they target, not the 200 to 400% improvements that show up in vendor case studies. The smaller numbers are the real ones, they compound across the portfolio, and they survive the audit.

Fig. · An AI portfolio with defensible numbers

FAQ

AI for founders: questions we get asked.

Direct answers to the questions we get asked the most. If yours isn't covered, write to the team.
