When does off-the-shelf clearly win?

For generic workflows where your data is not the differentiator: meeting summaries, code completion, image generation, generic copywriting, document OCR, calendar coordination. The SaaS vendor has more training data than you ever will, the work itself is the same across buyers, and the cost at low-to-medium volume is unbeatable. Going custom on these workflows is almost always a category error.

How do we know if we are over the cost crossover?

Add up your monthly bill for the vendor, project it twelve months forward at your current growth rate, and compare it to the engineering cost of building the equivalent workflow plus the inference cost at your volume. The crossover usually shows up between 10,000 and 100,000 calls per month, but it varies by tool. If the projected twelve-month bill exceeds two to three months of senior engineering time plus inference, you are over.
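As a rough illustration, the comparison above can be written as a few lines of arithmetic. This is a minimal sketch, not a pricing model: the function names, the compounding-growth assumption, and every figure in the example are hypothetical placeholders to swap for your own numbers.

```python
def projected_vendor_bill(monthly_bill: float, monthly_growth: float, months: int = 12) -> float:
    """Sum the next `months` vendor bills, compounding at `monthly_growth` per month.

    Assumption: the first projected month equals the current bill.
    """
    return sum(monthly_bill * (1 + monthly_growth) ** m for m in range(months))


def custom_cost(
    engineer_month_cost: float,
    build_months: float,
    inference_cost_per_call: float,
    calls_per_month: float,
    monthly_growth: float,
    months: int = 12,
) -> float:
    """Engineering cost of the build plus inference cost over the same period."""
    build = engineer_month_cost * build_months
    inference = sum(
        inference_cost_per_call * calls_per_month * (1 + monthly_growth) ** m
        for m in range(months)
    )
    return build + inference


def over_crossover(vendor_total: float, custom_total: float) -> bool:
    """True when the projected vendor bill exceeds the cost of building and running custom."""
    return vendor_total > custom_total


# Hypothetical inputs: a 4,000/month vendor bill growing 5% a month, against
# 2.5 months of senior engineering time plus inference at 50,000 calls/month.
vendor = projected_vendor_bill(4000.0, 0.05)
custom = custom_cost(15000.0, 2.5, 0.002, 50000, 0.05)
print(over_crossover(vendor, custom))
```

The growth rate is applied to both sides, since a growing vendor bill usually means growing call volume as well.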

Can we start with off-the-shelf and migrate to custom later?

Yes, and it is often the right pattern. The condition is that you preserve the data and the eval baseline from the off-the-shelf phase. Those are what make the migration cheap. Workflows that locked themselves into a vendor's proprietary tagging or scoring rubric end up rebuilding from scratch, which is more expensive than starting custom in the first place. Plan the exit on day one, even if you never use it.

What about hybrid: off-the-shelf model, custom workflow around it?

That is the default architecture for most custom AI workflows in 2026. You almost never want to host the model yourself unless data residency demands it. What you want to own is the prompt, the retrieval layer, the eval set, the guardrails, and the orchestration, calling out to a hosted model as a commodity. This pattern gives you the cost and quality of frontier models with the integration and reversibility of a custom workflow.
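A minimal sketch of that split, assuming nothing about any particular vendor: the hosted model is reduced to a plain callable, and the parts you own (prompt, retrieval, guardrail, orchestration) live in your code. The `Workflow` class, its field names, and the fallback string are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical signature: wraps whichever hosted model API you call.
ModelFn = Callable[[str], str]

FALLBACK = "ESCALATE_TO_HUMAN"


@dataclass
class Workflow:
    prompt_template: str                        # owned: the prompt
    retrieve: Callable[[str], Sequence[str]]    # owned: the retrieval layer
    guardrail: Callable[[str], bool]            # owned: the output check
    model: ModelFn                              # rented: the hosted model

    def run(self, query: str) -> str:
        """Orchestrate one call: retrieve, build the prompt, call the model, check the output."""
        context = "\n".join(self.retrieve(query))
        prompt = self.prompt_template.format(context=context, query=query)
        answer = self.model(prompt)
        return answer if self.guardrail(answer) else FALLBACK


# Usage with stand-in functions; a real deployment would pass a vendor client wrapper as `model`.
workflow = Workflow(
    prompt_template="Context:\n{context}\n\nQuestion: {query}",
    retrieve=lambda q: ["doc A", "doc B"],
    guardrail=lambda answer: len(answer) > 0,
    model=lambda prompt: "echo:" + prompt,
)
print(workflow.run("What changed?"))
```

Because the model is just a parameter, swapping vendors means replacing one function while the prompt, retrieval, and guardrails stay put.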

How long does a custom AI workflow take to ship?

Six to twelve weeks for a focused workflow with a defined input, output, and success metric. Faster is possible and usually means the eval got skipped. Slower usually means scope crept: the original workflow split into three and is being built as a platform. We sequence one workflow at a time, ship it, measure it for a month, then start the next one.

Custom AI workflows vs off-the-shelf tools: when each one wins

The premise

A custom AI workflow is a tool built specifically for one company's process, data, and constraints; an off-the-shelf product is one that the same company shares with thousands of others. The build-or-buy decision for AI is not the same as for conventional software, because what AI products sell is no longer the model. It is the integration, the eval, and the operating posture around the model.

Founders who treated AI like SaaS in 2023 paid the SaaS price and got a SaaS tool. That is fine when the use case is generic. It stops being fine when the workflow is core to the business, when the data inside it is the company's actual moat, or when the vendor's roadmap and the company's roadmap start pointing in different directions. At that point, the question is which workflows to keep on a vendor and which to bring in-house, and the answer depends on five things, not on price.

This piece walks through the five questions, in order, with the failure modes we have seen on both sides. The aim is not to nudge you toward custom. It is to give you the framework that says, for this specific workflow, off-the-shelf is the right answer, or it is not.

The first question

Is the workflow generic, or is it the company's process?

The single most important question is whether the AI is doing your work, or someone else's work.

A workflow is generic when the inputs, the steps, and the outputs look the same across every company in your category. Drafting a sales email from a template. Summarizing a customer call. Transcribing a meeting. Generating a thumbnail. The reason these are well-served by off-the-shelf products is structural: the SaaS vendor has more training data than any single buyer ever will, and the work itself is the same across buyers, so a shared model converges on a better answer than any private one.

A workflow is the company's process when the inputs, the steps, or the outputs depend on data the company owns and on rules the company defines. Property valuation, given how this specific agency tags its listings. Lead scoring, against this company's actual conversion data. Loan underwriting, against this lender's risk model. Inventory rebalancing, against this retailer's actual delivery network. These workflows do not converge well across companies, because the value of being correct depends on private context that the SaaS vendor does not have access to and would not benefit from optimizing for.

If you cannot answer this question cleanly, the workflow is somewhere in the middle, and the right call is usually to start with off-the-shelf, measure it, and revisit once you have data on where it falls short.

Fig. · Is the workflow generic, or is it the company's process?

The four other questions

Data, lock-in, eval ownership, cost asymmetry

Second question: data sensitivity. If the workflow inputs include customer PII, financial data, health data, or contractually restricted information, the bar for off-the-shelf goes up. Enterprise tiers can solve most of this, but sometimes the data simply cannot leave the network at all, in which case the question collapses to: build, or do not deploy AI to this workflow.

Third question: lock-in. How portable is the workflow if you decide to leave the vendor? Workflows built around a vendor's proprietary tagging, classification taxonomy, or knowledge graph have hidden switching costs that compound over time.

Fourth question: eval ownership. Can you evaluate the workflow's output against a baseline you defined, on a dataset you own, at a cadence you control? Off-the-shelf products that do not give you eval access are betting that you will never notice quality drift. For low-criticality workflows, that is fine. For anything load-bearing, it is the most common failure mode we see: the team trusts the tool for two quarters, then one quarter it quietly stops working as well, and by the time anyone notices, the customer has noticed first.
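Owning the eval can be as small as a labeled set you control and a recurring check against a baseline you recorded. The sketch below assumes a classification-style workflow scored by plain accuracy; the function names and the 0.02 tolerance are hypothetical, and a real eval would use whatever metric matches the workflow.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label) from a dataset you own


def accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Share of owned examples where the tool's output matches the expected label."""
    if not examples:
        raise ValueError("eval set is empty")
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def has_drifted(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when quality has dropped more than `tolerance` below the recorded baseline."""
    return (baseline - current) > tolerance


# Usage with a stand-in predictor; in practice `predict` would call the vendor tool.
eval_set = [("refund request", "billing"), ("login fails", "access"),
            ("invoice missing", "billing"), ("reset password", "access")]
score = accuracy(lambda text: "billing", eval_set)
print(has_drifted(score, baseline=0.9))
```

Running this on a schedule you control is the point: the check catches the quiet quarter before the customer does.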

Fifth question: cost asymmetry. Off-the-shelf is usually cheaper at low volume and more expensive at scale. The crossover point varies by tool: sometimes it lands at 10,000 calls per month, sometimes at 1,000,000. Custom workflows have a higher up-front cost and a lower per-call cost, so the question becomes how confident you are about volume two years out. If volume is uncertain, off-the-shelf preserves optionality. If volume is committed and growing, custom is the cheaper position by month nine in our experience.
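One way to see why the crossover varies so widely by tool is to solve for the break-even volume directly. This is a simplified sketch under stated assumptions: the build cost is spread evenly over an amortization window, per-call prices are flat, and all names and figures are hypothetical.

```python
import math


def break_even_calls_per_month(
    build_cost: float,
    amortization_months: int,
    vendor_price_per_call: float,
    custom_price_per_call: float,
) -> float:
    """Monthly call volume above which custom is cheaper than the vendor.

    Returns infinity when custom is not cheaper per call, since no volume
    would ever recover the build cost.
    """
    saving_per_call = vendor_price_per_call - custom_price_per_call
    if saving_per_call <= 0:
        return math.inf
    monthly_build_cost = build_cost / amortization_months
    return monthly_build_cost / saving_per_call


# Hypothetical figures: a 60,000 build spread over 24 months,
# 0.05 per call on the vendor versus 0.01 per call on the custom workflow.
print(break_even_calls_per_month(60000.0, 24, 0.05, 0.01))
```

A small per-call saving pushes the break-even volume up sharply, which is how two tools with similar build costs can cross over at very different volumes.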

Fig. · Data, lock-in, eval ownership, cost asymmetry

When custom is the right call

Three patterns where build wins clearly

The first pattern: workflows where the company's proprietary data is the differentiator and the model has to integrate it tightly. Real Estate's valuation model would not work as a SaaS tool; it depends on the agency's tagging conventions, its market history, and its agents' explainability needs. We built it because no off-the-shelf product could integrate those three things without compromising one of them.

The second pattern: workflows where the eval matters more than the model. For Lead Manager's lead-scoring layer, the model is interchangeable. What is not interchangeable is the labeled dataset of past conversions and the ability to retrain against it monthly. Off-the-shelf scoring tools work, but they score against their own definition of a good lead, not against the specific definition that maps to this business's revenue.

The third pattern: workflows where the cost crossover has clearly happened and the volume is growing. A support-classification workflow at 50,000 tickets per month is cheaper to run on a custom RAG pipeline than on any vendor agent we have priced, and the gap grows as volume grows. The build cost is recovered in six to twelve months; after that, the gap between the two is pure savings.

Fig. · Three patterns where build wins clearly

How SDEN runs the decision

Three commitments on every build-or-buy engagement

We do not have a preferred answer. The framework decides, and we ship whichever side wins for the specific workflow.

Framework before recommendation

We score every candidate workflow against the five questions before we recommend a side. The framework is shared with the client and stays in the audit; the next workflow gets scored the same way.
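To make the scoring repeatable from one workflow to the next, the five questions can be captured as a small record. The sketch below is illustrative only: the field names, the yes/no answers, and the vote threshold are assumptions made for the example, not the scoring rubric itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    is_company_process: bool          # Q1: the company's process, not generic work
    data_must_stay_on_network: bool   # Q2: data cannot leave the network at all
    high_lock_in_risk: bool           # Q3: built around vendor-proprietary structures
    needs_owned_eval: bool            # Q4: load-bearing, so the eval must be owned
    past_cost_crossover: bool         # Q5: volume is committed and past the crossover


def recommend(card: Scorecard) -> str:
    """Turn five answers into a recommendation using a hypothetical voting rule."""
    if card.data_must_stay_on_network:
        return "build or do not deploy"
    votes_for_custom = sum([
        card.is_company_process,
        card.high_lock_in_risk,
        card.needs_owned_eval,
        card.past_cost_crossover,
    ])
    if votes_for_custom >= 3:  # assumed threshold
        return "custom"
    if votes_for_custom == 0:
        return "off-the-shelf"
    return "start off-the-shelf, measure, revisit"


# Usage: a generic meeting-summary workflow with no sensitive data.
print(recommend(Scorecard(False, False, False, False, False)))
```

Keeping the scorecard as data means the same record can be stored in the audit and compared against the next review.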

Hybrid is a valid answer

Most companies should run a mix. Generic workflows stay off-the-shelf; differentiated workflows go custom. The portfolio decision matters more than any single workflow decision.

Reversibility on both sides

Custom workflows ship with an exit plan to off-the-shelf if the framework changes. Off-the-shelf workflows ship with the data export and the eval baseline that make a future custom rebuild possible. Both sides stay open.

What good looks like

An AI portfolio that earns its position

A year later, every AI workflow in the business has a deliberate reason to be off-the-shelf or custom, and the team can defend each one.

The honest test of the build-or-buy framework is whether the answers hold up under change. A new vendor lands with a better off-the-shelf option: does the framework say to switch, and does the company actually switch? A custom workflow's volume drops below the crossover: does the team have the discipline to consider rolling it back to off-the-shelf? Most companies do neither, because their decisions are driven by inertia, not by a framework.

The companies that get this right run the framework as a recurring review, not as a one-time audit. Twice a year, every workflow gets re-scored on the five questions. The output is not always change; usually it is confirmation. But the discipline of running the review is what keeps the portfolio honest.

The wider outcome is that AI stops being a yes/no question and becomes a portfolio decision, which is the right framing for a category of software that is going to keep changing faster than the rest of the stack for the foreseeable future.