The premise
A custom AI workflow is a tool built specifically for one company's process, data, and constraints, as opposed to an off-the-shelf product that the same company shares with thousands of others. The build-or-buy decision for AI is not the same as for conventional software, because what AI products sell is no longer the model itself. They sell the integration, the eval, and the operating posture around the model.
Founders who treated AI like SaaS in 2023 paid the SaaS price and got a SaaS tool. That is fine when the use case is generic. It stops being fine when the workflow is core to the business, when the data inside it is the company's actual moat, or when the vendor's roadmap and the company's roadmap start pointing in different directions. At that point, the question is which workflows to keep on a vendor and which to bring in-house, and the answer depends on five questions, only one of which is about cost.
This piece walks through the five questions, in order, with the failure modes we have seen on both sides. The aim is not to nudge you toward custom. It is to give you the framework that says, for this specific workflow, whether off-the-shelf is the right answer or not.
Is the workflow generic, or is it the company's process?
The single most important question is whether the AI is doing your work, or someone else's work.
A workflow is generic when the inputs, the steps, and the outputs look the same across every company in your category. Drafting a sales email from a template. Summarizing a customer call. Transcribing a meeting. Generating a thumbnail. The reason these are well-served by off-the-shelf products is structural: the SaaS vendor has more training data than any single buyer ever will, and the work itself is the same across buyers, so a shared model converges on a better answer than any private one.
A workflow is the company's process when the inputs, the steps, or the outputs depend on data the company owns and on rules the company defines. Property valuation, given how this specific agency tags its listings. Lead scoring, against this company's actual conversion data. Loan underwriting, against this lender's risk model. Inventory rebalancing, against this retailer's actual delivery network. These workflows do not converge well across companies, because the value of being correct depends on private context that the SaaS vendor does not have access to and would not benefit from optimizing for.
If you cannot answer this question cleanly, the workflow is somewhere in the middle, and the right call is usually to start with off-the-shelf, measure it, and revisit once you have data on where it falls short.
Data, lock-in, eval ownership, cost asymmetry
Second question: data sensitivity. If the workflow inputs include customer PII, financial data, health data, or contractually restricted information, the bar for off-the-shelf goes up. Enterprise tiers can solve most of this, but sometimes the data simply cannot leave the network at all, in which case the question collapses to: build, or do not deploy AI to this workflow.
Third question: lock-in. How portable is the workflow if you decide to leave the vendor? Workflows built around a vendor's proprietary tagging, classification taxonomy, or knowledge graph carry hidden switching costs that compound over time.
Fourth question: eval ownership. Can you evaluate the workflow's output against a baseline you defined, on a dataset you own, at a cadence you control? Off-the-shelf products that do not give you eval access are betting that you will never notice quality drift. For low-criticality workflows, that bet is acceptable. For anything load-bearing, it is the most common failure mode we see: the team trusts the tool for two quarters, then one quarter it quietly stops working as well, and by the time anyone notices, the customer has noticed first.
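Owning the eval does not require much machinery. Below is a minimal sketch, assuming the workflow's outputs and the team's own labels are plain lists of categories and that accuracy is an adequate metric for this workflow: a baseline is recorded when the workflow is accepted, and a scheduled job re-scores current outputs against the same owned dataset. The names `check_drift` and `EvalResult` and the 0.03 tolerance are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float
    baseline: float
    drifted: bool


def check_drift(predictions, labels, baseline, tolerance=0.03):
    """Score current outputs on the owned dataset and flag a drop below baseline."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    # Hypothetical policy: drift means accuracy has fallen more than `tolerance`
    # below the baseline recorded when the workflow was first accepted.
    return EvalResult(accuracy, baseline, accuracy < baseline - tolerance)


# Hypothetical owned dataset: ticket categories labelled by the team.
labels = ["refund", "bug", "billing", "bug", "refund"]
this_quarter = ["refund", "bug", "bug", "bug", "billing"]
print(check_drift(this_quarter, labels, baseline=0.90))
# EvalResult(accuracy=0.6, baseline=0.9, drifted=True)
```

Run on a cadence the team controls, a check like this is what surfaces the quiet quarter before the customer does.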
Fifth question: cost asymmetry. Off-the-shelf is usually cheaper at low volume and more expensive at scale. The crossover point varies by tool: sometimes it lands at 10,000 calls per month, sometimes at 1,000,000. Custom workflows have a higher up-front cost and a lower per-call cost, so the question becomes how confident you are about volume two years out. If volume is uncertain, off-the-shelf preserves optionality. If volume is committed and growing, custom is the cheaper position by month nine in our experience.
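The crossover itself is simple arithmetic once the up-front cost is spread over a period. Below is a minimal sketch, assuming the build cost is amortized evenly over 24 months and the custom workflow carries a fixed monthly cost for hosting and monitoring. Every price and cost in the example is a made-up input, not a quote.

```python
def vendor_monthly_cost(volume, vendor_price):
    return volume * vendor_price


def custom_monthly_cost(volume, custom_price, build_cost, fixed_monthly, amortize_months=24):
    # Up-front build spread evenly over `amortize_months` (an assumption),
    # plus fixed hosting/monitoring, plus the per-call cost.
    return build_cost / amortize_months + fixed_monthly + volume * custom_price


def crossover_volume(vendor_price, custom_price, build_cost, fixed_monthly, amortize_months=24):
    """Monthly call volume at which both options cost the same."""
    if vendor_price <= custom_price:
        return float("inf")  # custom never wins on per-call economics
    return (build_cost / amortize_months + fixed_monthly) / (vendor_price - custom_price)


# Hypothetical inputs: $0.01/call vendor, $0.002/call custom,
# $48,000 to build, $2,000/month to run.
v = crossover_volume(0.01, 0.002, 48_000, 2_000)
print(round(v))  # 500000 calls per month
```

Below the crossover, off-the-shelf is the cheaper position; above it, custom is. Shortening the amortization window raises the crossover, which is one way to encode uncertainty about volume two years out.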
Three patterns where build wins clearly
The first pattern: workflows where the company's proprietary data is the differentiator and the model has to integrate it tightly. Real Estate's valuation model would not work as a SaaS tool: it depends on the agency's tagging conventions, its market history, and its agents' explainability needs. We built it because no off-the-shelf product could integrate those three things without compromising one of them.
The second pattern: workflows where the eval matters more than the model. For Lead Manager's lead-scoring layer, the model is interchangeable. What is not interchangeable is the labeled dataset of past conversions and the ability to retrain against it monthly. Off-the-shelf scoring tools work, but they score against their own definition of a good lead, not against the specific definition that maps to this business's revenue.
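One way to make "their definition versus ours" measurable is to rank the same leads with each scorer and check how many of the top k actually converted, using the owned labeled dataset. Below is a minimal sketch. The lead IDs, scores, and the choice of precision at k as the metric are all illustrative assumptions.

```python
def precision_at_k(scores, converted, k):
    """Share of the top-k leads, ranked by score, that actually converted."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for lead in top_k if lead in converted) / k


# Hypothetical owned labels: leads that became revenue for this business.
converted = {"L1", "L4", "L7"}
vendor_scores = {"L1": 0.9, "L2": 0.8, "L3": 0.7, "L4": 0.4, "L5": 0.3, "L6": 0.2, "L7": 0.1}
custom_scores = {"L1": 0.8, "L2": 0.2, "L3": 0.1, "L4": 0.9, "L5": 0.3, "L6": 0.4, "L7": 0.7}

print(precision_at_k(vendor_scores, converted, 3))  # 1 of the top 3 converted
print(precision_at_k(custom_scores, converted, 3))  # 3 of the top 3 converted
```

The check is independent of the model behind the scores, which is the point: the model can be swapped, while the labeled dataset and the monthly retrain against it stay.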
The third pattern: workflows where the cost crossover has clearly happened and the volume is growing. A support-classification workflow at 50,000 tickets per month is cheaper to run on a custom RAG pipeline than on any vendor agent we have priced, and the gap grows as volume grows. The build cost is recovered in 6 to 12 months; after that, the custom workflow is pure savings.
What the build-or-buy decision actually changes
Four representative shifts we have seen inside operating businesses when the build-or-buy call gets made deliberately, with the framework above.
A sales team uses three different AI tools that all touch the lead record: enrichment, scoring, and outbound sequencing. None of them share data; the rep sees three different qualification signals that contradict each other.
One custom workflow consolidates enrichment, scoring, and sequencing against a single typed lead model. The vendor tools are deprecated. Reps work one queue, scored against this business's actual conversion data rather than a generic scoring rubric. The pattern matches Lead Manager: 2.4× more meetings per hour of prospecting.
Takeaway · Three vendors that do not talk to each other lose to one workflow that does, at the price of building it once.
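A "single typed lead model" can be as plain as one record that enrichment, scoring, and sequencing all read from and write to. Below is a minimal sketch. The field names and the queue-ordering rule are hypothetical, not Lead Manager's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lead:
    lead_id: str
    company: str
    # Written by enrichment.
    employees: Optional[int] = None
    industry: Optional[str] = None
    # Written by scoring, against this business's conversion data.
    score: Optional[float] = None
    # Written by sequencing.
    sequence_step: int = 0
    touches: List[str] = field(default_factory=list)


def rep_queue(leads: List[Lead]) -> List[Lead]:
    """One queue for the rep: highest score first, unscored leads at the end."""
    scored = [lead for lead in leads if lead.score is not None]
    unscored = [lead for lead in leads if lead.score is None]
    return sorted(scored, key=lambda lead: lead.score, reverse=True) + unscored


queue = rep_queue([
    Lead("a", "Acme", score=0.2),
    Lead("b", "Globex"),
    Lead("c", "Initech", industry="software", score=0.9),
])
print([lead.lead_id for lead in queue])  # ['c', 'a', 'b']
```

Because every stage writes to the same record, the rep sees one score and one sequence position per lead instead of three competing signals.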
A real-estate agency uses a SaaS valuation tool that returns a single number with no explanation. Agents do not trust it; they manually re-do the valuation half the time anyway.
A custom valuation model returns an explainable range, with the comparables that informed it, integrated into the listing workflow. Agents adjust on the margins. Pattern from Real Estate: 70% less time per valuation, and the agent can defend the number to the seller.
Takeaway · Explainability is a hard constraint that off-the-shelf tools rarely meet, and rarely advertise as a gap.
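To make "an explainable range, with the comparables that informed it" concrete, here is a minimal sketch that turns already-selected comparables into a low/mid/high range using price per square metre. It assumes the selection of comparables (where the agency's tagging conventions would come in) happens upstream. The price-per-square-metre method and the addresses are illustrative stand-ins, not Real Estate's actual model.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Comparable:
    address: str
    area_m2: float
    price: float

    @property
    def price_per_m2(self) -> float:
        return self.price / self.area_m2


@dataclass
class Valuation:
    low: float
    mid: float
    high: float
    comparables: List[Comparable]  # returned so the agent can show the seller


def value_property(subject_area_m2: float, comparables: List[Comparable]) -> Valuation:
    """Range from the spread of comparable price per square metre."""
    if not comparables:
        raise ValueError("at least one comparable is required")
    rates = sorted(c.price_per_m2 for c in comparables)
    return Valuation(
        low=rates[0] * subject_area_m2,
        mid=median(rates) * subject_area_m2,
        high=rates[-1] * subject_area_m2,
        comparables=comparables,
    )


comps = [
    Comparable("12 Hypothetical St", 100, 400_000),
    Comparable("3 Example Rd", 80, 360_000),
    Comparable("7 Sample Ave", 50, 250_000),
]
valuation = value_property(90, comps)
print(valuation.low, valuation.mid, valuation.high)  # 360000.0 405000.0 450000.0
```

Returning the comparables alongside the numbers is the design choice that matters here: it is what lets the agent defend the range rather than re-do it.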
A company pays for a vendor copilot that processes meeting transcripts through a shared model. The privacy review reveals customer names, deal values, and pricing leaving the network.
The workflow is rebuilt internally on an open-weight model running in the company's own cloud, for the same use case. The company keeps the productivity gain and closes the data leak.
Takeaway · Sometimes the answer is custom because off-the-shelf cannot meet a non-negotiable constraint, not because custom is cheaper.
A support team uses an off-the-shelf ticket classifier at $0.03 per classification. Volume grows 40% in nine months; the bill follows.
A custom RAG-based classifier runs at $0.004 per classification on the same hardware footprint. Build cost is recovered in eight months; year two is pure margin.
Takeaway · At scale, custom is usually cheaper. The math is unsentimental, and worth running before signing the renewal.
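The renewal math can be run month by month. Below is a minimal sketch using the per-classification prices from this example ($0.03 vendor, $0.004 custom) and 40% growth over nine months, converted to a compound monthly rate. The starting volume and build cost are hypothetical inputs, and the sketch ignores any fixed running cost for the custom pipeline.

```python
def payback_month(build_cost, start_volume, monthly_growth,
                  vendor_price=0.03, custom_price=0.004, horizon=36):
    """First month in which cumulative savings cover the build cost, or None."""
    saved = 0.0
    volume = float(start_volume)
    for month in range(1, horizon + 1):
        saved += volume * (vendor_price - custom_price)
        if saved >= build_cost:
            return month
        volume *= 1 + monthly_growth  # compound growth into next month
    return None


# 40% growth over nine months is about 3.8% a month, compounded.
growth = 1.4 ** (1 / 9) - 1

# Hypothetical: 50,000 classifications in month one, $11,000 to build.
print(payback_month(11_000, 50_000, growth))  # 8
```

Swapping in the real build quote and current volume before the renewal date is the whole exercise; growth only shortens the payback.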
Three commitments on every build-or-buy engagement
We do not have a preferred answer. The framework decides, and we ship whichever side wins for the specific workflow.
Framework before recommendation
We score every candidate workflow against the five questions before we recommend a side. The framework is shared with the client and stays in the audit; the next workflow gets scored the same way.
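Here is a sketch of what scoring a workflow against the five questions can look like in practice. The 0-to-2 scale, the threshold of 7, and the field names are hypothetical choices made for illustration. The two override rules come from the questions themselves: data that cannot leave the network, and an unclear answer to the first question.

```python
from dataclasses import dataclass


@dataclass
class WorkflowScore:
    name: str
    # Each answer on a hypothetical scale:
    # 0 = favours off-the-shelf, 1 = unclear, 2 = favours custom.
    process_specific: int   # Q1: generic work, or the company's own process?
    data_sensitivity: int   # Q2: PII, financial, health, restricted data
    lock_in_risk: int       # Q3: switching costs of vendor taxonomies
    eval_ownership: int     # Q4: need to eval on owned data, on our cadence
    cost_at_volume: int     # Q5: past the crossover, with committed volume
    data_cannot_leave: bool = False  # hard constraint surfaced by Q2


def recommend(w: WorkflowScore, build_threshold: int = 7) -> str:
    if w.data_cannot_leave:
        return "build, or do not deploy AI here"
    total = (w.process_specific + w.data_sensitivity + w.lock_in_risk
             + w.eval_ownership + w.cost_at_volume)
    if total >= build_threshold:
        return "custom"
    if w.process_specific == 1:
        # Middle-ground workflows: start off-the-shelf, measure, revisit.
        return "off-the-shelf for now; measure and revisit"
    return "off-the-shelf"


print(recommend(WorkflowScore("meeting transcription", 0, 0, 0, 0, 0)))  # off-the-shelf
print(recommend(WorkflowScore("lead scoring", 2, 1, 1, 2, 1)))  # custom
```

Keeping the scored answers, not just the verdict, is what lets the next review re-score the same workflow the same way.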
Hybrid is a valid answer
Most companies should run a mix. Generic workflows stay off-the-shelf; differentiated workflows go custom. The portfolio decision matters more than any single workflow decision.
Reversibility on both sides
Custom workflows ship with an exit plan to off-the-shelf if the framework changes. Off-the-shelf workflows ship with the data export and the eval baseline that make a future custom rebuild possible. Both sides stay open.
An AI portfolio that earns its position
A year later, every AI workflow in the business has a deliberate reason to be off-the-shelf or custom, and the team can defend each one.
The honest test of the build-or-buy framework is whether the answers hold up under change. A new vendor lands with a better off-the-shelf option: does the framework say to switch, and does the company actually switch? A custom workflow's volume drops below the crossover: does the team have the discipline to consider rolling it back to off-the-shelf? Most companies do neither, because the decisions are not framework-driven; they are inertia-driven.
The companies that get this right run the framework as a recurring review, not as a one-time audit. Twice a year, every workflow gets re-scored on the five questions. The output is not always change; usually it is confirmation. But the discipline of running the review is what keeps the portfolio honest.
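The twice-yearly review can produce a short diff rather than a fresh report. Below is a minimal sketch, assuming each review yields a mapping from workflow name to its off-the-shelf or custom decision. The workflow names and decisions are hypothetical.

```python
def review_diff(previous: dict, current: dict) -> dict:
    """Compare two reviews' decisions and report what changed and what held."""
    both = previous.keys() & current.keys()
    return {
        "changed": {n: (previous[n], current[n]) for n in sorted(both)
                    if previous[n] != current[n]},
        "confirmed": sorted(n for n in both if previous[n] == current[n]),
        "new": sorted(current.keys() - previous.keys()),
        "retired": sorted(previous.keys() - current.keys()),
    }


spring = {"transcription": "off-the-shelf", "lead scoring": "custom",
          "ticket classifier": "custom"}
autumn = {"transcription": "off-the-shelf", "lead scoring": "custom",
          "ticket classifier": "off-the-shelf", "call summaries": "off-the-shelf"}
print(review_diff(spring, autumn))
```

In this made-up example, one workflow flips, one is new, and the rest are confirmed, which is the usual shape of the output.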
The wider outcome is that AI stops being a yes/no question and becomes a portfolio decision, which is the right framing for a category of software that is going to keep changing faster than the rest of the stack for the foreseeable future.
AI for founders: questions we get asked.
Direct answers to the questions we get asked the most. If yours isn't covered, write to the team.