What the CFO Needs to Understand About AI Investment (That the Vendor Won't Tell Them)

The deck looks great. There's a 3x ROI at month 12, a cost-per-decision metric your competitors would envy, and case studies from companies that look just like yours. The vendor has done this pitch a hundred times. They know what a CFO wants to see.

The problem is that the ROI model they're using was designed for software. And AI isn't software. That distinction sounds pedantic until you're twelve months in and wondering why the numbers don't match the deck.

The gap between AI investment promises and P&L reality is probably the most expensive misalignment in enterprise technology right now. Not because AI doesn't deliver value (it does, for the right use cases, in the right organizations, under specific conditions), but because the financial model used to justify it was built for a different kind of purchase. Software procurement ROI runs on three assumptions: costs are predictable, value delivery is linear, and the failure mode is a delayed project. None of those hold for AI.

Why the software ROI model breaks on AI

Software has a cost structure that finance teams can work with. Licensing is known, implementation is estimated, and ongoing support is a percentage of the license. The model is imperfect but manageable. AI's cost structure doesn't map cleanly to any of those buckets.

The largest cost variable in most enterprise AI programs isn't the AI itself. It's data. Before a model can be trained on anything useful, someone has to assess what data you actually have (which is usually different from what the business thinks it has), fix the quality problems, integrate sources that weren't built to talk to each other, and set up the governance to make sure the training data is legally usable. That work is slow, expensive, and almost never appears on a vendor proposal. It also doesn't end: data quality degrades, systems change, and each new use case adds new requirements.

Compute is the second piece that gets undercounted.
Training costs and inference costs are different things, and vendor estimates typically focus on training. Inference is what you pay for in production, every time the model scores a new input. For high-volume use cases like fraud detection, real-time pricing, or recommendation, inference costs at scale regularly exceed what the organization paid to train the model. Cloud pricing makes this easy to miss until the bills start arriving.

Then there's talent. AI teams don't price like enterprise software teams. Data scientists, ML engineers, and MLOps specialists have their own market rates, and those rates aren't decreasing. More importantly, the team that builds a model is different from the team that runs it in production. Both need to be funded and sustained for as long as the model is in use.

The last piece is governance and monitoring. Every production model needs drift detection, performance tracking, audit logging, and a scheduled retraining cadence. This is unglamorous, recurring spend that consistently goes missing from initial program budgets. A model without monitoring isn't a production model. It's a liability on a timeline.

The time-to-value curve vendors don't show you

Vendor decks show value beginning to accumulate somewhere around month six. The actual pattern is different enough to change how you fund the program.

The first three months are almost entirely cost: data assessment, infrastructure setup, hiring or contracting the team, use case definition. Nothing deployable.

Months four through nine are where the model gets built and tested. Results exist but aren't trusted enough to act on. This is when programs are most at risk of being canceled. The spending is real, the returns aren't visible yet, and the business is getting impatient.

Months ten through eighteen are shadow deployment and validation. The model scores live data, and its predictions get compared against what actually happened. Trust builds incrementally, or it doesn't build at all.
After month eighteen, the value curve typically starts moving in a way that looks anything like the deck. And it does compound: more production data, a team that understands the operational patterns, a process that's been rebuilt around the AI output. The economics improve over time, but only if the program survives long enough to get there.

If the board expects visible returns at month twelve and the program is in month nine with real costs and nothing to show yet, someone will pull funding. The time-to-value curve needs to be part of the approval conversation, not something the program team manages quietly while hoping performance picks up.

The opex trap

Most boards think of technology investment as capex: a project spend that produces an asset and then stops. AI programs don't work that way. The ongoing costs are material, including compute that scales with usage, continuous data quality maintenance, model monitoring, and retraining when performance drifts. An organization that funds AI as a project will hit a wall when the project budget closes and someone realizes the model needs sustained investment to stay useful.

This also changes the unit economics conversation. The question isn't just "what does it cost to build this?" It's "what does it cost to run this for three years?" Those are different numbers, and the second one is the one that matters for the actual investment decision.

Red flags in vendor ROI models

A few patterns in AI vendor decks deserve specific scrutiny.

FTE displacement is the most commonly inflated line item. Many ROI models show cost savings by treating automated tasks as direct headcount reductions. In practice, organizations rarely convert FTE displacement into hard savings. People get redeployed to other work, absorbed into open roles, or kept on to manage the exception cases the model can't handle. The productivity gain is real; the cost reduction usually isn't, unless the organization explicitly plans a workforce reduction.
A vendor model that treats FTE displacement as a direct cost saving is overstating the ROI.

Efficiency gains disconnected from business outcomes are another pattern. "Your team handles the same volume 30% faster" is a productivity improvement. It becomes a financial outcome only if the freed capacity generates revenue or the cost base actually decreases. Efficiency claims need to be connected to a specific result, not left as an assumption that value will follow.

And case studies drawn from other companies at different scales in different industries are useful for direction only. The right ROI model uses your baseline, your data quality, your integration complexity, and your team's capability. A vendor can't assess any of those from a discovery call.

What to actually track

Total ROI (value divided by investment) tells you the aggregate return after the fact. It doesn't tell you whether a program in flight is working. These metrics do:

Model performance against baseline matters first. Is the model improving, and is that improvement translating to better decisions? The baseline needs to be set before the program starts: what was the business doing before the model existed? Without a documented baseline, there's nothing to measure against.

Production adoption rate tells you whether the business integration is actually working. A model that produces output nobody acts on isn't delivering value, regardless of how well it scores in testing. What percentage of model outputs are actually consumed by a decision-making process?

Cost per decision at volume should decrease as throughput scales. If it doesn't, the infrastructure design or the use case economics have a problem worth investigating.

Retraining cost trend should improve as models mature. If the cost and time to retrain keep rising, the data architecture has a compounding problem that will only get worse.
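To make the in-flight metrics concrete, here is a minimal Python sketch of a monthly check. It assumes a hypothetical `MonthlySnapshot` record; the field names, the error-rate metric (lower is better), and the 50% adoption floor are illustrative assumptions, not figures from any vendor or standard. "Trend" is treated as the ordinary least-squares slope over the months supplied. The check flags the four warning signs described above: no improvement over the baseline, low adoption, cost per decision that isn't falling, and retraining costs that keep rising.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    """One month of in-flight program data (hypothetical field names)."""
    month: int
    decisions_scored: int     # model outputs produced
    decisions_acted_on: int   # outputs consumed by a decision-making process
    run_cost: float           # production spend for the month (compute, monitoring, support)
    retrain_cost: float       # retraining spend in the month, 0.0 if no retrain ran
    error_rate: float         # model performance metric; lower is better (assumption)


def adoption_rate(s: MonthlySnapshot) -> float:
    """Share of model outputs that a decision process actually consumed."""
    return s.decisions_acted_on / s.decisions_scored if s.decisions_scored else 0.0


def cost_per_decision(s: MonthlySnapshot) -> float:
    """Production spend divided by decisions scored; should fall as volume scales."""
    return s.run_cost / s.decisions_scored if s.decisions_scored else float("inf")


def slope(values: list[float]) -> float:
    """Ordinary least-squares slope of values against their position in the list."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def review(history: list[MonthlySnapshot], baseline_error: float,
           min_adoption: float = 0.5) -> list[str]:
    """Return warnings for the four in-flight signals; min_adoption is an assumed floor."""
    warnings = []
    latest = history[-1]
    if latest.error_rate >= baseline_error:
        warnings.append("model is not beating the pre-program baseline")
    if adoption_rate(latest) < min_adoption:
        warnings.append("adoption is below the agreed floor")
    # A trend needs at least two months of data.
    if len(history) >= 2 and slope([cost_per_decision(s) for s in history]) >= 0:
        warnings.append("cost per decision is not falling as volume scales")
    # Only months in which a retrain actually happened count toward the trend.
    retrains = [s.retrain_cost for s in history if s.retrain_cost > 0]
    if slope(retrains) > 0:
        warnings.append("retraining cost is rising")
    return warnings


# Illustrative numbers only, not benchmarks.
history = [
    MonthlySnapshot(1, 10_000, 3_000, 20_000.0, 5_000.0, 0.040),
    MonthlySnapshot(2, 20_000, 9_000, 30_000.0, 0.0, 0.035),
    MonthlySnapshot(3, 40_000, 24_000, 44_000.0, 6_000.0, 0.031),
]
print(review(history, baseline_error=0.042))  # ['retraining cost is rising']
```

With these sample numbers, cost per decision falls from 2.00 to 1.10 and adoption reaches 60%, so the only flag raised is retraining cost rising from 5,000 to 6,000. A slope over a handful of points is a blunt instrument; a real review would want more months of history and an agreed tolerance before treating a trend as meaningful.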
The success definition that usually goes missing

AI programs get approved with vague success criteria because specificity feels like it creates accountability before the team has figured out what's achievable. That logic runs backward. Vague criteria are what allow programs to run for eighteen months without anyone agreeing on whether they're working.

A complete success definition has four components: a specific metric, a documented baseline, a numeric target, and a date. "Improve fraud detection" is not a success definition. "Reduce the false negative rate from 4.2% to below 2.5% by Q3 of next fiscal year" is.

The CFO's job is not to slow the program down by demanding this. It's to make the investment defensible when someone asks whether it's working. And in every organization I've seen do this at scale, someone eventually asks.
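The four-component test can be expressed as a small check. The sketch below is an illustration in Python under stated assumptions: `SuccessDefinition` is a hypothetical structure, the metric is treated as lower-is-better (as with a false negative rate), and `date(2026, 9, 30)` is a placeholder standing in for "Q3 of next fiscal year," since fiscal calendars differ. A definition missing any component reports itself as incomplete rather than being scored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SuccessDefinition:
    """Hypothetical record of the four components: metric, baseline, target, date."""
    metric: str
    baseline: Optional[float] = None
    target: Optional[float] = None
    deadline: Optional[date] = None
    lower_is_better: bool = True  # e.g. a false negative rate

    def is_complete(self) -> bool:
        # All four components must be present before progress can be judged.
        return bool(self.metric) and None not in (self.baseline, self.target, self.deadline)

    def target_met(self, observed: float) -> bool:
        # "Below 2.5%" is read as a strict inequality.
        if self.lower_is_better:
            return observed < self.target
        return observed > self.target

    def gap_closed(self, observed: float) -> float:
        # Fraction of the baseline-to-target gap closed so far (1.0 = target reached).
        # The same formula works whether lower or higher is better.
        return (self.baseline - observed) / (self.baseline - self.target)

    def status(self, observed: float, as_of: date) -> str:
        if not self.is_complete():
            return "incomplete definition"
        if self.target_met(observed):
            return "met"
        return "missed" if as_of > self.deadline else "not yet met"


vague = SuccessDefinition(metric="fraud detection")
fraud = SuccessDefinition(
    metric="false negative rate",
    baseline=0.042,
    target=0.025,
    deadline=date(2026, 9, 30),  # placeholder for "Q3 of next fiscal year"
)

print(vague.status(0.031, date(2026, 6, 30)))  # incomplete definition
print(fraud.status(0.031, date(2026, 6, 30)))  # not yet met
print(round(fraud.gap_closed(0.031), 2))       # 0.65
```

Under these assumptions, an illustrative observed rate of 3.1% closes roughly 65% of the gap from 4.2% to 2.5%. The program is moving, but the target isn't met yet, and that is an answer a criterion like "improve fraud detection" can't produce.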