- 12 Jun, 2026
What the CFO Needs to Understand About AI Investment (That the Vendor Won't Tell Them)
The deck looks great. There's a 3x ROI at month 12, a cost-per-decision metric your competitors would envy, and case studies from companies that look just like yours. The vendor has done this pitch a hundred times. They know what a CFO wants to see. The problem is the ROI model they're using was designed for software. And AI isn't software. That distinction sounds pedantic until you're twelve months in and wondering why the numbers don't match the deck. The gap between AI investment promises and P&L reality is probably the most expensive misalignment in enterprise technology right now. Not because AI doesn't deliver value — it does, for the right use cases, in the right organizations, under specific conditions. But because the financial model used to justify it was built for a different kind of purchase. Software procurement ROI runs on three assumptions: costs are predictable, value delivery is linear, and the failure mode is a delayed project. None of those hold for AI. Why the software ROI model breaks on AI Software has a cost structure that finance teams can work with. Licensing is known, implementation is estimated, ongoing support is a percentage of the license. The model is imperfect but manageable. AI cost structure doesn't map to any of those buckets cleanly. The largest cost variable in most enterprise AI programs isn't the AI itself. It's data. Before a model can be trained on anything useful, someone has to assess what data you actually have — which is usually different from what the business thinks it has — fix the quality problems, integrate sources that weren't built to talk to each other, and set up the governance to make sure the training data is legally usable. That work is slow, expensive, and almost never appears on a vendor proposal. It also doesn't end: data quality degrades, systems change, and each new use case adds new requirements. Compute is the second piece that gets undercounted. 
Training costs and inference costs are different things, and vendor estimates typically focus on training. Inference is what you pay for in production — every time the model scores a new input. For high-volume use cases like fraud detection, real-time pricing, or recommendation, inference costs at scale regularly exceed what the organization paid to train the model. Cloud pricing makes this easy to miss until the bills start arriving. Then there's talent. AI teams don't price like enterprise software teams. Data scientists, ML engineers, and MLOps specialists have their own market rates, and those rates aren't decreasing. More importantly, the team that builds a model is different from the team that runs it in production. Both need to be funded and sustained for as long as the model is in use. The last piece is governance and monitoring. Every production model needs drift detection, performance tracking, audit logging, and a scheduled retraining cadence. This is unglamorous, recurring spend that consistently goes missing from initial program budgets. A model without monitoring isn't a production model. It's a liability on a timeline. The time-to-value curve vendors don't show you Vendor decks show value beginning to accumulate somewhere around month six. The actual pattern is different enough to change how you fund the program. The first three months are almost entirely cost. Data assessment, infrastructure setup, hiring or contracting the team, use case definition. Nothing deployable. Months four through nine are where the model gets built and tested. Results exist but aren't trusted enough to act on. This is when programs are most at risk of being canceled — the spending is real, the returns aren't visible yet, and the business is getting impatient. Months ten through eighteen are shadow deployment and validation. The model scores live data. Outcomes get compared against what actually happened. Trust builds incrementally, or it doesn't build at all. 
Past eighteen months is typically where the value curve starts moving in a way that looks anything like the deck. And it does compound — more production data, a team that understands the operational patterns, a process that's been rebuilt around the AI output. The economics improve over time. But only if the program survives long enough to get there. If the board expects visible returns at month twelve and the program is in month nine with real costs and nothing to show yet, someone will pull funding. The time-to-value curve needs to be part of the approval conversation, not something the program team manages quietly while hoping performance picks up. The opex trap Most boards think of technology investment as capex: a project spend that produces an asset and then stops. AI programs don't work that way. The ongoing costs are material — compute that scales with usage, continuous data quality maintenance, model monitoring, and retraining when performance drifts. An organization that funds AI as a project will hit a wall when the project budget closes and someone realizes the model needs sustained investment to stay useful. This also changes the unit economics conversation. The question isn't just "what does it cost to build this?" It's "what does it cost to run this for three years?" Those are different numbers, and the second one is the one that matters for the actual investment decision. Red flags in vendor ROI models Two things in AI vendor decks deserve specific scrutiny. FTE displacement is the most commonly inflated line item. Many ROI models show cost savings by treating automated tasks as direct headcount reductions. In practice, organizations rarely convert FTE displacement into hard savings. People get redeployed to other work, absorbed into open roles, or kept on to manage the exception cases the model can't handle. The productivity gain is real — the cost reduction usually isn't, unless the organization explicitly plans a workforce reduction. 
A vendor model that treats FTE displacement as a direct cost saving is overstating the ROI. Efficiency gains disconnected from business outcomes are the other pattern. "Your team handles the same volume 30% faster" is a productivity improvement. It becomes a financial outcome only if the freed capacity generates revenue or the cost base actually decreases. Efficiency claims need to be connected to a specific result, not left as an assumption that value will follow. And case studies drawn from other companies at different scales in different industries are useful for direction only. The right ROI model uses your baseline, your data quality, your integration complexity, and your team's capability. A vendor can't assess any of those from a discovery call. What to actually track Total ROI — value divided by investment — tells you the aggregate return after the fact. It doesn't tell you whether a program in flight is working. The metrics that do: Model performance against baseline matters first. Is the model improving, and is that improvement translating to better decisions? The baseline needs to be set before the program starts — what was the business doing before the model existed? Without a documented baseline, there's nothing to measure against. Production adoption rate tells you whether the business integration is actually working. A model that produces output nobody acts on isn't delivering value regardless of how well it scores in testing. What percentage of model outputs are actually consumed by a decision-making process? Cost per decision at volume should decrease as throughput scales. If it isn't, the infrastructure design or use case economics have a problem worth investigating. Retraining cost trend should improve as models mature. If the cost and time to retrain keeps rising, the data architecture has a compounding problem that will only get worse. 
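Two of the in-flight metrics above, cost per decision and production adoption rate, reduce to simple ratios that can be tracked month by month. The following is a minimal sketch in Python, not a reporting standard: the `MonthlySnapshot` fields, the helper names, and all figures are hypothetical and chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    """One month of production figures. All fields are hypothetical."""
    run_cost: float          # compute, monitoring, and support spend for the month
    decisions_scored: int    # model outputs produced
    decisions_acted_on: int  # outputs actually consumed by a decision process


def cost_per_decision(s: MonthlySnapshot) -> float:
    """Monthly run cost divided by the number of outputs produced."""
    return s.run_cost / s.decisions_scored


def adoption_rate(s: MonthlySnapshot) -> float:
    """Share of model outputs that a decision process actually consumed."""
    return s.decisions_acted_on / s.decisions_scored


def is_decreasing(values: list[float]) -> bool:
    """True if every value is strictly lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Illustrative numbers only, not taken from any real program.
history = [
    MonthlySnapshot(run_cost=40_000, decisions_scored=100_000, decisions_acted_on=20_000),
    MonthlySnapshot(run_cost=45_000, decisions_scored=150_000, decisions_acted_on=60_000),
    MonthlySnapshot(run_cost=48_000, decisions_scored=240_000, decisions_acted_on=144_000),
]

unit_costs = [cost_per_decision(s) for s in history]
adoption = [adoption_rate(s) for s in history]

print("cost per decision falling:", is_decreasing(unit_costs))
print("adoption by month:", [round(a, 2) for a in adoption])
```

In this made-up history total spend rises every month, yet cost per decision falls because volume grows faster. That is the pattern the metric is meant to confirm; a flat or rising series would be the signal to investigate.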
The success definition that usually goes missing AI programs get approved with vague success criteria because specificity feels like it creates accountability before the team has figured out what's achievable. That logic runs backward. Vague criteria are what allow programs to run for eighteen months without anyone agreeing on whether they're working. A complete success definition has four components: a specific metric, a documented baseline, a numeric target, and a date. "Improve fraud detection" is not a success definition. "Reduce the false negative rate from 4.2% to below 2.5% by Q3 of next fiscal year" is. The CFO's job is not to slow the program down by demanding this. It's to make the investment defensible when someone asks whether it's working. And in every organization I've seen do this at scale, someone eventually asks.
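The four-part success definition can be written down as a small record, which makes it hard to approve a program with one of the parts missing. This is a sketch under stated assumptions: the `SuccessDefinition` class and its methods are hypothetical, and the deadline of 30 September 2027 is only a stand-in for "Q3 of next fiscal year", which the example above does not pin to a calendar date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SuccessDefinition:
    """The four components: a metric, a baseline, a target, and a date."""
    metric: str
    baseline: float
    target: float
    deadline: date

    def progress(self, observed: float) -> float:
        """Fraction of the baseline-to-target gap closed so far (1.0 = target reached)."""
        return (self.baseline - observed) / (self.baseline - self.target)

    def is_met(self, observed: float, as_of: date) -> bool:
        """True only if the target is beaten on or before the deadline."""
        lower_is_better = self.target < self.baseline
        beaten = observed < self.target if lower_is_better else observed > self.target
        return beaten and as_of <= self.deadline


fraud = SuccessDefinition(
    metric="fraud false negative rate (%)",
    baseline=4.2,
    target=2.5,
    deadline=date(2027, 9, 30),  # assumption: stand-in for "Q3 of next fiscal year"
)

print(round(fraud.progress(3.35), 2))             # share of the gap closed at 3.35%
print(fraud.is_met(2.4, date(2027, 8, 31)))       # below target, before the deadline
print(fraud.is_met(2.4, date(2027, 12, 1)))       # below target, but after the deadline
```

Keeping the baseline in the record is the point of the design: progress can only be computed against a number that was documented before the program started.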
- 11 Jun, 2026
The AI Talent Gap That Will Determine Whether Your Strategy Delivers
The most common response to an AI talent gap is a senior hire. A Chief AI Officer, a VP of AI, a Head of Machine Learning — someone with the credentials to lead the function and signal organizational commitment. The hire is often necessary. It is not sufficient. The limitation is not the seniority of the hire. It is the assumption that AI capability is a function of a few experts at the top of a structure, when in practice AI delivery requires distributed capability across data engineering, software engineering, product management, and business functions. An excellent senior AI leader working with teams that lack data engineering depth, or with product managers who cannot translate business requirements into AI-ready specifications, will not solve the talent problem. What follows is a more granular account of where the talent gaps actually sit and what decisions the CTO and CHRO need to make to address them. The data engineering gap Of all the talent gaps in enterprise AI programs, the data engineering gap is the most consistently underestimated and the most consequential for delivery. AI models need clean, accessible, well-structured data. Producing that data at the quality and scale AI requires is data engineering work. It involves building and maintaining pipelines, managing schema consistency, handling data quality monitoring, implementing the access control infrastructure that AI systems need, and enabling the data freshness requirements that production AI applications demand. Most enterprise data engineering teams were built for business intelligence and analytics workflows: batch processing, monthly reports, data warehouse queries. These are different from what AI requires. AI applications often need lower latency, higher reliability, more granular access control, and better lineage documentation than analytics workflows do. The CTO who wants to deliver AI at scale needs a data engineering function capable of supporting it. 
That often means hiring, upskilling, and in some cases restructuring what the data function does, not just adding ML engineers to an existing team.

The ML engineering versus data science distinction

Organizations that are building their first production AI applications sometimes conflate data science (the role of developing models and validating their performance) with ML engineering (the role of deploying those models into production systems reliably and at scale). Both are necessary. They are not the same skill profile, and the market for each is different. Data scientists are relatively abundant at the senior level, because most organizations that have been investing in analytics have developed or hired them. ML engineers (people who can build model serving infrastructure, implement monitoring for production model performance, manage model versioning and rollback, and integrate AI components into existing software systems) are significantly scarcer. The consequence: organizations that have data science capability but limited ML engineering capability can develop models in research environments that never make it to production, or that make it to production but degrade silently, without anyone noticing, because the monitoring infrastructure does not exist. If the AI program's goal is production systems rather than research artifacts, the CTO needs to assess the ML engineering capacity specifically, not just the overall AI headcount.

The product management capability gap

AI products require a different kind of product management than traditional software products. The core difference: the behavior of an AI system is probabilistic, not deterministic. It does not do the same thing every time with the same inputs. It produces outputs that vary, that can be wrong in ways that are hard to predict, and that require different quality evaluation approaches than traditional software.
Product managers who are excellent at defining functional requirements for traditional software often struggle with AI products because the tools for specifying and evaluating probabilistic behavior are different. Writing specifications for what an AI system should do, designing evaluation frameworks for outputs that are not right or wrong but better or worse, and building product intuition for what good AI performance looks like in a given context are skills that most PMs have not developed. The CHRO and CTO need to assess whether the organization's product management function has the capability to manage AI products effectively, and build a development plan for those who do not. This is a training and coaching question as much as a hiring question — the capability gap can often be closed more quickly through targeted development of existing PMs than through hiring. Business function AI capability The talent discussion in AI programs is usually focused on the AI team. The talent that is often more limiting in practice is the capability in the business functions that the AI program is serving. A demand forecasting AI system that produces excellent outputs is only valuable if the operations function can use those outputs to make better planning decisions. An AI-assisted underwriting tool only improves outcomes if underwriters can evaluate AI recommendations critically. An AI customer segmentation system only drives revenue if the marketing function knows how to act on the segments it produces. The capability gap in business functions shows up as underutilization: the AI system is deployed, adoption is technically measurable, but the organization is not capturing the value because the business users do not have the skills or the process changes required to convert AI outputs into better decisions. This is a CHRO problem more than a CTO problem. 
The CHRO needs to assess capability requirements in the business functions where AI is being deployed, build development programs that address specific skill gaps, and, where necessary, reassess role profiles to reflect the new capability expectations.

The retention problem

Building AI capability is hard. Retaining it is harder. The market for AI talent is competitive in ways that most enterprise organizations are not structured to compete with. The retention challenge is not purely about compensation, though that is a factor. It is about the work itself. AI engineers and data scientists who join an enterprise to build production AI systems stay when the work is technically interesting, when there is access to good data, when the organization moves fast enough to keep them engaged, and when there is a credible path to increasing impact. Enterprise AI programs that are slow to deploy, that are constrained by data access issues, or that cannot move past proof-of-concept into production create retention problems independent of compensation. The best AI talent leaves not because they were offered more money elsewhere, but because the organizational conditions did not support the work they wanted to do. The CTO's response to the retention problem is not primarily about retention packages; it is about building the organizational conditions that make the work worth staying for. That means clearing data access blockers, moving programs through proof-of-concept to production on a credible timeline, and giving AI teams genuine ownership over delivery.

What to take from this

The data engineering capability gap is more consequential for AI delivery than the data science or ML gap in most enterprises. Assess it specifically and address it before the program depends on it.

ML engineering and data science are different roles with different skill profiles and different market availability. Both are required for production AI systems.

Product management capability for probabilistic systems needs to be developed explicitly. Most PMs do not have it and it does not develop naturally through exposure alone.

Business function capability to use AI outputs is a limiting factor that the CHRO needs to address. AI underutilization is usually a business function capability problem, not an AI system problem.

Retention of AI talent depends more on organizational conditions than compensation. The CTO's retention strategy is about clearing blockers and maintaining momentum, not primarily about pay.
- 10 Jun, 2026
The Organizational Change Nobody Plans for When AI Goes Into Production
Technical AI programs plan for model performance, infrastructure reliability, and user adoption. The change management plans in most AI programs cover communication, training, and rollout support. These are necessary. They are not sufficient. What does not make it into the program plan is the organizational change that AI deployment actually creates: changes in who makes decisions, where accountability sits, and how existing roles need to adapt. These changes are not side effects of the technical program — they are the substance of what AI deployment means for how the organization operates. And they tend to surface six to twelve months after go-live, in the form of confusion about accountability, conflict between roles that now overlap, and resistance from functions that feel their judgment has been displaced. Addressing these changes proactively requires treating AI deployment as an organizational design question, not just a technology one. The decision rights problem AI systems are good at making or informing decisions that humans previously made alone. When an AI system produces a recommendation — in credit assessment, in demand forecasting, in HR screening, in customer prioritization — the human who used to make that decision now has a different role. They are either ratifying the AI's recommendation, overriding it, or working alongside it in a way that requires a new kind of judgment. This changes the nature of the role without changing the job title or the org chart. The credit analyst who used to run the full assessment is now running exception management. The demand planner who used to construct forecasts is now reviewing and adjusting AI-generated ones. The recruiter who used to screen applications is now reviewing a pre-filtered shortlist. These are not simpler jobs. In some respects they are harder — they require a different kind of expertise, specifically the ability to evaluate AI outputs critically rather than produce analysis independently. 
The people in these roles may have been selected and developed for the original capability profile, not the new one. Organizations that do not address this explicitly produce two failure modes. People who struggle with the new role either resist the AI system — finding reasons not to use it, overriding it more than the data warrants — or over-defer to it, approving AI recommendations without the critical review that the accountability structure requires. The accountability gap When a decision goes wrong in a human-only process, the accountability is reasonably clear. When a decision informed by an AI recommendation goes wrong, the accountability is murkier, and organizations that have not thought about it in advance tend to discover this in a difficult situation. Was the decision wrong because the AI recommendation was wrong? Or because the human failed to apply appropriate judgment to a valid AI recommendation? Or because the system was deployed in a context it was not designed for? Or because the training data did not reflect the conditions that produced this specific case? None of these questions have clean answers in an organization that has not set up the accountability structure deliberately. The result is attribution conflict: the AI team points to the human decision-maker, the business function points to the AI system, and nobody has clear accountability for remediation. Defining accountability for AI-informed decisions before deployment is one of the most important organizational design questions in any AI program. It requires a clear statement of where human judgment is required, what escalation looks like when the AI recommendation is overridden, and what the process is for determining whether a decision-quality problem is an AI problem or a human judgment problem. The capability shift The capability an AI system replaces does not simply disappear from the organization's needs — it transforms. 
The expertise required to interpret and challenge AI outputs is often closely related to the expertise required to produce the underlying analysis manually. But they are not the same skill, and the transition period between the two is where the most significant organizational risk sits. In the immediate period after AI deployment, the organization typically has people who are capable of doing the work manually but are still learning to use the AI tool effectively. This is manageable. The medium-term risk is more significant: if the organization stops developing the underlying manual capability because AI handles it, and the AI system underperforms or becomes unavailable, the recovery is slower than anyone anticipated. This is not an argument against AI deployment. It is an argument for being deliberate about which capabilities the organization maintains independently of the AI system and which it allows to atrophy as AI performance becomes reliable. The role boundary conflicts When AI tools augment work across function boundaries — a customer-facing AI system that pulls from data owned by multiple departments, an AI planning tool used by both finance and operations — the organizational boundaries that existed for human work do not automatically translate. Who decides what data the AI system uses? Who is accountable if the AI produces an output that reflects poorly on a specific function? Who decides when the AI recommendation should be overridden? When the AI generates recommendations that one function disagrees with, what is the escalation path? These are organizational design questions dressed as AI governance questions. They arise because AI systems do not respect the lines between functions in the same way that human roles do. The AI pulls on data from wherever it can reach and produces outputs that may reflect or implicate multiple functions simultaneously. 
The organizations that handle this well have addressed it before go-live: clear data ownership, a governance structure for the AI system that is recognized by all the functions it touches, and escalation paths that do not depend on functional boundaries that the AI system has already made ambiguous.

The human-in-the-loop question

Most AI programs that include human review in their design treat it as a quality control mechanism: the human checks the AI output before it is acted on. This framing is correct but incomplete. The human in the loop is also, and more importantly, the locus of accountability for the decision. If the human review is a checkbox rather than a substantive check, the accountability protection is illusory: the organization has the appearance of human oversight without the substance. Regulators, clients, and courts are unlikely to accept "a human reviewed it" as a sufficient defense for a poor decision if the review process was not meaningful. What meaningful human oversight looks like (how long it should take, what the reviewer is expected to assess, what training they need, what recourse they have when they disagree with the AI) needs to be specified in the organizational design of the AI system, not left to individual judgment.

What to take from this

Map how AI deployment changes existing decision rights before go-live. The people whose roles are affected need to understand the new expectation before the system is live, not after they have defaulted to the wrong behavior.

Define accountability for AI-informed decisions explicitly. The accountability structure needs to specify where human judgment is required, what override looks like, and how decision-quality problems are attributed and remediated.

Assess the capability implications of AI deployment over a three-to-five year horizon. Identify which capabilities the organization intends to maintain independently and which it is comfortable allowing AI to own.

Address function boundary conflicts before deployment. The organizational design questions created by AI systems that cross functional lines need governance structures that all the affected functions recognize.

Human-in-the-loop design needs to specify what meaningful review looks like, not just that review occurs. Checkbox oversight creates accountability risk, not accountability protection.
- 09 Jun, 2026
AI Governance for Boards: What to Own and What to Delegate
There's a pattern I see in boardrooms that have added "AI strategy" to their agenda. An executive presents. The board listens. Someone asks a question that's technically about AI but actually about accountability. The executive answers with something about data governance and responsible use. The board nods. The item is closed. Nothing was actually governed. Boards are being asked to sign off on AI investments they can't fully interrogate, using governance frameworks that were designed for different kinds of risk. The result is a form of governance theater: the structures exist, the sign-offs happen, and the accountability is nowhere. This isn't a criticism of boards specifically. The frameworks genuinely don't fit. Audit committees are built around financial controls and statutory reporting. Risk committees are built around quantifiable risk exposures. AI introduces a risk profile that's different in kind — systems that make decisions at scale, that degrade silently over time, that can produce outcomes nobody explicitly designed, and that concentrate vendor dependencies in ways traditional procurement governance doesn't catch. Getting governance right doesn't require every board member to understand machine learning. It requires the board to own the right things and ask the right questions — and to know the difference between a real answer and a reassuring one. What the board needs to own Board-level AI governance has three genuine responsibilities. Everything else can and should sit with management. The first is the risk appetite. Not a list of approved use cases, but a real position on where the organization's tolerance for AI-driven decisions sits. What decisions can an AI make autonomously? What decisions require a human in the loop? What outcomes, if they occurred, would represent a failure of accountability at board level? These are governance questions, not technology questions. They need a board answer. The second is accountability structure. 
When an AI system produces a bad outcome — a biased recommendation, a pricing error at scale, a model that degrades and nobody notices for six months — who is accountable? The answer should never be "the model." It should be a named person in a named role with a documented process for how failures get escalated. The board should know what that structure is and should have satisfied itself that it's real, not just written down somewhere. The third is vendor concentration risk. Most enterprise AI programs now run on infrastructure from a small number of large providers. The board needs visibility into those dependencies — not at the technical level, but at the risk level. What happens to business continuity if a vendor relationship breaks? What proprietary data is in the hands of external providers, and under what terms? Everything else — model selection decisions, specific use cases, technical evaluation, operational monitoring — belongs with management and the relevant technical functions. The governance trap The trap boards fall into is trying to govern AI the way they govern everything else: by approving a strategy and reviewing a report. AI doesn't work that way. A strategy document approved eighteen months ago may bear no resemblance to what's actually in production today. Models evolve. Use cases expand beyond their original scope. The risk profile of a system that started as a recommendation tool changes when it starts making operational decisions at volume. Good AI governance requires a living understanding of what the organization is actually running, not just what it approved. That means the board needs reporting that tells it what AI systems are in production, what decisions those systems are making, and whether the performance monitoring is working — not just whether the program is "on track." Most board reporting on AI covers the program status, not the risk status. Those are different documents. 7 questions that matter These aren't technical questions. 
They're governance questions. A board member should be able to ask them in plain language and expect a plain-language answer.

1. What decisions is AI making on behalf of this organization, and at what volume? Not what AI capabilities we have, but what decisions it's actually making. If the answer requires a thirty-minute technical explanation, the governance reporting isn't working.
2. Who is accountable when an AI system produces a wrong or harmful output? There should be a named person, not a process or a committee.
3. What are we monitoring, and what triggers a review or a pause? Every production AI system should have defined performance thresholds. The board should know what those are and who owns the response when they're breached.
4. What data are we using to train and run these systems, and do we have the rights to use it that way? Data licensing and privacy compliance create real legal exposure. This is a board-level question dressed as a technical one.
5. Which external providers have access to proprietary or customer data, and under what terms? Vendor risk is real and underdisclosed in most AI reporting.
6. How would we know if an AI system was producing discriminatory outcomes? The answer should describe a monitoring process, not a policy statement.
7. What would we do if we had to take a system offline? Business continuity for AI systems is frequently underdeveloped. The board should be confident an answer exists.

What good reporting looks like

Board AI reporting that answers these questions would include: a register of AI systems in production and what decisions they're influencing, a summary of monitoring status and recent performance alerts, an update on data licensing and vendor contract status, and a brief note on any material changes to the risk profile since the last review. What boards typically receive: a slide on the AI strategy roadmap, a progress update against implementation milestones, and a chart showing the projected ROI. Those are different conversations.
The strategy and roadmap conversation is important. So is the governance one. Both need time on the agenda, and conflating them is how organizations end up with AI programs that are well-funded and under-governed.
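The register of AI systems described above can be kept as structured data, so that the plain-language board summary is generated from it and not written by hand. The sketch below is one possible shape, not a standard: the `RegisterEntry` fields, the two example systems, the owners, and the thresholds are all hypothetical, and it assumes a monitored metric where a lower value is worse.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    """One production AI system, described at the level a board needs."""
    system: str
    decision: str                 # what decision the system makes or informs
    monthly_decisions: int        # volume
    accountable_owner: str        # a named person, not a committee
    monitored_metric: str
    review_threshold: float       # assumption: a value below this triggers a review
    latest_value: float
    external_providers: tuple[str, ...] = ()

    def breached(self) -> bool:
        """True when the latest monitored value has fallen below the threshold."""
        return self.latest_value < self.review_threshold


def board_summary(register: list[RegisterEntry]) -> list[str]:
    """One plain-language line per system, flagging any that need review."""
    lines = []
    for e in register:
        status = "REVIEW" if e.breached() else "ok"
        lines.append(
            f"{e.system}: {e.decision}, {e.monthly_decisions:,} per month, "
            f"owner {e.accountable_owner}, {e.monitored_metric}={e.latest_value} [{status}]"
        )
    return lines


# Hypothetical entries for illustration only.
register = [
    RegisterEntry("fraud-scoring", "blocks card transactions", 1_200_000,
                  "J. Doe", "recall", 0.90, 0.93, ("cloud-provider-a",)),
    RegisterEntry("credit-prescreen", "recommends decline or refer", 40_000,
                  "A. Smith", "approval accuracy", 0.85, 0.81),
]

summary = board_summary(register)
for line in summary:
    print(line)
```

A register like this answers several of the seven questions directly: what decisions are being made and at what volume, who is accountable, what is monitored, and which external providers are involved.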
- 08 Jun, 2026
What AI Adoption Does to Your Existing Technology Contracts
Deploying AI into an enterprise technology stack does not happen in isolation. It happens into an existing web of contracts: software licenses, SaaS agreements, data processing terms, and vendor relationships that were written before AI capabilities were relevant and that were not designed to accommodate what an AI program requires. The collision between AI deployment and existing contracts produces a category of problem that most organizations encounter somewhere in the middle of delivery, after commitments have been made and timelines are set. The CIO and general counsel who review the contract landscape before deployment starts are in a substantially better position than those who discover the issues under delivery pressure. There are five areas where existing contracts tend to create friction for AI programs. Software licensing terms and data use Many enterprise software licenses include restrictions on how data within the system can be used beyond its primary purpose. The typical language covers authorized users, permitted use cases, and sometimes explicit restrictions on automated processing or data extraction. When an AI system is connected to a licensed software platform to extract, process, or train on the data within it, those restrictions may be relevant. A CRM contract that limits data use to direct customer relationship management may not automatically permit the creation of an AI training dataset from CRM records. A document management system license that covers authorized human users may not straightforwardly cover an AI agent that queries the system as part of an automated workflow. The likelihood that existing enterprise software licenses explicitly address AI use cases is low — most were written before those use cases were anticipated. The risk is that implicit prohibitions, generic restrictions on automated processing, or data use limitations apply in ways that neither party anticipated. 
Before connecting AI systems to existing licensed platforms, general counsel should review the relevant license terms for restrictions on data use and automated access, and engage with vendors where the position is unclear.

SaaS data portability and processing terms

SaaS agreements typically govern what data is held in the platform, how it can be exported, and what the vendor can do with it. The standard SaaS agreement, particularly for products that predate the current AI era, was written with human-facing use in mind.

When an AI program requires bulk data extraction from a SaaS platform (to populate a training dataset, to build a knowledge index, or to migrate data to an AI-ready format), the agreement may not straightforwardly permit this. Data export limitations, API rate limits, and format restrictions may be in the contract in ways that constrain what the AI program needs.

The practical issue: discovering mid-implementation that a bulk data extraction is contractually restricted by an existing SaaS agreement is a problem that takes time to resolve. Vendor negotiations to expand export rights, technical workarounds, or alternative data sourcing each add delay and cost that was not in the original plan.

Review SaaS agreements for any data that the AI program will need to process, extract, or migrate before the implementation schedule is set.

Existing AI vendor agreements and scope creep

Organizations that have been using AI tools for some period often have existing vendor agreements that define the scope of permitted use cases. As the AI program expands, new use cases may not be covered under the existing agreement.

This matters specifically in two ways. First, using an AI vendor for use cases outside the defined scope of the agreement, even with the same tool and the same vendor, may create data handling situations the original agreement did not contemplate. Second, enterprise AI agreements often include pricing that is tied to defined use parameters.
Expanding use significantly beyond those parameters may trigger renegotiation on terms less favorable than the original agreement.

Audit the scope of existing AI vendor agreements against the planned AI program before expansion. Know what is and is not covered before the program is designed around assumptions about what the vendor relationship permits.

Data processing agreements for third-party integrations

AI programs frequently involve connecting internal data to third-party AI systems through integrations: an AI tool connected to the CRM, an AI analytics layer over the data warehouse, an AI agent with access to internal APIs. Each of these integrations creates a data flow that may require a data processing agreement.

Where the integrated party processes personal data on behalf of the organization, a data processing agreement is a regulatory requirement under data protection law in most jurisdictions. For integrations that existed before the AI layer was added, the original data processing agreement may not cover the additional processing the AI component involves.

Before adding AI components to existing integrations, review whether existing data processing agreements need to be updated to reflect the expanded processing scope. The integration may be technically unchanged from the data flow perspective while creating a materially different processing activity for regulatory purposes.

Third-party data in AI training and indexes

Organizations often use third-party data sources (market data, industry benchmarks, licensed research, external databases) in their operations. When an AI program wants to use this data as training material or as content in a retrieval index, the license for that third-party data may not permit it.

Third-party data licenses typically specify permitted use cases: internal analysis, reporting, specific product use.
Training an AI model on licensed data, or including it in an AI knowledge base that generates outputs for a broad user population, may constitute a use case that the license does not cover. The risk is real, the issue is common, and the discovery process for finding all the affected data sources takes time.

Conduct a data sourcing review for any data that will go into AI training sets or retrieval indexes. Identify all third-party licensed content, review the license terms for AI use restrictions, and either obtain the necessary permissions or exclude the content.

Practical approach for the CIO and general counsel

The scale of this review problem varies significantly by organization. For most enterprises, the contract review is manageable if it is structured and approached systematically before the AI program enters delivery.

Start with the data sources the AI program will use. Map every system, database, and data feed the AI system will connect to. For each, identify the governing contract and whether it has been reviewed for AI-relevant terms.

Prioritize by data volume and sensitivity. The highest-volume data sources feeding the AI program, and the data categories with the most regulatory and contractual complexity, deserve the most thorough review.

Engage vendors early where the position is unclear. Vendors generally prefer to resolve license ambiguity before it becomes a dispute. A proactive conversation about AI use cases typically produces better outcomes than a post-hoc assertion that something was permitted.

What to take from this

- Software licenses and SaaS agreements often contain restrictions on automated processing and data use that predate AI and may apply to AI use cases in ways neither party anticipated. Review them before deployment.
- Bulk data extraction requirements for AI programs may be restricted by existing SaaS agreements. Discover this before the implementation schedule depends on it.
- Expanding AI use cases beyond the scope defined in existing AI vendor agreements can create data handling and pricing issues. Audit current agreements against planned use.
- Data processing agreements need to be updated to reflect AI components added to existing integrations. The regulatory obligation does not adjust automatically when the technical architecture changes.
- Third-party licensed data in AI training sets or retrieval indexes may require explicit permission that the existing license does not provide. Conduct a data sourcing review before building the training set.
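
For teams that track this review in a script or spreadsheet export, the mapping and prioritization steps from the practical approach can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the `DataSource` fields, the 1-to-5 scales, and the volume-times-sensitivity weighting are all assumptions made for the example, and the sample entries are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSource:
    """One system, database, or feed the AI program will connect to."""
    name: str
    governing_contract: Optional[str]  # None: governing contract not yet identified
    reviewed_for_ai_terms: bool
    volume: int       # hypothetical scale, 1 (low) to 5 (high)
    sensitivity: int  # hypothetical scale, 1 (low) to 5 (high)


def review_queue(sources: list[DataSource]) -> list[DataSource]:
    """Return sources still needing contract review, highest priority first."""
    pending = [s for s in sources if not s.reviewed_for_ai_terms]
    # Illustrative weighting only: volume multiplied by sensitivity.
    return sorted(pending, key=lambda s: s.volume * s.sensitivity, reverse=True)


# Invented sample inventory for illustration.
sources = [
    DataSource("CRM", "CRM master license", False, volume=5, sensitivity=4),
    DataSource("Market data feed", "Data vendor license", False, volume=2, sensitivity=3),
    DataSource("Document store", None, False, volume=4, sensitivity=4),
    DataSource("HR system", "HR SaaS agreement", True, volume=3, sensitivity=5),
]

for source in review_queue(sources):
    contract = source.governing_contract or "contract not yet identified"
    print(f"{source.name}: {contract}")
```

Sources already reviewed drop out of the queue, and a source with no identified contract stays visible so that gap is not lost. Any real scoring would need to reflect the organization's own regulatory and contractual context.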
- 07 Jun, 2026
What Happens to Your Data Inside a Large Language Model
One of the questions I get most often from executive teams when they start getting serious about AI governance is some version of: "If we send data to an AI model, does that data end up in the model? Can the model then use our data to answer questions for our competitors?"

It is a reasonable question. The answer is more nuanced than the headlines around AI and data privacy usually suggest, and getting the nuance right matters for making sound decisions about vendor selection, data handling, and acceptable use.

This is not a technical explanation. It is an executive one. I want to give you the conceptual framework that lets you ask the right questions and evaluate the answers vendors give you.

The key distinction: training versus inference

There are two fundamentally different things that can happen to data when it touches an AI model.

Inference is what happens during normal use. You send a prompt. The model processes it using the knowledge and patterns it already has. It generates a response. Your data was processed, but it did not change the model. The model is no more or less capable after your interaction than it was before. Think of it like asking an expert a question: they used their knowledge to answer you, but they did not become a different expert because you asked.

Training is different. Training is when data is used to update the model's internal parameters, that is, to change what the model knows or how it responds. This is what actually shapes the model's behavior and capabilities. Training happens periodically, using large datasets, through a deliberate process. It is not what happens every time a user sends a prompt.

The confusion between training and inference is responsible for most of the anxiety executives have about sending data to AI vendors. When an employee pastes a strategy document into an AI assistant, that document is used for inference, to generate the response.
It is not, in that moment, training the model or making the model more likely to surface that information to other users. The question of whether your data is used for training is a separate one, governed by the vendor's policies and your agreement with them.

When data does influence the model

The concern about data "ending up in the model" is legitimate in one specific scenario: when the vendor uses interaction data to train future versions of the model.

This practice is more common in consumer products than enterprise ones. Many consumer AI tools, under default settings, retain interaction data and may use it as part of the training pipeline for future model versions. This does not mean a competitor can directly query the model and retrieve your document. Training does not work like storing files in a searchable database. But your data, if used for training, has influenced the model's patterns in ways that are effectively irreversible and non-auditable.

Enterprise agreements typically exclude this. When an organization purchases an enterprise license with a proper data processing agreement, the vendor generally commits to not using that organization's data for training purposes. This is one of the most important terms to verify in any AI vendor agreement, and one of the strongest reasons to ensure employees are using enterprise tiers rather than consumer accounts.

The practical implication: the risk of your data influencing the model is primarily a function of which tier you are on and what your agreement says, not of using AI tools in general.

What retention actually means

Even when a vendor does not train on your data, they may retain it for a period. Understanding what retention means in practice matters for two reasons: regulatory compliance and the question of who can access the retained data.
Vendors retain interaction data for different reasons: abuse prevention, conversation history for the user, debugging and quality assurance, and in some cases legal holds. The retention period varies from days to years depending on the product and the settings.

What the retained data can be used for is defined in the vendor's privacy policy and data processing agreement. The key questions are: Can vendor employees access the content of retained interactions? Under what circumstances? Are there audit logs of such access? What are the deletion terms: can you request deletion, and is it complete?

These are not abstract questions. An employee sending sensitive content to an AI tool is creating a record that exists in the vendor's infrastructure for some period. If that infrastructure is breached, or if the vendor is subject to legal process, that record is potentially accessible. The same employee would not dream of emailing that content to a stranger. But the AI tool does not feel like an external party; it feels like a private tool.

The retention question is also where GDPR and similar regulations create specific obligations. Any interaction containing personal data is a transfer of personal data to a third-party processor. That transfer requires a legal basis, a data processing agreement, and compliance with data subject rights including deletion. Most organizations have not mapped their AI tool usage against these obligations.

The questions a CTO should ask every AI vendor

The framework above translates into a specific set of questions that should be part of any AI vendor evaluation:

- Is interaction data used for training future models? Under what conditions? What controls does the customer have over this? This is the most important question. Get the answer in writing, as a contractual commitment, not as a verbal assurance.
- What is the data retention period for interaction data? Can this be configured? What are the deletion rights and processes? What confirmation is provided when deletion is complete?
- Who within the vendor organization can access the content of customer interactions? Under what circumstances? Are there access logs? What are the procedures if vendor employees need to access content for support or debugging?
- Where is the data processed? This matters for regulatory compliance. Data about EU residents processed in jurisdictions without an adequacy decision creates specific compliance obligations that need to be managed.
- What happens to retained data in the event of the vendor being acquired, going out of business, or being subject to legal process? Where does customer data fall in those scenarios?
- What is the vendor's certification posture? SOC 2 Type II, ISO 27001, and similar certifications do not answer all of these questions, but they provide a baseline for security practices that matters for any serious enterprise evaluation.

The honest assessment

No AI tool is risk-free from a data perspective. Sending data to any third-party system involves some degree of information leaving your infrastructure, under terms you did not write, in systems you do not control. That is true of cloud storage, email services, and every other third-party tool the organization uses.

The question is whether the risk is understood, whether the terms are acceptable given the regulatory and contractual context, and whether the data classification of what is being sent is appropriate for the tier and agreement in place.

The worst outcome is not the use of AI tools with enterprise data under a proper enterprise agreement with a reputable vendor. The worst outcome is the use of consumer-tier products with default settings, with sensitive data, and without any of the contractual protections that make enterprise use manageable. Most organizations are currently somewhere in between.
The CTO's job is to understand exactly where on that spectrum the organization sits, and to move deliberately toward the part of the spectrum that is defensible.

What to take from this

- Training and inference are different. Using an AI tool to process data does not automatically mean that data trains the model. Whether it does depends on the vendor's policies and your agreement.
- The training exclusion is one of the most important terms in an enterprise AI agreement. Verify it explicitly; a verbal assurance is not sufficient.
- Retention means your data exists on vendor infrastructure for some period. Understand the retention period, access controls, and deletion rights for every tool in active use.
- Consumer and enterprise tiers of the same product often have materially different data handling terms. The tier distinction matters more than the vendor selection in many cases.
- Map AI tool usage against data protection obligations before the next regulatory review, not during it.

The executives who handle this well are the ones who moved past the surface-level anxiety about "AI knowing your data" and got specific about the mechanisms: what are the actual terms, what does retention mean, and what commitments can the vendor make in writing?