- 07 Jun, 2026
What Happens to Your Data Inside a Large Language Model
One of the questions I get most often from executive teams when they start getting serious about AI governance is some version of: "If we send data to an AI model, does that data end up in the model? Can the model then use our data to answer questions for our competitors?"
It is a reasonable question. The answer is more nuanced than the headlines around AI and data privacy usually suggest, and getting the nuance right matters for making sound decisions about vendor selection, data handling, and acceptable use.
This is not a technical explanation. It is an executive one. I want to give you the conceptual framework that lets you ask the right questions and evaluate the answers vendors give you.
The key distinction: training versus inference
There are two fundamentally different things that can happen to data when it touches an AI model.
Inference is what happens during normal use. You send a prompt. The model processes it using the knowledge and patterns it already has. It generates a response. Your data was processed, but it did not change the model. The model is no more or less capable after your interaction than it was before. Think of it like asking an expert a question: they used their knowledge to answer you, but they did not become a different expert because you asked.
Training is different. Training is when data is used to update the model's internal parameters, to change what the model knows or how it responds. This is what actually shapes the model's behavior and capabilities. Training happens periodically, using large datasets, through a deliberate process. It is not what happens every time a user sends a prompt.
The confusion between training and inference is responsible for most of the anxiety executives have about sending data to AI vendors. When an employee pastes a strategy document into an AI assistant, that document is used for inference, to generate the response. It is not, in that moment, training the model or making the model more likely to surface that information to other users. The question of whether your data is used for training is a separate one, governed by the vendor's policies and your agreement with them.
When data does influence the model
The concern about data "ending up in the model" is legitimate in one specific scenario: when the vendor uses interaction data to train future versions of the model.
This practice is more common in consumer products than enterprise ones. Many consumer AI tools, under default settings, retain interaction data and may use it as part of the training pipeline for future model versions. This does not mean a competitor can directly query the model and retrieve your document. Training does not work like storing files in a searchable database. But your data, if used for training, has influenced the model's patterns in ways that are effectively irreversible and non-auditable.
Enterprise agreements typically exclude this. When an organization purchases an enterprise license with a proper data processing agreement, the vendor generally commits to not using that organization's data for training purposes. This is one of the most important terms to verify in any AI vendor agreement, and one of the strongest reasons to ensure employees are using enterprise tiers rather than consumer accounts.
The practical implication: the risk of your data influencing the model is primarily a function of which tier you are on and what your agreement says, not of using AI tools in general.
What retention actually means
Even when a vendor does not train on your data, they may retain it for a period. Understanding what retention means in practice matters for two reasons: regulatory compliance and the question of who can access the retained data.
Vendors retain interaction data for different reasons: abuse prevention, conversation history for the user, debugging and quality assurance, and in some cases legal holds. The retention period varies from days to years depending on the product and the settings. What the retained data can be used for is defined in the vendor's privacy policy and data processing agreement.
The key questions are:
- Can vendor employees access the content of retained interactions? Under what circumstances?
- Are there audit logs of such access?
- What are the deletion terms: can you request deletion, and is it complete?
These are not abstract questions. An employee sending sensitive content to an AI tool is creating a record that exists in the vendor's infrastructure for some period. If that infrastructure is breached, or if the vendor is subject to legal process, that record is potentially accessible. The same employee would not dream of emailing that content to a stranger. But the AI tool does not feel like an external party; it feels like a private tool.
The retention question is also where GDPR and similar regulations create specific obligations. Any interaction containing personal data is a transfer of personal data to a third-party processor. That transfer requires a legal basis, a data processing agreement, and compliance with data subject rights including deletion. Most organizations have not mapped their AI tool usage against these obligations.
The questions a CTO should ask every AI vendor
The framework above translates into a specific set of questions that should be part of any AI vendor evaluation:
- Is interaction data used for training future models? Under what conditions? What controls does the customer have over this? This is the most important question. Get the answer in writing, as a contractual commitment, not as a verbal assurance.
- What is the data retention period for interaction data? Can this be configured? What are the deletion rights and processes? What confirmation is provided when deletion is complete?
- Who within the vendor organization can access the content of customer interactions? Under what circumstances? Are there access logs? What are the procedures if vendor employees need to access content for support or debugging?
- Where is the data processed? This matters for regulatory compliance. Data about EU residents processed in jurisdictions without an adequacy decision creates specific compliance obligations that need to be managed.
- What happens to retained data in the event of the vendor being acquired, going out of business, or being subject to legal process? Where does customer data fall in those scenarios?
- What is the vendor's certification posture? SOC 2 Type II, ISO 27001, and similar certifications do not answer all of these questions, but they provide a baseline for security practices that matters for any serious enterprise evaluation.
The honest assessment
No AI tool is risk-free from a data perspective. Sending data to any third-party system involves some degree of information leaving your infrastructure, under terms you did not write, in systems you do not control. That is true of cloud storage, email services, and every other third-party tool the organization uses.
The question is whether the risk is understood, whether the terms are acceptable given the regulatory and contractual context, and whether the data classification of what is being sent is appropriate for the tier and agreement in place.
Using AI tools with enterprise data under a proper enterprise agreement with a reputable vendor is not the worst outcome. The worst outcome is using consumer-tier products with default settings, with sensitive data, without any of the contractual protections that make enterprise use manageable.
Most organizations are currently somewhere in between. The CTO's job is to understand exactly where on that spectrum the organization sits, and to move deliberately toward the part of the spectrum that is defensible.
What to take from this
- Training and inference are different. Using an AI tool to process data does not automatically mean that data trains the model. Whether it does depends on the vendor's policies and your agreement.
- The training exclusion is one of the most important terms in an enterprise AI agreement. Verify it explicitly; a verbal assurance is not sufficient.
- Retention means your data exists on vendor infrastructure for some period. Understand the retention period, access controls, and deletion rights for every tool in active use.
- Consumer and enterprise tiers of the same product often have materially different data handling terms. The tier distinction matters more than the vendor selection in many cases.
- Map AI tool usage against data protection obligations before the next regulatory review, not during it.
The executives who handle this well are the ones who moved past the surface-level anxiety about "AI knowing your data" and got specific about the mechanisms: what are the actual terms, what does retention mean, and what commitments can the vendor make in writing?
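For readers who want the training-versus-inference distinction in concrete terms, here is a minimal sketch. It is a toy, not how any real vendor's model works: `ToyModel`, its word weights, and the example prompt are all hypothetical. The only point it illustrates is that inference reads the model's parameters, while training is a separate step that changes them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for a model: one numeric weight per known word."""
    weights: dict[str, float] = field(default_factory=dict)

    def infer(self, prompt: str) -> float:
        # Inference: read the weights to score the prompt. Nothing is written back.
        return sum(self.weights.get(word, 0.0) for word in prompt.lower().split())

    def train(self, examples: list[str], learning_rate: float = 0.1) -> None:
        # Training: a deliberate, separate step that updates the weights.
        for text in examples:
            for word in text.lower().split():
                self.weights[word] = self.weights.get(word, 0.0) + learning_rate


model = ToyModel(weights={"revenue": 1.0, "forecast": 0.5})
before = dict(model.weights)

# Sending a prompt uses the model but leaves it unchanged.
model.infer("Our confidential revenue forecast for Q3")
unchanged_after_inference = model.weights == before

# Only a training run on that same text alters the parameters.
model.train(["Our confidential revenue forecast for Q3"])
changed_after_training = model.weights != before
```

In this toy, the prompt only influences the weights if someone chooses to pass it to `train`. That choice is what the training exclusion in an enterprise agreement governs.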
- 09 Apr, 2026
What Your AI Vendor Knows About Your Business After Six Months
When an organization signs an enterprise AI agreement, the focus is almost always on what the vendor will provide: model capabilities, performance benchmarks, uptime commitments, support terms. The less examined side of the exchange is what the vendor learns about the organization over the course of the relationship.
This is not a question of whether the vendor is misusing data. Most enterprise AI vendors have robust commitments around data use and treat customer data with appropriate care. The question is subtler: what does the accumulated pattern of the organization's AI usage tell a sophisticated observer about how the business operates, and what are the implications of that information sitting with a third party for years?
The implications are not obvious until you think them through.
What usage data reveals
An AI vendor with access to enterprise usage data can observe, at scale and over time, patterns that individual data points do not reveal.
What the organization focuses on. The topics, domains, and question types that generate the highest AI usage volume reveal where the organization is directing attention. A spike in queries about regulatory compliance in a specific jurisdiction signals a business development or risk management concern before it shows up in any public disclosure. A sustained pattern of usage around a particular product area signals strategic investment before any announcement.
How the organization works. The workflows AI tools are used in reveal process patterns: how decisions are prepared, what information sources are consulted, how different functions interact, where bottlenecks exist. This is the kind of operational picture that management consultants spend weeks building in client engagements. AI vendors accumulate it as a byproduct of normal usage.
Where the organization's capabilities are strong and where they are not. The questions an organization asks of an AI system reflect, to some degree, what the people asking cannot do themselves. Heavy usage of AI tools for a specific type of analysis suggests that internal capability is limited in that area. A pattern of AI-assisted communication drafting in certain functions suggests communication capability constraints.
Who the organization interacts with. Queries that reference client names, partner organizations, or market contexts create metadata about the organization's relationship network and market focus, even in enterprise agreements where input content is excluded from training.
None of this requires the vendor to actively analyze any specific piece of content. Aggregate usage patterns make these inferences available without individual query inspection.
Why this accumulates over time
The picture that emerges after six months of enterprise AI usage is qualitatively different from what was visible at month one. The accumulation of patterns across thousands of interactions, across multiple functions, across different business cycles reveals consistency and change in ways that a snapshot does not.
Organizations change focus, enter new markets, encounter new challenges, and invest in new capabilities. All of those shifts are visible in AI usage patterns before they are visible elsewhere. The vendor relationship, if it persists, captures the strategic trajectory of the organization over time.
This is particularly relevant for multi-year AI vendor relationships, which are increasingly common as organizations embed AI tools into core workflows. An AI vendor that has maintained an enterprise relationship for three or four years has accumulated a longitudinal view of the organization's strategic and operational evolution that very few parties outside the organization have.
The vendor concentration dimension
The question of what a single AI vendor knows about an organization becomes more significant when that vendor also serves the organization's competitors, its clients, or its industry peers.
This does not mean the vendor is sharing information between customers; contractual commitments and practical self-interest both constrain that. But it does mean the vendor has a vantage point on industry-wide patterns that individual organizations lack. Aggregate insights about what questions enterprises in a specific industry are asking of AI systems, what capabilities they are developing, and where they are investing are a form of competitive intelligence that accrues to the vendor in ways that have no clean analog in traditional software relationships.
For organizations in sectors where competitive intelligence matters (financial services, pharmaceuticals, technology), the accumulation of strategic signal at a shared AI vendor is worth thinking about explicitly.
What the CFO should factor into vendor relationship management
The financial relationship with an AI vendor needs to account for switching costs that go beyond the cost of migrating to a new platform. The accumulated organizational context (the conversation history, the fine-tuned models, the usage patterns and metadata that have built up over years) creates a real switching cost that is not always visible at contract negotiation.
Organizations that have deeply embedded a single AI vendor into core workflows may find that switching is more expensive than they anticipated, not because the technology cannot be replicated but because the years of accumulated context cannot easily be transferred.
This is relevant to contract renewal negotiations, where vendors understand the switching cost dynamic better than most customers. It is also relevant to how the organization structures its AI vendor portfolio: whether to consolidate around a single vendor for maximum integration, or to distribute across vendors in ways that limit the strategic depth of any single relationship.
What to do about it
This is not an argument for avoiding AI vendors or maintaining zero-depth relationships. The value of AI tools requires meaningful integration, and meaningful integration creates the usage patterns described above. The practical response is to understand what the relationship accumulates and manage it deliberately.
Conduct a periodic vendor relationship review that includes, alongside performance and cost, an assessment of what the vendor relationship has revealed about the organization through usage. This is not paranoia; it is the same kind of vendor relationship management organizations apply to any strategic supplier relationship.
Review data minimization options. Many AI vendor agreements include options to limit usage data retention, opt out of certain analytics, or configure how interaction metadata is handled. These options are not always publicized, but they are often available in enterprise agreements. Understand them before defaulting to whatever the vendor's standard configuration produces.
Consider the vendor concentration question explicitly in AI strategy. The organization that routes all AI usage through a single vendor is building a deeper relationship than the one that distributes across vendors. Both approaches have merits. The decision should be deliberate rather than a byproduct of procurement timing.
Build contract terms around usage data explicitly. What the vendor can do with aggregate usage data (not just input content) should be addressed in the enterprise agreement, not assumed from the default terms.
What to take from this
- Enterprise AI usage creates an aggregate picture of the organization's focus, workflows, and capabilities over time. Understand what that picture contains.
- Multi-year AI vendor relationships accumulate strategic signal about the organization's trajectory. The longer the relationship, the more the vendor knows.
- Switching costs for deeply embedded AI vendors include the loss of accumulated context, not just migration effort. Factor this into vendor relationship management.
- Review data minimization options in enterprise agreements. They are often available and not actively surfaced.
- Address how the vendor may use aggregate usage data, as distinct from input content, in the enterprise agreement terms.
The organizations that handle this thoughtfully are not the ones who avoid AI vendor relationships. They are the ones who understand what those relationships accumulate and manage them with the same care they apply to any strategic supplier holding significant organizational knowledge.
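The point that aggregate usage patterns carry signal without anyone reading individual queries can be illustrated with a small sketch. Everything here is invented for illustration: the `usage_log` records, the topic tags, and the `rising_topics` helper are hypothetical and do not describe any vendor's actual analytics. The log holds only a month and a topic tag per interaction, with no prompt content.

```python
from collections import Counter

# Hypothetical usage metadata: (month, topic tag). No prompt content is stored.
usage_log = [
    ("2026-01", "contracts"), ("2026-01", "contracts"), ("2026-01", "pricing"),
    ("2026-02", "contracts"), ("2026-02", "pricing"), ("2026-02", "jurisdiction-x-compliance"),
    ("2026-03", "jurisdiction-x-compliance"), ("2026-03", "jurisdiction-x-compliance"),
    ("2026-03", "jurisdiction-x-compliance"), ("2026-03", "contracts"),
]


def topic_counts_by_month(log):
    """Count interactions per topic tag for each month."""
    counts = {}
    for month, topic in log:
        counts.setdefault(month, Counter())[topic] += 1
    return counts


def rising_topics(log, first_month, last_month, min_increase=2):
    """Return topic tags whose monthly count grew by at least min_increase."""
    counts = topic_counts_by_month(log)
    first = counts.get(first_month, Counter())
    last = counts.get(last_month, Counter())
    # A Counter returns 0 for topics that did not appear in the first month.
    return sorted(topic for topic in last if last[topic] - first[topic] >= min_increase)


signals = rising_topics(usage_log, "2026-01", "2026-03")
```

With this made-up log, `signals` contains only the compliance tag, which is the kind of spike described above. The same simple counting is also a reasonable starting point for an organization's own periodic review of what its usage reveals.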
- 19 Mar, 2026
How AI Vendors Use Your Data: Contract Versus Reality
I have read a lot of AI vendor contracts in the past few years. Not because contract review is interesting in itself, but because the gap between what vendors say in sales conversations and what their agreements actually commit to has consequences. Organizations that do not close that gap before signing end up discovering what the contract actually says at the worst possible time.
The general shape of an AI vendor data agreement is worth understanding at the executive level, not because the CFO or CIO needs to redline individual clauses, but because the strategic choices about which vendors to use and under what conditions flow directly from what those agreements permit and exclude.
Here is what I see consistently.
The default terms favor the vendor
This should not be surprising. The default data processing terms in any commercial agreement are written to minimize the vendor's liability and maximize their operational flexibility. AI vendor agreements are no different, and in some respects they are more aggressively drafted than traditional software agreements because the stakes around data use are higher and the regulatory landscape is still evolving.
The standard structure of a consumer or early-stage enterprise AI agreement typically includes:
A broad grant to the vendor to use interaction data for service improvement, model training, and product development purposes, subject to anonymization or aggregation. In practice, what "anonymization" means and how consistently it is applied are rarely specified.
Retention periods that are defined by the vendor's operational needs rather than the customer's preferences, often without a customer-initiated deletion right.
Liability limitations that cap the vendor's exposure in the event of a data incident at amounts that bear no relationship to the potential harm to the customer, typically limited to fees paid rather than the value of the data or the cost of a breach.
Unilateral modification rights that allow the vendor to change the data processing terms with notice, sometimes as short as 30 days, without requiring the customer's affirmative consent.
None of these are unusual in commercial software agreements. But when the agreement governs how your organization's strategic data, client information, and proprietary content is handled, they warrant closer attention than a standard SaaS contract.
What changes in a properly negotiated enterprise agreement
The distinction between a default agreement and a properly negotiated enterprise agreement is significant. When procurement and legal have done their job, the enterprise agreement should include at minimum:
Exclusion from training data. A clear, contractually binding commitment that the customer's interaction data will not be used to train or fine-tune the vendor's models. This is the single most important data term and the one that organizations should refuse to proceed without.
Data processing agreement compliant with applicable regulations. For any processing of personal data of EU residents, a GDPR-compliant data processing agreement is a legal requirement. Increasingly, other jurisdictions impose similar requirements. This agreement specifies the purposes for which data is processed, the retention periods, the data subject rights the vendor will support, and the security measures in place.
Defined retention and deletion terms. The agreement should specify how long the vendor retains interaction data, under what circumstances, and what deletion looks like, with confirmation that deletion is complete and irreversible.
Sub-processor disclosure and control. AI platforms often rely on cloud infrastructure, third-party safety tooling, and other sub-processors. The enterprise agreement should disclose who these are and give the customer the ability to object to new sub-processors.
Breach notification terms. The timeframe within which the vendor will notify the customer of a security incident affecting customer data. Thirty days is common in default agreements; 72 hours is what GDPR and a number of similar regimes require you to provide to your own regulators. Make sure the vendor's notification obligation to you is faster than your notification obligation to regulators.
The clauses that cause problems later
In practice, the clauses that create the most problems are not the ones organizations focus on during negotiation.
Aggregated and anonymized data carve-outs. Most agreements carve out "aggregated and anonymized data" from the restrictions on training use, with the rationale that anonymized data cannot be traced back to the customer. The problem is that what counts as "anonymized" is not usually defined with precision, and for certain types of content (queries about niche industries, specialized technical topics, or specific organizational patterns) re-identification is more feasible than the carve-out implies.
Operational necessity language. Agreements often include broad permissions for the vendor to process customer data "as necessary to provide and improve the service." The scope of "improve the service" is frequently contested. Make sure this language is defined, not left open.
Right to audit provisions. The ability to verify that the vendor is actually complying with the data processing commitments they have made. Many agreements include an audit right that is functionally unusable: limited to once per year, requiring 90 days' notice, subject to the vendor's approval of the auditor. An audit right with those conditions provides limited practical assurance.
Termination data handling. What happens to your data when the contract ends? How long does the vendor retain it after termination? What format is it returned in? Is deletion from backup systems addressed? Organizations that have ended vendor relationships often discover that "deletion" in practice means deletion from active systems, with indefinite retention in backup infrastructure.
The sales conversation versus the signed contract
The gap I see most often is between what the sales team communicates during the evaluation ("we never use your data for training," "your data is completely private," "you retain full ownership of everything") and what the signed agreement actually commits to.
This is not always deliberate misrepresentation. Sales teams are not contract lawyers, and they often communicate what they believe to be true without knowing the precise legal scope of the commitments they are describing. The problem is that verbal assurances do not create contractual obligations. What matters is what the signed agreement says.
The practical implication: have the conversation about data handling before the procurement decision, but validate every assurance by finding the corresponding contractual language. If the vendor says they do not train on customer data, ask them to point to the specific clause that says so. If the clause does not exist, or if it is qualified in ways that limit its practical scope, that is important information.
What the CFO should be looking at
The CFO's lens on AI vendor data terms is different from the CIO's. Beyond the data handling questions, the financial and liability exposure matters.
Liability caps that are set at fees paid rather than harm caused mean that in the event of a serious data incident, the vendor's contractual exposure is often a fraction of the cost the organization incurs in regulatory fines, breach notification costs, customer notification, and reputational damage. This does not mean the vendor relationship is unworkable, but it does mean the organization is bearing most of the downside risk and should price that accordingly.
Insurance coverage. Some AI vendor incidents may fall into gaps between the organization's existing cyber insurance policy and the vendor's coverage. This is worth reviewing explicitly before the program goes live.
Renewal and price terms. AI vendor agreements increasingly include significant pricing flexibility: unilateral price changes, usage-based components that scale in ways that are hard to predict, and renewal terms that are less favorable than the initial agreement. Understanding the financial exposure over a three-to-five year horizon matters for the investment case.
What to take from this
- Default AI vendor data terms are written for the vendor's benefit. Do not assume they protect customer interests without reviewing them.
- The training exclusion is non-negotiable for any enterprise deployment handling sensitive data. Get it as a contractual commitment, not a sales assurance.
- Aggregated and anonymized data carve-outs are often broader than they appear. Define what anonymization means in the specific context of your data.
- Audit rights that are functionally unusable provide no real assurance. Push for meaningful audit provisions.
- Verify every data handling assurance the sales team makes by finding the contractual language that supports it. If it is not in the contract, it is not a commitment.
The organizations that manage AI vendor relationships well are not the ones with the longest or most restrictive agreements. They are the ones that understood what they were agreeing to before they signed, addressed the material gaps, and built a vendor relationship on actual commitments rather than assumed ones.
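Several of the checks in this piece are simple comparisons once the terms have been pulled out of the contract, and a short sketch may help a review team keep them consistent across vendors. The `VendorTerms` fields, the `review` function, and every figure below are hypothetical and chosen only for illustration; the thresholds mirror the numbers mentioned above (a 72-hour regulatory deadline, 90 days' audit notice) and should be adjusted to the regime and contract actually in play.

```python
from dataclasses import dataclass


@dataclass
class VendorTerms:
    """Hypothetical summary of terms found in a signed agreement."""
    training_exclusion_clause: bool   # written commitment not to train on customer data
    vendor_breach_notice_hours: int   # vendor's deadline to notify the customer
    regulator_notice_hours: int       # customer's own deadline to notify regulators
    audit_notice_days: int            # advance notice the vendor requires for an audit
    liability_cap: float              # vendor's maximum contractual exposure
    estimated_incident_cost: float    # customer's own estimate for a serious incident


def review(terms: VendorTerms) -> list[str]:
    """Return the gaps that warrant attention before signing or renewing."""
    findings = []
    if not terms.training_exclusion_clause:
        findings.append("No written training exclusion")
    if terms.vendor_breach_notice_hours >= terms.regulator_notice_hours:
        findings.append("Vendor breach notice is not faster than the regulatory deadline")
    if terms.audit_notice_days >= 90:
        findings.append("Audit right requires 90 or more days' notice")
    if terms.liability_cap < terms.estimated_incident_cost:
        findings.append("Liability cap is below the estimated incident cost")
    return findings


# Illustrative default terms: every number here is made up.
default_terms = VendorTerms(
    training_exclusion_clause=False,
    vendor_breach_notice_hours=30 * 24,   # thirty days
    regulator_notice_hours=72,
    audit_notice_days=90,
    liability_cap=120_000.0,              # capped at fees paid
    estimated_incident_cost=2_000_000.0,
)
findings = review(default_terms)
```

A check like this does not replace legal review. Its value is that each input has to be read off a specific clause, which forces the sales assurance to be matched against the signed text.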