
What AI Adoption Does to Your Existing Technology Contracts

Deploying AI into an enterprise technology stack does not happen in isolation. It happens inside an existing web of contracts: software licenses, SaaS agreements, data processing terms, and vendor relationships that were written before AI capabilities were relevant and that were not designed to accommodate what an AI program requires.

The collision between AI deployment and existing contracts produces a category of problem that most organizations encounter somewhere in the middle of delivery, after commitments have been made and timelines are set. The CIO and general counsel who review the contract landscape before deployment starts are in a substantially better position than those who discover the issues under delivery pressure.

There are five areas where existing contracts tend to create friction for AI programs.

Software licensing terms and data use

Many enterprise software licenses include restrictions on how data within the system can be used beyond its primary purpose. The typical language covers authorized users, permitted use cases, and sometimes explicit restrictions on automated processing or data extraction.

When an AI system is connected to a licensed software platform to extract, process, or train on the data within it, those restrictions may be relevant. A CRM contract that limits data use to direct customer relationship management may not automatically permit the creation of an AI training dataset from CRM records. A document management system license that covers authorized human users may not straightforwardly cover an AI agent that queries the system as part of an automated workflow.

Existing enterprise software licenses are unlikely to address AI use cases explicitly, because most were written before those use cases were anticipated. The risk is that implicit prohibitions, generic restrictions on automated processing, or data use limitations apply in ways that neither party anticipated.
Before connecting AI systems to existing licensed platforms, general counsel should review the relevant license terms for restrictions on data use and automated access, and engage with vendors where the position is unclear.

SaaS data portability and processing terms

SaaS agreements typically govern what data is held in the platform, how it can be exported, and what the vendor can do with it. The standard SaaS agreement, particularly for products that predate the current AI era, was written with human-facing use in mind. When an AI program requires bulk data extraction from a SaaS platform (to populate a training dataset, to build a knowledge index, or to migrate data to an AI-ready format), the agreement may not straightforwardly permit this. Data export limitations, API rate limits, and format restrictions may be written into the contract in ways that constrain what the AI program needs.

The practical issue: discovering mid-implementation that a bulk data extraction is contractually restricted by an existing SaaS agreement is a problem that takes time to resolve. Vendor negotiations to expand export rights, technical workarounds, and alternative data sourcing each add delay and cost that were not in the original plan.

Review the SaaS agreements covering any data that the AI program will need to process, extract, or migrate before the implementation schedule is set.

Existing AI vendor agreements and scope creep

Organizations that have been using AI tools for some time often have existing vendor agreements that define the scope of permitted use cases. As the AI program expands, new use cases may not be covered under the existing agreement.

This matters in two specific ways. First, using an AI vendor for use cases outside the defined scope of the agreement, even with the same tool and the same vendor, may create data handling situations the original agreement did not contemplate. Second, enterprise AI agreements often include pricing that is tied to defined use parameters.
Expanding use significantly beyond those parameters may trigger renegotiation on terms less favorable than the original agreement.

Audit the scope of existing AI vendor agreements against the planned AI program before expansion. Know what is and is not covered before the program is designed around assumptions about what the vendor relationship permits.

Data processing agreements for third-party integrations

AI programs frequently involve connecting internal data to third-party AI systems through integrations: an AI tool connected to the CRM, an AI analytics layer over the data warehouse, an AI agent with access to internal APIs. Each of these integrations creates a data flow that may require a data processing agreement. Where the integrated party processes personal data on behalf of the organization, a data processing agreement is a regulatory requirement under data protection law in most jurisdictions.

For integrations that existed before the AI layer was added, the original data processing agreement may not cover the additional processing the AI component involves.

Before adding AI components to existing integrations, review whether existing data processing agreements need to be updated to reflect the expanded processing scope. The integration may be technically unchanged from a data flow perspective while creating a materially different processing activity for regulatory purposes.

Third-party data in AI training and indexes

Organizations often use third-party data sources (market data, industry benchmarks, licensed research, external databases) in their operations. When an AI program wants to use this data as training material or as content in a retrieval index, the license for that third-party data may not permit it.

Third-party data licenses typically specify permitted use cases: internal analysis, reporting, specific product use.
Training an AI model on licensed data, or including it in an AI knowledge base that generates outputs for a broad user population, may constitute a use case that the license does not cover. The risk is real, the issue is common, and finding all the affected data sources takes time.

Conduct a data sourcing review for any data that will go into AI training sets or retrieval indexes. Identify all third-party licensed content, review the license terms for AI use restrictions, and either obtain the necessary permissions or exclude the content.

Practical approach for the CIO and general counsel

The scale of this review problem varies significantly by organization. For most enterprises, the contract review is manageable if it is structured and approached systematically before the AI program enters delivery.

Start with the data sources the AI program will use. Map every system, database, and data feed the AI system will connect to. For each, identify the governing contract and whether it has been reviewed for AI-relevant terms.

Prioritize by data volume and sensitivity. The highest-volume data sources feeding the AI program, and the data categories with the most regulatory and contractual complexity, deserve the most thorough review.

Engage vendors early where the position is unclear. Vendors generally prefer to resolve license ambiguity before it becomes a dispute. A proactive conversation about AI use cases typically produces better outcomes than a post-hoc assertion that something was permitted.

What to take from this

Software licenses and SaaS agreements often contain restrictions on automated processing and data use that predate AI and may apply to AI use cases in ways neither party anticipated. Review them before deployment.
Bulk data extraction requirements for AI programs may be restricted by existing SaaS agreements. Discover this before the implementation schedule depends on it.
Expanding AI use cases beyond the scope defined in existing AI vendor agreements can create data handling and pricing issues. Audit current agreements against planned use.
Data processing agreements may need to be updated to reflect AI components added to existing integrations. The regulatory obligation does not adjust automatically when the technical architecture changes.
Third-party licensed data in AI training sets or retrieval indexes may require explicit permission that the existing license does not provide. Conduct a data sourcing review before building the training set.
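To make the inventory-and-prioritize step of the practical approach concrete, here is a minimal Python sketch. It assumes the contract map is kept as a simple list of records. The `DataSource` fields, the 1-to-3 sensitivity scale, and the `review_queue` function are hypothetical illustrations, not a prescribed tool. Unreviewed sources are ordered by sensitivity first and volume second, so legal review time goes where the contractual exposure is likely greatest.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical inventory entry for one system, database, or feed the AI program uses."""
    name: str
    governing_contract: str
    reviewed_for_ai: bool   # has the contract been reviewed for AI-relevant terms?
    monthly_records: int    # rough volume feeding the AI program
    sensitivity: int        # assumed scale: 1 = low, 3 = high regulatory/contractual complexity


def review_queue(sources: list[DataSource]) -> list[DataSource]:
    """Unreviewed sources, most sensitive first, then highest volume first."""
    pending = [s for s in sources if not s.reviewed_for_ai]
    return sorted(pending, key=lambda s: (s.sensitivity, s.monthly_records), reverse=True)


# Illustrative inventory; names and figures are made up.
sources = [
    DataSource("CRM", "CRM subscription agreement", False, 200_000, 3),
    DataSource("Market data feed", "Third-party data license", False, 50_000, 2),
    DataSource("Data warehouse", "Internal platform", True, 5_000_000, 2),
    DataSource("Document management", "DMS license", False, 800_000, 3),
]

for s in review_queue(sources):
    print(f"{s.name}: {s.governing_contract}")
# Document management: DMS license
# CRM: CRM subscription agreement
# Market data feed: Third-party data license
```

Ranking sensitivity ahead of volume is a judgment call. An organization that weights volume more heavily would swap the order of the tuple in the sort key.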

How to Classify Your Data Before Your AI Program Does It for You

Data classification is one of those governance practices that most organizations have in some form and almost none have in a form that is adequate for AI. The gap matters because AI deployment without a working classification framework creates a specific category of problem: the system treats all accessible data as equivalent input, and the outputs reflect that indiscriminateness in ways that are difficult to predict and costly to remediate after the fact.

The CIO who gets this right before the AI program starts is in a very different position from the one who inherits a classification gap when the first incident surfaces. Here is what a practical classification approach looks like when AI deployment is the specific forcing function.

Why existing classification frameworks usually fall short

Most organizations have some form of data classification. The typical structure is a four-level hierarchy: public, internal, confidential, and restricted. Documents get tagged (or are supposed to get tagged) at one of these levels. Access controls are set accordingly.

This framework was designed for a world where humans navigate information deliberately. You look for a document, you find it, you read it. The sensitivity of what you see is a function of where you went to look.

AI tools do not navigate information that way. They can process everything they have access to simultaneously, surface connections between data sources that were never designed to be combined, and produce outputs that reflect the aggregate of what they have seen rather than any single document. The sensitivity classification of individual documents does not translate cleanly into the sensitivity of an AI system's outputs.

There are three specific failure modes I see in organizations that try to apply existing classification frameworks to AI deployment.

Permission-level accuracy.
Existing classification may reflect the intention of who should access what, but actual permissions often diverge from the classification framework over time. Documents move between folders. Projects end and access is not revoked. Distribution lists grow and are not pruned. When an AI system is given access to everything a user can access, it inherits this divergence between intended and actual permissions.

Output sensitivity. A document classified as "internal" might, in combination with five other documents also classified as "internal," produce an AI output that reveals information that would have been classified "confidential" if anyone had written it down directly. The classification framework addresses individual document sensitivity but not the sensitivity of AI-generated synthesis.

Dynamic content. AI systems that connect to live data sources, such as CRMs, financial systems, and email archives, encounter content that has never been classified at all, because classification was designed for documents rather than data records.

Building a classification framework for AI specifically

A classification framework that works for AI deployment needs to answer three questions that the standard framework typically does not.

What can this data type be used for in an AI context? Rather than a single sensitivity level, each data category needs a set of permitted AI use cases. Client financial data might be appropriate for an internal analytics AI but not for a tool that produces externally shared outputs. Personal data might be appropriate for a tool with data processing agreement coverage but not for one without it. The permitted use case dimension is specific to AI and does not exist in traditional classification frameworks.

What combinations create elevated sensitivity? Certain combinations of data categories produce outputs that are more sensitive than any individual category.
A practical classification framework for AI should identify the high-risk combinations and set explicit controls around AI systems that can access all the categories in such a combination.

What is the real-time classification status? For live data sources, the classification question is not just "what is this data type" but "what is the current state of this specific record, and does that affect what AI can do with it?" A client record that includes active litigation flags, for example, may need to be treated differently than a standard client record even if the data type is classified the same way.

The practical approach

Doing this well does not require a multi-year data governance program. It requires a focused exercise tied directly to the AI deployment timeline. Here is what that looks like.

Start with the AI system's data access scope. Before classifying anything, define what data sources the AI system will be connected to. The classification exercise is scoped to those sources. Everything else can wait.

Map the sensitive data categories within scope. For each data source the AI will access, identify what sensitive categories exist: personal data, commercially sensitive data, legally privileged material, client confidential data, regulated financial data. This is an inventory exercise, and it usually reveals data in places people did not expect it.

Define permitted use cases for each category. For each sensitive category, specify what the AI system is and is not permitted to do with it. This becomes the basis for technical controls: what data the system can retrieve, what it can include in outputs, and what it should exclude or flag.

Build the combination rules. Identify the high-risk combinations and set rules for how the AI system handles them. This is the hardest part and the one most often skipped. Spending a day on this with the CIO, the data protection officer, and the AI system owner is worth it.

Implement classification tags as technical controls.
The classification decisions need to be expressed as technical constraints that the AI system respects, not just as policy documentation. A policy that says "the AI should not include client financial data in externally visible outputs" is unenforceable unless the system is technically configured to prevent it.

The CIO's role in making this work

Data classification for AI is not a project the technical team can own independently. The decisions about which data categories can be used for which AI purposes require input from legal, compliance, and the business functions that own the data. The CIO's role is to convene those conversations and drive them to decisions before the AI system goes live, not after.

The alternative, deploying the AI system and addressing classification issues as they surface, is more expensive and more disruptive. When an AI system produces an output that reveals information it should not have had access to, the response involves technical remediation, incident investigation, potential regulatory notification, and damage to organizational credibility. All of these are harder than running the classification exercise before deployment.

A focused data classification exercise scoped to a specific AI deployment typically takes two to four weeks for a system with a well-defined data access scope. That is a reasonable investment given the alternative.

What to take from this

Existing data classification frameworks were designed for human navigation of information. They do not translate directly to AI access, which aggregates and synthesizes rather than navigates.
Classification for AI needs to address permitted use cases, high-risk combinations, and live data: three dimensions that standard frameworks typically do not cover.
Scope the classification exercise to the AI system's data access, not the organization's entire data estate. A focused exercise is achievable in weeks; an organization-wide program is not.
Classification decisions need to be expressed as technical constraints, not just policy documentation. A policy without technical enforcement is not a control.
The CIO needs to convene legal, compliance, and business data owners in the classification exercise. The decisions require input from all of them, and making them without that input produces gaps.

The organizations that get AI deployment right are not the ones with the most comprehensive data governance programs. They are the ones that did the focused, practical work of understanding their data before they connected it to a model, and made deliberate decisions about what that meant for acceptable use.
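One way to turn permitted-use and combination decisions into something a system can enforce is to express them as data and check every retrieval against them. The Python sketch below uses hypothetical category names (`client_financial`, `personal_data`, `public`) and use-case labels. It denies by default, blocks any category not permitted for the requested use, and also blocks requests that would bring a high-risk combination together. It illustrates the idea of policy as a technical control and does not represent any specific product's configuration.

```python
from dataclasses import dataclass, field

# Hypothetical policy: the AI use cases each data category is permitted for.
PERMITTED_USES: dict[str, set[str]] = {
    "client_financial": {"internal_analytics"},
    "personal_data": {"internal_analytics", "dpa_covered_assistant"},
    "public": {"internal_analytics", "dpa_covered_assistant", "external_output"},
}

# Hypothetical combination rules: categories that are more sensitive together than apart.
HIGH_RISK_COMBINATIONS: list[frozenset[str]] = [
    frozenset({"personal_data", "client_financial"}),
]


@dataclass
class Decision:
    allowed: bool
    reasons: list[str] = field(default_factory=list)


def check_request(categories: set[str], use_case: str) -> Decision:
    """Decide whether the AI system may use these categories together for this use case."""
    reasons = []
    for category in sorted(categories):
        # Categories missing from the policy get no permitted uses: deny by default.
        if use_case not in PERMITTED_USES.get(category, set()):
            reasons.append(f"{category} not permitted for {use_case}")
    for combo in HIGH_RISK_COMBINATIONS:
        if combo <= categories:  # every category in the risky combination is present
            reasons.append(f"high-risk combination: {sorted(combo)}")
    return Decision(allowed=not reasons, reasons=reasons)


print(check_request({"client_financial"}, "internal_analytics"))
# Decision(allowed=True, reasons=[])
print(check_request({"client_financial"}, "external_output"))
# Decision(allowed=False, reasons=['client_financial not permitted for external_output'])
print(check_request({"personal_data", "client_financial"}, "internal_analytics"))
# Decision(allowed=False, reasons=["high-risk combination: ['client_financial', 'personal_data']"])
```

Here the combination rule refuses the request outright. A real deployment might instead route it for human review or drop one category from the retrieved context. That choice belongs to the people who own the classification decisions, not to the code.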

What AI Actually Requires From Your Data Infrastructure to Scale

The AI program has been approved, the vendor is selected, the team is assembled, and then somebody runs the data assessment. What they find (inconsistent data models, missing labels, fragmented systems, unclear ownership) is the same thing found in most enterprises when they look carefully for the first time. The program does not fail at this point. It slows down, the timeline gets revised, the initial scope gets reduced, and the business expectations that were set in the approval process are not met on schedule.

This is the pattern I see most often when AI programs run into trouble, and it almost always traces back to the same root cause: the data infrastructure requirements for AI at scale were not understood when the program was designed. The CIO who understands these requirements upfront can either design the program around them or secure the investment to address them. Either path is better than discovering the gap during delivery.

The data requirements that tend to be underestimated

Data availability and accessibility. AI systems need data at query time or training time, and they need it in a form they can process. In most enterprises, relevant data lives across multiple systems (a CRM, an ERP, a data warehouse, a collection of flat files, some APIs) with different schemas, different access mechanisms, and different freshness characteristics. The work of making that data accessible to an AI system is infrastructure work, not AI work, and it is consistently underestimated.

The practical implication: before committing to an AI delivery timeline, map which data sources the system will need access to, what the access mechanism is for each, and whether a data integration layer needs to be built or updated. This often takes months and is not typically included in AI vendor timelines.

Data quality at the point of use. AI systems amplify data quality problems.
A system trained on, or retrieval-indexed against, inaccurate, incomplete, or inconsistent data will produce confident-sounding outputs that reflect those problems. The model has no way to know that the customer record is outdated or that the product data is inconsistently formatted across systems.

The organizations I see struggle most with AI quality are the ones that treated data quality as an already solved problem when it was not. Data quality issues that were manageable in human-reviewed processes become highly visible when AI processes the same data and produces outputs that expose the inconsistencies.

Labeling and structure for training use cases. For AI applications that require model training, not just retrieval-augmented generation, the training data needs to be labeled in a way that reflects what the model is supposed to learn. In most enterprises, the historical data that would be most useful for training is not labeled for the relevant task, was not structured with model training in mind, and requires significant preparation work before it is ready.

An AI use case that requires supervised training on historical data (a classification system, a predictive model, an automated decision support tool) implicitly requires a data labeling exercise. This is often not scoped, not budgeted, and not understood by the business stakeholders who approved the use case.

Data freshness and pipeline reliability. AI systems that operate on live or recent data need data pipelines that deliver data at the required latency and with acceptable reliability. In many enterprise data environments, the pipelines that move data from operational systems to analytical environments run on batch schedules that are inconsistent with the freshness requirements of an AI application that is supposed to support real-time decisions.
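A freshness requirement only becomes testable once it is written down for each source. As an illustration, assume each pipeline records when it last loaded data. The hypothetical `stale_sources` function below compares that timestamp with the maximum staleness the AI use case can tolerate. The source names, timestamps, and tolerances are made-up values chosen to show a nightly batch failing a 15-minute requirement.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_loaded: dict[str, datetime],
                  max_staleness: dict[str, timedelta],
                  now: datetime) -> list[str]:
    """Sources whose latest pipeline load is older than the AI use case tolerates."""
    stale = []
    for source, tolerance in max_staleness.items():
        loaded = last_loaded.get(source)
        # A source with no recorded load cannot be shown to be fresh, so flag it.
        if loaded is None or now - loaded > tolerance:
            stale.append(source)
    return stale


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
last_loaded = {
    "crm": now - timedelta(hours=20),      # nightly batch load
    "orders": now - timedelta(minutes=4),  # near-real-time feed
}
requirements = {
    "crm": timedelta(minutes=15),
    "orders": timedelta(minutes=15),
    "inventory": timedelta(hours=1),       # no pipeline yet
}
print(stale_sources(last_loaded, requirements, now))  # ['crm', 'inventory']
```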
Building or upgrading data pipelines to support AI freshness requirements is infrastructure investment that is separate from the AI system itself. It tends not to appear in AI project budgets.

The governance requirements that get missed

Data infrastructure for AI is not just technical. It has a governance layer that the CIO needs to own before the AI program runs into it.

Data ownership and authority. AI systems require someone to decide what data they can access, what they can do with it, and who can change those parameters. In most enterprises, data ownership is unclear: data exists in systems owned by IT but created and maintained by business functions, with no single party who has clear authority to approve AI system access. The AI program surfaces this ambiguity in a way that other programs did not.

Data lineage for AI outputs. When an AI system produces an output, the ability to trace that output back to the source data matters both for debugging and for regulatory purposes. This requires data lineage tooling and practices that most organizations have not prioritized, because the use cases that required them previously were narrower.

Access controls at the data level. The access control requirements for AI systems are different from those for human users. An AI system that processes data on behalf of many users needs access controls that reflect what each user should be able to see, applied dynamically at the time the system generates outputs. Most data infrastructure was not designed for this pattern.

What the CIO needs to establish before the program starts

The work that makes AI programs succeed from a data perspective is not done by the AI team. It is done by the data engineering and infrastructure function, working from a clear set of requirements before the AI program timeline is set. Specifically:

Run a data infrastructure assessment scoped to the AI program's requirements.
This assessment should identify what data the AI system needs, what state that data is in, what gaps exist, and what work is required to close them. The assessment output should feed directly into the program plan.

Define data ownership for AI access before the program enters delivery. The conversations about which data the AI system can access are harder to have mid-delivery than pre-delivery. Get the governance decisions made before the program is scheduled around them.

Include data pipeline and infrastructure work in the program budget and timeline. This work is frequently treated as a prerequisite that will be addressed separately, which means it is not resourced and becomes a blocker. It needs to be inside the program.

Set data quality thresholds explicitly. What level of completeness, consistency, and accuracy is required for the AI system to produce reliable outputs? These thresholds should be defined and measured before the system goes live, not after the first quality issue surfaces in production.

What to take from this

Data infrastructure gaps are the most common reason AI programs miss their original timelines. Run a data infrastructure assessment as part of program planning, not as a separate track.
AI systems amplify data quality problems. Assess the quality of the data the system will use before committing to performance targets.
Training use cases with labeled data requirements are consistently underestimated. If the use case requires model training, scope the labeling work explicitly.
Data freshness requirements for live AI applications often exceed what existing batch pipelines can deliver. Build or upgrade the data pipelines as part of the AI program.
Data ownership and governance for AI access need to be resolved before delivery starts. These decisions are harder to make under delivery pressure than during planning.

The AI programs I have seen deliver on their original timelines shared a common characteristic: the CIO ran the data assessment early, understood the infrastructure gaps, and either adjusted the program plan or secured the investment to address them. The ones that struggled did not.
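Setting a data quality threshold explicitly can be as simple as agreeing on a number per field and measuring against it before go-live. The sketch below covers completeness only. Consistency and accuracy checks would need reference data or cross-system comparison. The field names, thresholds, and sample records are hypothetical.

```python
# Hypothetical completeness thresholds agreed before go-live: minimum share of
# records in which each field is present and non-empty.
THRESHOLDS = {"customer_id": 1.0, "email": 0.95, "segment": 0.90}


def completeness(records: list[dict], field_name: str) -> float:
    """Share of records where the field is present and not empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
    return filled / len(records)


def failing_fields(records: list[dict], thresholds: dict[str, float]) -> dict[str, float]:
    """Fields whose measured completeness is below the agreed threshold."""
    scores = {name: completeness(records, name) for name in thresholds}
    return {name: score for name, score in scores.items() if score < thresholds[name]}


records = [
    {"customer_id": "C1", "email": "a@example.com", "segment": "retail"},
    {"customer_id": "C2", "email": "", "segment": "retail"},
    {"customer_id": "C3", "email": "c@example.com", "segment": None},
    {"customer_id": "C4", "email": "d@example.com", "segment": "smb"},
]
print(failing_fields(records, THRESHOLDS))  # {'email': 0.75, 'segment': 0.75}
```

The go-live gate is then a policy decision. For example, a team could refuse to launch while `failing_fields` returns anything, or launch with the failing fields excluded from the AI system's inputs.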

What a Data Breach Looks Like When AI Is in the Middle of It

Most enterprise data breach response plans were written for a specific type of incident: unauthorized external access to a database, a misconfigured cloud storage bucket, a stolen credential, a ransomware attack. The response playbook is well understood. Contain the breach, assess the scope, notify regulators, notify affected individuals, remediate the vulnerability.

When an AI system is in the middle of a breach, whether as a vector, as an amplifier of exposure, or as the primary source of the incident, the playbook breaks down in several places. The scope assessment is harder. The cause is less obvious. The regulatory notification may require analysis the organization has not done. And the communications with affected parties need to account for AI involvement in ways that the standard template does not anticipate.

Organizations that have AI systems in production and have not updated their incident response plans are carrying risk they have not quantified.

The ways AI changes the breach scenario

AI as a vector. A prompt injection attack, where malicious content in the AI system's input causes the system to execute unintended actions, is a category of attack that did not exist before AI systems were connected to organizational data. The technical mechanics are different from a SQL injection or a credential attack, but the organizational response involves the same triage: what did the attacker access, what did they exfiltrate, what actions did the AI system take on their behalf?

Prompt injection is not theoretical. It has been demonstrated against production AI systems across multiple vendors. Organizations that have not evaluated their AI systems against this class of attack have a gap in their security assessment.

AI as an amplifier. An attacker who compromises credentials to an account with AI system access may be able to extract substantially more information than they could from the underlying data systems alone.
The AI system's ability to query, synthesize, and summarize across data sources means that a single compromised session can produce outputs equivalent to weeks of manual data extraction.

The scope of a breach involving AI access is likely to be larger than the scope of a breach involving equivalent access to the underlying data without AI. This matters for the scope assessment, for regulatory notification thresholds, and for the volume of affected records.

AI as the source. Misconfigurations in AI systems (incorrectly permissioned data access, insecure output handling, improperly sandboxed tool use) can themselves cause data exposure without any external attacker. An AI system that surfaces information it should not have had access to in response to a user query, or that exposes data through an incorrectly configured output channel, has caused a data exposure incident even in the absence of a security breach.

These incidents are less dramatic than external attacks but potentially more common. They are also harder to detect, because the behavior looks like normal AI system use rather than an anomalous external access pattern.

Where the standard response plan fails

Scope assessment. The standard scope assessment for a data breach identifies which records were accessed. When an AI system was involved, the relevant question is not which records were accessed but which outputs were generated: what did the AI synthesize from the records it could reach, and what information was contained in those outputs?

This is a harder problem. AI outputs are not automatically logged in the way that database queries are. The organization may not have complete records of what the AI system produced during the breach window. Reconstructing the scope requires different methods than a traditional database access log analysis.

Cause determination. Traditional breaches have identifiable technical causes: a vulnerability, a misconfigured permission, a phishing attack.
AI incidents often have more diffuse causes: a combination of permissive access, insufficient output monitoring, and system behavior that was technically within parameters but produced an unintended result. Root cause analysis for AI incidents requires an understanding of the AI system's architecture and behavior that most incident response teams do not have.

Regulatory notification. Data breach notification requirements typically specify notification timelines and the content of notifications. When an AI system is involved, determining what categories of personal data were exposed requires understanding what the AI could access and what it may have surfaced. That analysis takes longer and requires more specialized input than a direct database access log review.

Communication with affected parties. Breach notification communications are standardized around the concept of "your data was accessed by an unauthorized party." When an AI system was the mechanism, the communication needs to explain something more complex: what the AI system could access, what it may have produced, and why that creates risk for the affected individual. Most breach communication templates are not equipped for this.

What the CFO and CIO need to prepare now

Update the incident response plan. The plan needs to include AI-specific scenarios: prompt injection, AI-amplified credential breach, misconfiguration-driven data exposure. Each scenario should have a defined response team (which needs to include AI system expertise), assessment methodology, and escalation path.

Establish AI audit logging requirements. If the organization does not have comprehensive logging of AI system queries and outputs, it cannot conduct a complete scope assessment for an AI-involved incident. The logging requirement needs to be part of AI system deployment standards, not something added after an incident.

Define who owns AI incidents.
Traditional breach response has clear ownership — typically the CISO and legal team with CFO involvement for material incidents. AI incidents may involve technical characteristics the CISO team does not have expertise in. Define who the AI-specific escalation path involves and ensure that person or team is part of incident response planning. Test the plan. Incident response plans for traditional breaches are tested through tabletop exercises. AI-specific scenarios should be part of the tabletop exercise inventory. The scenario of an AI system producing outputs it should not have, or being used as a vector by an attacker, is sufficiently different from traditional scenarios to warrant explicit testing. Understand regulatory notification requirements. Check whether the data protection officer's understanding of notification thresholds and timelines accounts for AI-involved incidents. In particular: the scope determination for an AI breach may take longer than for a traditional breach, and the notification timeline starts from discovery of the breach, not from completion of scope determination. What to take from thisUpdate the incident response plan to include AI-specific scenarios before an incident occurs. The scenarios are different enough from traditional breaches to require explicit planning. Require comprehensive logging of AI system queries and outputs as a deployment standard. Without it, scope assessment for an AI-involved incident is incomplete. Define AI-specific escalation paths within the incident response structure. The expertise required to assess an AI incident is different from traditional breach response expertise. Test AI breach scenarios in tabletop exercises. The behavior of an AI system during and after an attack is counterintuitive enough to warrant practice. The scope of an AI-amplified breach is likely larger than an equivalent breach without AI involvement. 
Build this into the material incident threshold assessment.

The organizations that handle AI-involved incidents well are not the ones that were lucky enough to avoid them. They are the ones that updated their preparedness before the first incident, so that when it happened (and it will happen) the response was organized rather than improvised.
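To make the logging and scoping points concrete, here is a minimal Python sketch of what an AI audit log record and a first-pass scope assessment might look like. The AIQueryLog schema, its field names, and both functions are hypothetical illustrations, not a product API; they assume the deployment already records, for every query, the issuing account, the sources retrieved, and the data categories present in the output. The 72-hour default mirrors the GDPR window for notifying a supervisory authority and is only a placeholder for whichever regime applies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AIQueryLog:
    """One logged AI interaction (hypothetical schema for illustration)."""
    timestamp: datetime
    account: str                     # user or service account that issued the query
    prompt: str
    source_ids: tuple[str, ...]      # documents/records the system retrieved
    data_categories: frozenset[str]  # data categories present in the output


def scope_incident(logs, compromised_accounts, window_start, window_end):
    """Summarize what the AI system surfaced to compromised accounts
    during the suspected compromise window (bounds inclusive)."""
    in_scope = [
        entry for entry in logs
        if entry.account in compromised_accounts
        and window_start <= entry.timestamp <= window_end
    ]
    return {
        "queries": len(in_scope),
        "sources": sorted({s for e in in_scope for s in e.source_ids}),
        "categories": sorted({c for e in in_scope for c in e.data_categories}),
    }


def notification_deadline(discovered_at, window_hours=72):
    """The clock runs from discovery, not from completion of scoping.
    72 hours is a placeholder; use the window of the applicable regime."""
    return discovered_at + timedelta(hours=window_hours)


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 9, 0)
    logs = [
        AIQueryLog(t0, "svc-bot", "summarize Q2 payroll",
                   ("hr/payroll.xlsx",), frozenset({"compensation"})),
        AIQueryLog(t0 + timedelta(hours=2), "alice", "draft memo",
                   ("wiki/memo",), frozenset()),
        AIQueryLog(t0 + timedelta(hours=3), "svc-bot", "list client contacts",
                   ("crm/contacts",), frozenset({"client_pii"})),
    ]
    print(scope_incident(logs, {"svc-bot"}, t0, t0 + timedelta(hours=4)))
    print(notification_deadline(datetime(2024, 5, 2, 14, 0)))
```

Scoping can only be as complete as the log behind it, and the deadline function deliberately takes the discovery time as its input rather than the time the scope assessment finishes.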

The Internal Data Access Problem That AI Makes Suddenly Visible

Access controls in most organizations work on a document-by-document basis. You have permission to read a file or you do not. The logic has been sufficient for most purposes because humans navigate information deliberately: they go looking for specific things and find what they have access to. AI tools have broken that model without anyone changing any permissions.

When an AI system with broad read access is asked a question, it does not navigate to a specific document. It queries across everything it can reach, synthesizes what is relevant, and produces an answer. The access controls determine what the system can read. They do not determine what combinations it can surface, what inferences it can draw, or what aggregated view of the organization's data it can present to the user. The result is a category of access control failure that most organizations have not addressed, because the access controls themselves are technically correct, and still inadequate.

The gap between technical access and intended visibility

The cleanest way to describe the problem: in most organizations, there is a meaningful difference between what an employee technically has access to and what they were intended to be able to see.

This gap exists because access management is messy in practice. Permissions accumulate over time as people join projects, take on new roles, and inherit access from reorganizations. Revocation processes lag behind changes. Distribution lists include people who should have rotated off. Shared drives created for one purpose get used for another. The intended access model and the actual permissions diverge, and in normal day-to-day work the gap is largely invisible because people go looking for things they need rather than systematically browsing everything they can reach.

AI tools systematically browse everything they can reach. That is their function.
An employee asking an AI assistant "what do we know about the performance review process for the engineering team" may receive an answer drawn from documents they technically have access to but were never intended to be the audience for: HR process documentation, individual feedback templates, and comparative data that lives in a folder from an organizational design project two years ago that nobody cleaned up. The employee has not circumvented any security control. But they have seen something the access model was not designed to permit.

The categories where this matters most

HR and compensation data. Salary information, performance ratings, disciplinary records, and individual feedback exist throughout organizations in documents whose permissions were set for a specific purpose and have often drifted since. AI systems connected to broad document repositories will find this material and surface it in response to queries that touch on it.

Legal and privileged material. Legal advice, litigation strategy, settlement terms, and attorney-client communications often exist in places that technically authorized users can access for one purpose but should not be able to aggregate for another. The privilege protection may be legally intact (the employee can read the document), but the ability to synthesize across years of legal communications is a different kind of access.

Financial data beyond role scope. Budget holders can typically access their own budget data. AI systems may surface aggregate financial data by drawing on individual documents, each of which was appropriately accessible, producing a consolidated view that nobody intended to give the employee.

Client and partner confidential information. Client files shared within engagement teams are accessible to all team members for legitimate work purposes.
An AI system that can search across all engagement files simultaneously may surface patterns about client relationships, deal economics, or strategic situations that no single team member was supposed to see in aggregate.

Why the standard response does not work

The first response most organizations reach for is tightening access controls: if AI is exposing the problem, fix the permissions. This is not wrong, but it is not sufficient. The problem has two parts that require different responses.

The first part is genuine permission drift that should be corrected regardless of AI. Employees who have retained access to systems and documents they no longer need should have that access revoked. This is an overdue access hygiene exercise, and AI deployment is a reasonable forcing function for doing it.

The second part is structurally different. Even with clean, intentional permissions, an employee with access to many documents across an organization will technically have access to combinations of data that, when synthesized by an AI, reveal more than the permission model was designed to permit. You cannot solve this purely by tightening access, because the individual access grants may all be correct. The solution to the second part requires building constraints into the AI system itself: what categories of data it can include in synthesis across user queries, what aggregation rules apply, and what escalation or approval processes apply to queries that touch the highest-sensitivity categories.

Building the right architecture

Three things need to happen in parallel, not sequentially.

Access control remediation. Run an access review scoped to the data sources the AI system will connect to. Specifically look for permissions that predate current roles, broad read access granted for historical projects that is no longer needed, and distribution list membership that has not been reviewed in over a year.
This will not solve the problem completely, but it reduces the surface area.

AI-specific access boundaries. Define, at the AI system configuration level, what categories of data the system can use for synthesis in response to user queries. HR data, compensation data, legal documents, and individual performance information may be categories where even technically authorized access should not be available to the AI synthesis function. These boundaries need to be implemented as technical constraints in the AI system, not just as policy guidance.

Query monitoring and anomaly detection. The AI system's query logs make the access control problem visible for the first time. An employee who systematically queries for compensation data across a broad population, or who extracts patterns from legal files, shows up in the query logs in ways they would not show up in document access logs. This monitoring capability is new and should be used.

What the CIO needs to drive

The access control gap in AI deployments is fundamentally a CIO problem, not an AI team problem. The AI team can build a capable system. The CIO needs to ensure that the system's access to organizational data is deliberately configured rather than broadly permissive by default.

Broadly permissive by default is the path of least resistance. It makes the AI system more capable and easier to demonstrate. It also creates the access control failures described above, and the first incident involving inadvertent disclosure of HR or financial data through an AI tool is going to be a painful conversation.

The access architecture needs to be designed before the AI system goes live. The conversation about what categories of data the system should not be able to synthesize, even if individual documents in those categories are technically accessible, needs to happen with legal, HR leadership, and the CFO, not just the AI team.

What to take from this

Technical access controls determine what an AI system can read.
They do not determine what it will synthesize or surface. The gap between these is where the access control problem lives.

Run an access control remediation exercise scoped to the AI system's data access before deployment. Permission drift needs cleaning up whether or not an AI deployment is happening; AI just makes the urgency visible.

Build AI-specific access boundaries into the system configuration. Some data categories should not be available for AI synthesis even if individual documents within them are technically accessible.

Use AI query logs as an access monitoring tool. The visibility into what the system is being asked to surface is new and valuable.

The CIO needs to own the access architecture decision, not delegate it to the AI team. The decisions about what data categories the AI should not aggregate require organizational input that the AI team is not positioned to provide alone.
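A minimal Python sketch of the AI-specific boundary and the query monitoring described above, assuming every document carries a classification label and a reader list. The category names in EXCLUDED_FROM_SYNTHESIS, the Document type, and the flagging threshold are hypothetical placeholders for decisions that belong with legal, HR, and the CFO. The design choice that matters is that the category filter sits between retrieval and synthesis, so it applies even when the existing access check passes.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels; the real list is an organizational decision.
EXCLUDED_FROM_SYNTHESIS = frozenset({"hr", "compensation", "legal_privileged", "performance"})


@dataclass(frozen=True)
class Document:
    doc_id: str
    category: str            # classification label (assumed to exist)
    readers: frozenset[str]  # principals allowed to read the document


def synthesis_context(docs, user):
    """Return (allowed, withheld) for a query issued by user.

    Step 1 is the existing ACL check. Step 2 is the AI-level boundary:
    documents in excluded categories are withheld from synthesis even
    when the user is technically authorized to read them.
    """
    readable = [d for d in docs if user in d.readers]
    allowed = [d for d in readable if d.category not in EXCLUDED_FROM_SYNTHESIS]
    withheld = [d for d in readable if d.category in EXCLUDED_FROM_SYNTHESIS]
    return allowed, withheld


def flag_sensitive_querying(query_log, threshold=3):
    """query_log yields (user, categories_touched) pairs taken from the AI
    system's query log; flag users who repeatedly touch excluded categories."""
    hits = Counter(
        user for user, categories in query_log
        if EXCLUDED_FROM_SYNTHESIS & set(categories)
    )
    return sorted(user for user, count in hits.items() if count >= threshold)
```

Withheld documents are returned rather than silently discarded so they can feed the query monitoring, which is where an employee repeatedly probing compensation or legal material would show up.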

How AI Vendors Use Your Data: Contract Versus Reality

I have read a lot of AI vendor contracts in the past few years. Not because contract review is interesting in itself, but because the gap between what vendors say in sales conversations and what their agreements actually commit to has consequences. Organizations that do not close that gap before signing end up discovering what the contract actually says at the worst possible time.

The general shape of an AI vendor data agreement is worth understanding at the executive level. Not because the CFO or CIO needs to redline individual clauses, but because the strategic choices about which vendors to use and under what conditions flow directly from what those agreements permit and exclude. Here is what I see consistently.

The default terms favor the vendor

This should not be surprising. The default data processing terms in any commercial agreement are written to minimize the vendor's liability and maximize their operational flexibility. AI vendor agreements are no different, and in some respects they are more aggressively drafted than traditional software agreements because the stakes around data use are higher and the regulatory landscape is still evolving.

The standard structure of a consumer or early-stage enterprise AI agreement typically includes:

A broad grant to the vendor to use interaction data for service improvement, model training, and product development purposes, subject to anonymization or aggregation. In practice, what "anonymization" means and how consistently it is applied is rarely specified.

Retention periods that are defined by the vendor's operational needs rather than the customer's preferences, often without a customer-initiated deletion right.

Liability limitations that cap the vendor's exposure in the event of a data incident at amounts that bear no relationship to the potential harm to the customer, typically limited to fees paid rather than the value of the data or the cost of a breach.
Unilateral modification rights that allow the vendor to change the data processing terms with notice, sometimes as short as 30 days, without requiring the customer's affirmative consent.

None of these are unusual in commercial software agreements. But when the agreement governs how your organization's strategic data, client information, and proprietary content are handled, they warrant closer attention than a standard SaaS contract.

What changes in a properly negotiated enterprise agreement

The distinction between a default agreement and a properly negotiated enterprise agreement is significant. When procurement and legal have done their job, the enterprise agreement should include at minimum:

Exclusion from training data. A clear, contractually binding commitment that the customer's interaction data will not be used to train or fine-tune the vendor's models. This is the single most important data term and the one that organizations should refuse to proceed without.

Data processing agreement compliant with applicable regulations. For any processing of personal data of EU residents, a GDPR-compliant data processing agreement is a legal requirement, and other jurisdictions increasingly impose similar requirements. This agreement specifies the purposes for which data is processed, the retention periods, the data subject rights the vendor will support, and the security measures in place.

Defined retention and deletion terms. The agreement should specify how long the vendor retains interaction data, under what circumstances, and what deletion looks like, with confirmation that deletion is complete and irreversible.

Sub-processor disclosure and control. AI platforms often rely on cloud infrastructure, third-party safety tooling, and other sub-processors. The enterprise agreement should disclose who these are and give the customer the ability to object to new sub-processors.

Breach notification terms.
The timeframe within which the vendor will notify the customer of a security incident affecting customer data. Thirty days is common in default agreements; 72 hours is the window the GDPR gives you to notify your own supervisory authority, and some other regimes set similar deadlines. Make sure the vendor's notification obligation to you is faster than your notification obligation to regulators.

The clauses that cause problems later

In practice, the clauses that create the most problems are not the ones organizations focus on during negotiation.

Aggregated and anonymized data carve-outs. Most agreements carve out "aggregated and anonymized data" from the restrictions on training use, on the rationale that anonymized data cannot be traced back to the customer. The problem is that what counts as "anonymized" is not usually defined with precision, and for certain types of content (queries about niche industries, specialized technical topics, or specific organizational patterns) re-identification is more feasible than the carve-out implies.

Operational necessity language. Agreements often include broad permissions for the vendor to process customer data "as necessary to provide and improve the service." The scope of "improve the service" is frequently contested. Make sure this language is defined, not left open.

Right to audit provisions. These give you the ability to verify that the vendor is actually complying with the data processing commitments they have made. Many agreements include an audit right that is functionally unusable: limited to once per year, requiring 90 days' notice, and subject to the vendor's approval of the auditor. An audit right with those conditions provides limited practical assurance.

Termination data handling. What happens to your data when the contract ends? How long does the vendor retain it after termination? In what format is it returned? Is deletion from backup systems addressed?
Organizations that have ended vendor relationships often discover that "deletion" in practice means deletion from active systems, with indefinite retention in backup infrastructure.

The sales conversation versus the signed contract

The gap I see most often is between what the sales team communicates during the evaluation ("we never use your data for training," "your data is completely private," "you retain full ownership of everything") and what the signed agreement actually commits to.

This is not always deliberate misrepresentation. Sales teams are not contract lawyers, and they often communicate what they believe to be true without knowing the precise legal scope of the commitments they are describing. The problem is that verbal assurances do not create contractual obligations. What matters is what the signed agreement says.

The practical implication: have the conversation about data handling before the procurement decision, but validate every assurance by finding the corresponding contractual language. If the vendor says they do not train on customer data, ask them to point to the specific clause that says so. If the clause does not exist, or if it is qualified in ways that limit its practical scope, that is important information.

What the CFO should be looking at

The CFO's lens on AI vendor data terms is different from the CIO's. Beyond the data handling questions, the financial and liability exposure matters.

Liability caps. Caps set at fees paid rather than harm caused mean that in the event of a serious data incident, the vendor's contractual exposure is often a fraction of the cost the organization incurs in regulatory fines, breach notification costs, customer notification, and reputational damage. This does not mean the vendor relationship is unworkable, but it does mean the organization is bearing most of the downside risk and should price that accordingly.

Insurance coverage.
Some AI vendor incidents may fall into gaps between the organization's existing cyber insurance policy and the vendor's coverage. This is worth reviewing explicitly before the program goes live.

Renewal and price terms. AI vendor agreements increasingly include significant pricing flexibility: unilateral price changes, usage-based components that scale in ways that are hard to predict, and renewal terms that are less favorable than the initial agreement. Understanding the financial exposure over a three-to-five-year horizon matters for the investment case.

What to take from this

Default AI vendor data terms are written for the vendor's benefit. Do not assume they protect customer interests without reviewing them.

The training exclusion is non-negotiable for any enterprise deployment handling sensitive data. Get it as a contractual commitment, not a sales assurance.

Aggregated and anonymized data carve-outs are often broader than they appear. Define what anonymization means in the specific context of your data.

Audit rights that are functionally unusable provide no real assurance. Push for meaningful audit provisions.

Verify every data handling assurance the sales team makes by finding the contractual language that supports it. If it is not in the contract, it is not a commitment.

The organizations that manage AI vendor relationships well are not the ones with the longest or most restrictive agreements. They are the ones that understood what they were agreeing to before they signed, addressed the material gaps, and built a vendor relationship on actual commitments rather than assumed ones.
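One way to operationalize this review is a checklist that legal fills in from the signed agreement and that flags the gaps discussed above. The Python sketch below is illustrative only: the VendorTerms fields, the 24-hour assessment margin, and the incident cost estimate are assumptions rather than contract standards, and the 72-hour default stands in for whichever regulatory window applies to you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VendorTerms:
    """Terms as read from the signed agreement (hypothetical fields)."""
    training_exclusion_clause: Optional[str]  # clause reference, or None if absent
    breach_notice_hours: int                  # vendor's notice obligation to you
    liability_cap: float                      # contractual cap on vendor exposure
    modification_notice_days: int
    modification_requires_consent: bool


def review_terms(terms, estimated_incident_cost, regulator_window_hours=72,
                 margin_hours=24):
    """Return a list of findings; an empty list means none of these checks fired."""
    findings = []
    if not terms.training_exclusion_clause:
        findings.append("No contractual training exclusion; a sales assurance is not a commitment.")
    # The vendor's notice should arrive with time to spare inside your own window.
    if terms.breach_notice_hours + margin_hours > regulator_window_hours:
        findings.append(
            f"Vendor notice window ({terms.breach_notice_hours}h) plus a {margin_hours}h "
            f"assessment margin exceeds the {regulator_window_hours}h regulatory window."
        )
    if terms.liability_cap < estimated_incident_cost:
        uncovered = estimated_incident_cost - terms.liability_cap
        findings.append(f"About {uncovered:,.0f} of estimated incident cost sits outside the liability cap.")
    if not terms.modification_requires_consent:
        findings.append(
            f"Data terms can change on {terms.modification_notice_days} days' notice without consent."
        )
    return findings


if __name__ == "__main__":
    default_terms = VendorTerms(None, 30 * 24, 50_000.0, 30, False)
    for finding in review_terms(default_terms, estimated_incident_cost=2_000_000.0):
        print("-", finding)
```

The value here is less the code than the discipline it enforces: every field has to be filled from a clause in the signed agreement, not from something said in a sales meeting.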

The AI Tools Your Employees Are Using With Your Data

The standard framing for AI governance starts with the question of which tools to approve. That is the wrong starting point. The better question is: which tools are already in use, with what data, under what terms?

By the time most organizations start building an AI governance framework, their employees have already made a fairly coherent set of tool choices. They have picked the tools that solve their immediate problems. They have not, in most cases, read the privacy policies or data processing terms. And because nobody told them not to, they have been using company data freely.

A CIO who wants to get ahead of this, or who wants to manage it after the fact, needs a clear picture of the tool landscape and an honest assessment of where the data risk actually concentrates. Not all unsanctioned AI tools carry the same risk. Understanding the difference is where governance work should start.

The tool categories and what they mean for data

General-purpose AI assistants

This is the highest-volume category. Consumer versions of large language model interfaces are used daily by employees across functions: drafting communications, summarizing documents, answering domain-specific questions, structuring thinking. The use is frequent, the content fed in is varied, and the data handling terms depend entirely on whether the employee is using a consumer or enterprise account.

The specific risk here: consumer-tier accounts with default settings often permit the vendor to use interaction data for product improvement. The same vendor's enterprise tier typically does not. Most organizations have no visibility into whether employees using these tools are on a consumer or enterprise tier, and many employees are on a consumer tier simply because it was free and faster to start.
Productivity AI features in existing software

Word processing, spreadsheets, presentation tools, email clients, and project management platforms increasingly include AI features, often activated via a premium license or a setting employees can enable without IT involvement. The risk here is different from standalone AI tools: because these features exist inside software the organization already uses, they often fly under the radar of any AI tool review.

The data handling terms for AI features embedded in existing software are usually governed by the same agreement covering the base product, but with additional clauses for the AI component that many organizations have not reviewed since they were added. These clauses deserve explicit attention.

Specialist function tools

Legal AI tools, sales intelligence platforms, HR tools, finance automation assistants, coding assistants, market research tools: these are purpose-built AI products targeting specific professional functions. They tend to be adopted department by department, often through a free trial that converts to a team subscription without going through central IT.

The data risk with specialist tools is often higher than with general-purpose ones, for a specific reason: the content fed into specialist tools tends to be more consistently sensitive. Legal teams feed contracts. Finance teams feed financial models. Sales teams feed client data and deal structures. The tool is designed for that content, which means employees use it confidently and at volume.

AI-powered integrations and automation platforms

Workflow automation tools, AI connectors between SaaS platforms, and integration layers that use AI for data transformation or decision-making sit in a category that CIOs are least likely to have visibility into.
These tools often operate in the background, processing data as part of an automated flow rather than through a direct user interaction, and their data handling terms are buried inside integration documentation that nobody reads.

The risk with automation platforms is not necessarily higher than with interactive tools, but the visibility is lower. When a human pastes text into an AI tool, there is at least a moment of conscious choice. When an automated workflow passes data through an AI component as part of processing, there is no such moment.

The risk factors that actually matter

When assessing the data risk of any specific tool category, four factors determine how much it matters.

What data flows through it. The highest risk is where the most sensitive data concentrates: client information, financial projections, legal material, personal data. This varies by tool and by how a specific team uses it.

What tier the organization is on. Enterprise agreements typically include data processing terms, exclusions from training use, and deletion rights that consumer tiers do not. A tool is not inherently high-risk or low-risk; the tier and the agreement terms are what determine the actual data handling.

Whether a data processing agreement exists. For any tool processing personal data of EU residents, a data processing agreement is a legal requirement under the GDPR, not a nice-to-have. Many organizations are operating without these agreements in place for tools their employees use every day.

How much volume is flowing through it. A low-use tool with poor data terms is a lower priority than a high-use tool with poor data terms. Volume matters: the tools employees reach for first, most often, and at highest volume are where the exposure is concentrated.
What a CIO needs to do before writing a policy

Policies written without a clear picture of the current state tend to be wrong in two ways: too restrictive in areas where the risk is manageable, and silent on areas where the risk is real. Getting the picture right first makes the policy more useful.

That means running a discovery exercise that goes beyond the IT procurement system. Talk to department heads about what their teams use. Survey employees. Analyze network traffic for connections to known AI tool endpoints. The goal is a realistic list of tools in active use, categorized by function and frequency.

For each tool, determine what tier the organization is on (enterprise or consumer) and whether a data processing agreement exists. This is the most important variable in understanding the actual data handling exposure.

From there, prioritize remediation by volume and sensitivity. The tools that process the highest volume of the most sensitive data under the least favorable terms are the first order of business. That might mean migrating employees from a consumer tier to an enterprise tier of the same tool. It might mean negotiating a data processing agreement with a vendor. It might mean replacing a tool with an approved alternative.

The classification that comes out of this exercise (which tools are approved at which tier for which data types) is what the policy should be based on. Policies that precede this exercise tend to produce compliance theater rather than actual risk reduction.

The conversation with department heads

This is where the process usually gets uncomfortable. When a CIO discovers that a department has been using an unsanctioned AI tool with client data for the past year, the instinct is often to shut it down immediately. That is rarely the right response. Abrupt prohibition creates resistance and drives use underground.
It also signals that the governance process is about compliance rather than risk management, which damages the working relationship the CIO needs to make future governance effective.

The better approach: treat the discovery as information rather than a violation. Understand what the tool is being used for, what problem it solves, and what the actual data exposure has been. If the tool can be moved to an enterprise tier with appropriate terms, do that quickly. If it needs to be replaced with an approved alternative, make the transition timeline reasonable and the approved alternative usable.

Department heads whose teams are using shadow AI tools are not adversaries. They are telling you, through their behavior, what the organization's official tooling is failing to provide. The policy conversation goes much better when it starts from that acknowledgment.

What to take from this

Map what tools are in active use before designing any AI tool governance policy. The gap between what IT has approved and what employees are actually using is almost always larger than expected.

For each tool, determine whether it is in use on an enterprise or consumer tier. That distinction drives most of the material data handling difference.

Check whether data processing agreements exist for tools processing personal data. This is a current legal obligation, not a future aspiration.

Prioritize remediation by volume and sensitivity: high-use tools handling sensitive data under weak terms first.

Treat departments using unsanctioned tools as providing product feedback. Understand why they chose the tool before deciding how to respond.

The CIOs who manage this well are not the ones with the strictest policies. They are the ones who ran the discovery work, understood what was actually happening, and built governance around the real picture rather than the one they assumed existed.
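The prioritization step can be sketched as a simple scoring function over the discovery inventory. In this Python illustration the sensitivity weights, the penalty values, and the ToolInUse fields are all assumptions chosen for readability, not a recommended model; the only structure taken from the text is that volume, data sensitivity, tier, and the presence of a data processing agreement drive the ranking.

```python
from dataclasses import dataclass

# Hypothetical sensitivity weights; unknown data types default to 1.
SENSITIVITY = {"public": 0, "internal": 1, "client": 3, "financial": 3,
               "legal": 3, "personal": 3}


@dataclass(frozen=True)
class ToolInUse:
    """One row of the discovery inventory (hypothetical fields)."""
    name: str
    weekly_uses: int              # from surveys, SSO logs or network traffic
    data_types: frozenset[str]
    enterprise_tier: bool
    has_dpa: bool                 # data processing agreement in place


def exposure_score(tool):
    """Volume x worst-case sensitivity, amplified by weak terms."""
    sensitivity = max((SENSITIVITY.get(d, 1) for d in tool.data_types), default=0)
    weak_terms = 0
    if not tool.enterprise_tier:
        weak_terms += 2   # consumer tier: assume training use is not excluded
    if "personal" in tool.data_types and not tool.has_dpa:
        weak_terms += 2   # personal data processed without a DPA
    return tool.weekly_uses * sensitivity * (1 + weak_terms)


def remediation_queue(tools):
    """Highest exposure first."""
    return sorted(tools, key=exposure_score, reverse=True)


if __name__ == "__main__":
    inventory = [
        ToolInUse("chat-consumer", 500, frozenset({"client", "internal"}), False, False),
        ToolInUse("legal-ai", 40, frozenset({"legal", "personal"}), False, False),
        ToolInUse("chat-enterprise", 900, frozenset({"client"}), True, True),
        ToolInUse("slides-ai", 1000, frozenset({"public"}), False, False),
    ]
    for tool in remediation_queue(inventory):
        print(f"{tool.name:16} {exposure_score(tool)}")
```

Multiplying volume by sensitivity means a heavily used tool that only ever sees public material drops to the bottom of the queue, while weak terms amplify the other two factors rather than replacing them.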
