Omar Mustaan

06 Mar, 2026
- Data Engineering

The Data Quality Problem Nobody Puts in the Deck

Early in a data program at a large insurer, I asked the head of the business data office what percentage of their claims data they considered clean. She said, without hesitation, about 80 percent. When we actually looked (ran completeness checks, consistency validations, temporal analysis, cross-referenced against the systems of record), the number was closer to 40 percent. And that was by a generous definition of "clean."

The 40-point gap was not the result of negligence. It was the result of the way enterprise data accumulates: through system migrations that didn't fully reconcile historical records, through form field changes that made old values semantically incompatible with new ones, through operational shortcuts that were rational at the time and invisible until someone tried to use the data systematically.

This pattern, a significant gap between what the business believes about its data and what the data actually contains, is present in virtually every large enterprise I've worked with. What changes is how much it matters. In most operational contexts, it doesn't matter much. In AI programs, it matters enormously.

Why the gap doesn't surface until you try to train

Data discovery processes reveal the problems that can be found by looking at data directly: missing values, obvious format inconsistencies, clearly duplicated records. What they don't reveal are the problems that only become visible when you try to do something with the data.

Semantic inconsistency is one example. A "claim status" field that takes values of "open," "pending," "in review," and "active" might look fine in discovery. The problem emerges when you try to build a model that predicts claim duration and discover that "pending" meant two different things before and after a system migration five years ago. The model learns from the historical pattern and produces predictions that are systematically off for a segment of claims because the label meant something different during the training window.

Temporal invalidity is another. Features constructed from historical data often embed assumptions about time that are violated in ways that aren't visible until you start building features. A "days since last contact" feature that looks like a useful signal turns out to encode data entry behavior rather than customer behavior: the field was populated differently in different branches, and the differences correlate with branch-level outcomes rather than customer-level ones.

These problems don't show up in a data profile. They show up in model validation, in production performance anomalies, and in the kinds of questions that domain experts ask when they look at model outputs that seem technically sound but operationally wrong.

The four failure patterns

Completeness gaps. Missing data is the most visible quality problem and usually the best understood. But completeness is less binary than it appears. A field that's 95% complete might have its 5% missingness concentrated in the segment of the data the model most needs to reason about: a specific customer segment, a specific time period, a specific geography. Aggregate completeness metrics hide distributional missingness that can create systematic model errors invisible until production.

Consistency failures. Data that means different things in different records, or that's been encoded differently across systems, is the failure pattern that's hardest to detect and most dangerous for model training. Consistency failures are common at integration points, where data from one system is loaded into another, and at migration boundaries, where historical records under an old schema are mapped to fields in a new schema. The mapping logic that seemed sensible at migration time often introduces subtle distortions that aren't documented and don't announce themselves.

Temporal drift. The relationship between data and the world changes over time. Customer behavior changes, market conditions change, business rules change. A model trained on data from three years ago has learned from a world that no longer exists in important ways. This isn't a data quality problem in the traditional sense (the data accurately reflects what was true at the time), but it creates a model that doesn't reflect current reality. Temporal drift is the most common reason AI models underperform in production relative to testing, and it's consistently underweighted in most data quality assessments.

Labeling errors. For supervised learning problems, the quality of the labels, the ground truth the model is learning from, determines a ceiling on model quality that no algorithm can overcome. Label quality is frequently taken for granted in enterprise AI programs because the labels come from an existing operational system that the business trusts. But operational labels are a product of the processes that generated them, and those processes have their own inconsistencies. Claims classified as fraudulent by one review team using one set of criteria, then reclassified by another team using updated criteria, produce a label set that encodes inconsistency as signal. The model learns from that inconsistency and reproduces it at scale.

How it kills AI ROI

The mechanism by which data quality degrades AI ROI isn't usually a catastrophic failure. It's a gradual tax on every part of the program. Model performance caps are lower than they should be, which means the business case is harder to achieve. The team spends more time on data remediation than on model development, which means the delivery timeline extends. Retraining cycles are more expensive because the data pipeline that feeds them is brittle, which means the operational cost is higher than projected. And when the business starts to see outputs that seem wrong, where the model disagrees with what an experienced practitioner would say, trust erodes in ways that are very difficult to reverse.

The cumulative effect is hard to quantify precisely, but a program that expected an eighteen-month path to production value and took thirty months instead, with model performance fifteen points below the initial projection, is not unusual when data quality problems were underassessed at the start.

The feature engineering temptation

The engineering response to data quality problems is usually feature engineering workarounds: bridge tables, deduplication logic, reference data lookups, semantic normalization applied in the feature construction layer. These work. I've used them. But they're a form of debt that compounds.

A feature pipeline that applies complex normalization to reconcile inconsistent reference datasets has to be maintained for as long as the model runs. When the underlying data changes, and it will, the workaround may silently break, degrading model performance without triggering an error. And every new model that uses the same data inherits the same problem independently, solving it in its own way, creating a portfolio of different workarounds for the same underlying issue.

Feature engineering can bridge a data quality gap temporarily while the underlying issue is being addressed. It's not a substitute for addressing the underlying issue. The difference matters when you're building the fourth model on the same data foundation that the first three models already patched around.

What a real data quality assessment looks like

The most useful question to ask before starting an AI program is not "do we have the data?" Almost every enterprise has data. The question is "is the data we have sufficient to support the model we need to build?"

A data quality assessment designed around that question covers: the completeness and consistency of each field the model will use, the temporal validity of the training window, the label quality for the target variable, and the regulatory and compliance status of the data being used for training.

For a focused use case, this takes two to four weeks. It produces a realistic view of what remediation work is needed before model training can begin, what workarounds are viable in the short term, and where the data gaps are severe enough to require a different use case or a longer pre-program phase.

That view has a cost: it may reveal that the program timeline needs to move right, or that a use case needs to change. But it's a cost paid once, upfront, with full information. The alternative is paying a larger cost spread across months of rework, performance shortfalls, and stakeholder trust that's harder to rebuild than it was to lose.

The data quality problem almost always goes into the deck eventually. The question is whether it goes in at the start, when there's still time to do something about it, or at month fourteen, when everyone is looking for someone to blame.
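
The point about completeness gaps can be made concrete with a small check. The sketch below is illustrative only: the `claims` rows, the `region` segment, and the `adjuster_notes` field are hypothetical, chosen so that a field that looks 95% complete overall turns out to be half empty in one segment.

```python
from collections import defaultdict


def completeness_by_segment(records, field, segment_key):
    """Return {segment: fraction of rows where `field` is populated}."""
    total = defaultdict(int)
    filled = defaultdict(int)
    for row in records:
        segment = row[segment_key]
        total[segment] += 1
        # Treat None and empty string as missing (an assumption; real data
        # often needs more sentinel values, such as "N/A" or "UNKNOWN").
        if row.get(field) not in (None, ""):
            filled[segment] += 1
    return {segment: filled[segment] / total[segment] for segment in total}


# Hypothetical extract: 100 claims, 5 of them missing adjuster notes,
# and all 5 of those sit in the smaller "south" region.
claims = (
    [{"region": "north", "adjuster_notes": "ok"} for _ in range(90)]
    + [{"region": "south", "adjuster_notes": "ok"} for _ in range(5)]
    + [{"region": "south", "adjuster_notes": None} for _ in range(5)]
)

overall = sum(1 for c in claims if c["adjuster_notes"]) / len(claims)
by_region = completeness_by_segment(claims, "adjuster_notes", "region")
```

Here `overall` is 0.95, while `by_region` shows the south region at 0.5. Running the same breakdown by time period or product line is the cheap first pass at the distributional missingness that the aggregate number hides.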


20 Feb, 2026
- AI Strategy

AI Risk Is Not IT Risk: Why Your Existing Frameworks Are the Wrong Starting Point

When a board's risk committee receives an AI risk update, it typically looks like an IT risk update with a new heading. Threat categories, control owners, residual risk ratings, escalation thresholds. The framework is familiar because it was borrowed from cybersecurity and data privacy, the last two major technology risk domains the business had to absorb.

The problem is that AI risk is structurally different from IT risk. Not more complex in a general sense; different in kind. The threat model is different, the failure modes are different, and the controls that work for one don't work for the other. Running AI risk through an IT risk framework is like using a fire suppression system to manage flood risk. Both are disaster scenarios. The equipment doesn't transfer.

Most organizations won't discover this until something goes wrong. At that point, the questions (who was monitoring this, what controls were in place, who was accountable) will point back to a governance structure that was never designed for the problem it was supposed to prevent.

Why IT risk frameworks break on AI

IT risk is fundamentally binary. A system is breached or it isn't. Data is exfiltrated or it isn't. A control is working or it has failed. The threat model assumes a boundary, a perimeter, and the job is to defend it. When the perimeter is breached, you know. There's an incident. Systems go down, alerts fire, logs show the intrusion.

AI risk doesn't work that way. The most dangerous AI failure modes are probabilistic and continuous. A model doesn't fail; it drifts. Its performance degrades incrementally as the world changes and the training data becomes less representative. There's no incident, no alert, no obvious moment when things stopped working. The model continues to produce outputs that look correct. Business processes continue to run on those outputs. Trust accumulates in a system that is quietly becoming less reliable.

This is the silent failure mode that IT risk frameworks have no vocabulary for. A cyberattack is visible by design: the attacker wants access. A degrading model is invisible by design: there's nothing malicious, just a slow divergence between what the model learned and what the world currently looks like.

The second structural difference is the threat surface itself. IT risk focuses on external threat actors: attackers, phishing campaigns, vulnerability exploitation, insider threats. AI risk introduces an entirely different threat surface: data poisoning during training, adversarial inputs designed to manipulate model outputs, emergent behaviors that weren't anticipated in testing, and feedback loops where a model's outputs influence the data it will be trained on next.

These are not threats a penetration test catches. They're not threats a SIEM detects. They require a different monitoring approach and a different set of controls.

The control gap

IT risk controls are designed around access, authentication, encryption, patching, and incident response. These are the right controls for IT risk. Most of them are irrelevant to AI risk or address a small subset of it.

The controls that actually matter for AI risk are:

Model monitoring. Tracking model performance over time against defined thresholds: not just system availability, but whether the model's outputs remain accurate and calibrated. This requires ground truth data, a measurement cadence, and someone accountable for reviewing the results.

Data provenance and lineage. Understanding where training data came from, what transformations it went through, and whether those sources have changed. A model trained on data that has since changed in composition is a model with an unknown risk profile.

Input validation. Monitoring what the model is being asked to score in production, and detecting shifts in the input distribution that may indicate the model is being asked to operate outside its training domain. This is not the same as network traffic monitoring.

Explainability requirements. For consequential decisions (credit, insurance, hiring, medical), the ability to explain why the model produced a specific output for a specific input is both a regulatory requirement in many jurisdictions and a basic operational requirement for incident response.

Decommissioning criteria. A defined threshold at which a model is pulled from production, not left running after it has degraded past the point of reliability. Most IT governance has no equivalent because software doesn't degrade the way models do.

None of these appear in a standard IT risk control library. They require a purpose-built AI risk framework or significant extension of existing frameworks, not a mapping exercise that tries to force AI risk into existing categories.

The accountability mismatch

IT risk has clear ownership because IT risk maps to IT infrastructure. The CISO owns the security controls. The CTO owns the infrastructure. Accountability lines are established.

AI risk doesn't map to the same functions. The data science team that built the model may not have the operational accountability for what it does in production. The business unit that requested the model may not have the technical literacy to understand the risk it carries. The IT function that runs the infrastructure may have no visibility into whether the model is performing correctly.

The accountability gap in AI risk is not a people problem; it's a design problem. The organization hasn't defined who owns AI risk at the model level, and existing governance structures don't force that definition because they weren't built with AI in mind.

Good AI risk governance requires a named risk owner for each production model: someone accountable for monitoring its performance, escalating anomalies, and making the decommissioning call when thresholds are breached. That's a different role from anything in a standard IT risk organization chart.

What the board's AI risk conversation needs to cover

Most board-level AI risk conversations are about whether the organization has an AI risk policy. That's the wrong question. The right questions are operational:

- Which AI systems are currently making decisions that affect customers, employees, or revenue, and what is the monitoring status of each?
- What is the defined performance threshold below which each system would be paused or replaced?
- When did the board last receive an update on actual model performance, not just program status?
- What is the process for escalating an AI failure to board level, and has it ever been tested?

These are not technology questions. They're governance questions that require technology inputs. The reason they're not being asked is that the board's risk framework doesn't prompt them, because the framework was built for a different threat model.

The regulatory layer

The EU AI Act introduces a formal risk taxonomy for AI that is worth understanding even for organizations not primarily subject to EU law, because it's the clearest articulation currently available of what AI-specific risk management looks like at a regulatory level.

The Act's risk tiers map roughly to the severity and reversibility of consequences: prohibited uses (facial recognition in public spaces, social scoring), high-risk uses (credit, hiring, education, law enforcement support), and lower-risk uses with transparency requirements. High-risk systems require conformity assessments, ongoing monitoring, human oversight provisions, and auditability of training data.

Whether or not an organization is directly subject to the Act, those categories are a useful starting framework for internal AI risk classification. The question "would this qualify as high-risk under the EU AI Act?" is a reasonable first filter for deciding which AI systems need the most rigorous governance treatment.

Organizations in financial services and healthcare already have sector-specific AI risk guidance from their regulators. The EBA, PRA, FDA, and others have all issued guidance that is more detailed and more prescriptive than general enterprise risk frameworks. These are the relevant starting points for organizations in those sectors, not the enterprise risk framework that was designed before AI was a material concern.

Building an AI risk framework from scratch is a significant undertaking. Borrowing one from IT risk is the path of least resistance, and the path most likely to leave the organization exposed when something goes wrong. The frameworks aren't interchangeable, and the gap between them is where most AI governance failures currently live.
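
To make the input validation control less abstract, here is a minimal sketch of one common way to measure input drift: the population stability index (PSI) between the training distribution of a feature and what the model is being asked to score in production. The article does not prescribe a metric, so PSI is a swapped-in illustration. The bin edges, the sample values, and the 0.25 alert level (a widely used rule of thumb, not a standard) are all assumptions.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples of one feature.

    `edges` are ascending bin boundaries; values below the first edge fall
    in bin 0 and values at or above the last edge fall in the final bin.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        # Floor each share at eps so an empty bin does not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    exp_shares, act_shares = shares(expected), shares(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(exp_shares, act_shares)
    )


# Hypothetical feature values: training data spread evenly over four bins,
# production traffic heavily concentrated in the top bin.
edges = [10, 20, 30]
training = [5] * 25 + [15] * 25 + [25] * 25 + [35] * 25
production = [5] * 10 + [15] * 10 + [25] * 10 + [35] * 70

DRIFT_ALERT = 0.25  # assumed threshold; set per model by its risk owner

drift = psi(training, production, edges)
needs_review = drift > DRIFT_ALERT
```

A check like this produces no incident and no outage. It only produces a number, which is why it needs a named owner and a defined threshold to turn it into a decision.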


06 Feb, 2026
- Enterprise AI

The AI Audit Your Board Should Be Asking For (But Probably Isn't)

When organizations commission audits, they tend to know what they're looking for. A financial audit looks for misstatements. A cybersecurity audit looks for vulnerabilities. Both have established methodologies, credentialed practitioners, and a clear output format. No equivalent exists yet for AI, and the absence is starting to matter.

Most boards that have approved AI investments can answer some questions about them: what budget was committed, which vendor was selected, whether the program is on schedule. Very few can answer the questions that actually determine whether the organization's AI exposure is understood and managed: which AI systems are currently making decisions that affect customers, employees, or revenue, who is accountable when those decisions are wrong, and would anyone be able to explain a specific bad outcome if asked to by a regulator or a plaintiff's attorney.

The audit that answers those questions isn't a technical audit. It's a strategic one: a systematic review of what the organization is actually doing with AI, whether the governance structures in place are real rather than documented, and whether the accountability chains hold under scrutiny.

What a strategic AI audit is

A strategic AI audit is a decision audit. It asks: what decisions is AI making in this organization, and is the governance around those decisions adequate?

This is different from a technical audit, which asks whether AI systems are built correctly. It's different from a compliance audit, which asks whether documentation requirements have been met. And it's different from a security audit, which asks whether AI infrastructure is protected from external threats.

The strategic audit asks the governance question: if something goes wrong with an AI-driven decision, does the organization know what happened, does someone own it, and is the board in a position to account for it?

In my experience, the answer to at least one of those three questions is "no" in most organizations that haven't specifically designed for it. AI systems accumulate in organizations faster than governance frameworks evolve to cover them. A use case that started as an internal productivity tool is now influencing hiring decisions. A model deployed for one market is being used in another where the regulatory context is different. A vendor has updated an underlying model and the organization's internally built layer is now operating on a different foundation than it was when it was approved.

None of these are necessarily failures. All of them are things the board should know about, and typically doesn't.

The three questions it needs to answer

What AI systems are making consequential decisions, and do we have a complete inventory? Most organizations do not have a comprehensive inventory of their production AI systems. AI proliferates in ways that other technology doesn't: it's embedded in vendor products, built by business units operating outside central governance, and updated by vendors without explicit notification to the client. The first deliverable of any AI audit is an accurate map of what exists.

"Consequential" is a meaningful threshold here. Not every AI system making recommendations needs the same governance treatment. An internal tool that suggests email response drafts is different from a model that scores customer loan applications or determines which job candidates advance to interview. The audit should focus governance energy on decisions that affect customers, employees, or material financial outcomes.

Who is accountable when an AI-driven decision is wrong? This is the question that most AI governance documentation fails to answer concretely. Organizations have AI ethics policies, responsible AI frameworks, and model risk management guidelines. Very few of them name a specific person who is accountable for a specific model's outputs in production.

The audit should resolve this to a named individual for each consequential AI system. Not a team. Not a committee. A person, in a role, with defined responsibilities for performance monitoring, incident escalation, and the decision to pause or decommission the system.

Could anyone explain a specific bad outcome if required to? This is the forensics question. If a customer was denied credit by an AI model and files a complaint, can the organization trace the specific inputs that drove the decision, explain why the model weighted those inputs the way it did, and demonstrate that the decision was consistent with the model's approved use case and the organization's stated policies?

In many organizations, the honest answer is no. The model exists in production, but the audit trail, explainability layer, and documentation necessary to reconstruct a specific decision either don't exist or aren't maintained in a format accessible to anyone outside the technical team.

Why internal audit isn't equipped to run it alone

Internal audit functions have the independence and mandate to commission this work. They typically don't have the domain expertise to execute it without specialist support, and that's worth being explicit about rather than papering over.

An internal auditor assessing whether AI governance documentation is complete can do that independently. Assessing whether the documentation reflects what's actually happening in production models, whether monitoring thresholds are set appropriately for the use case, or whether a model's training data is representative of the population it's scoring requires someone with operational AI experience.

The practical answer is a co-sourced approach: internal audit drives the process and maintains ownership of findings, specialist external support provides the domain expertise for the technical evaluation components. The independence of the finding sits with internal audit. The technical credibility sits with the specialist. This is how most mature compliance functions handle domains where internal expertise is thin; it's not a novel structure, just one that AI hasn't yet been systematically included in.

The business case

Boards sometimes resist commissioning audits because the output is uncertain and the cost is visible. The AI audit case is stronger than that framing suggests.

Regulatory exposure is real and increasing. The EU AI Act creates conformity assessment requirements for high-risk AI systems. Sector-specific AI guidance from financial regulators, the FDA, and employment regulators creates audit trails that organizations will need to produce. An AI audit conducted proactively is a fraction of the cost of a regulatory examination that finds governance gaps the organization didn't know it had.

Operational risk is also material. A model that has been quietly degrading for months is a liability that doesn't appear on anyone's radar until it affects enough decisions to produce visible business consequences: customer complaints, adverse outcomes at scale, regulatory notice. An audit that finds this early is worth more than its cost.

The D&O angle is worth raising directly with board members. Directors who approve AI strategies and investments are making decisions they will be held accountable for if something goes wrong at scale. An independent, documented review of whether the AI governance is adequate is meaningful protection. Approving an AI investment without it is a risk that sits with the individual director, not just the organization.

Frequency and triggers

For organizations with material AI exposure (models in production affecting customers, employees, or revenue at volume), an AI strategic audit should be an annual activity, structured similarly to other assurance reviews.

Out-of-cycle triggers worth defining: a significant change to a production AI system or its underlying model; entry into a new market or use case with AI involvement; a regulatory examination or enforcement action involving AI anywhere in the industry; a visible AI failure in a comparable organization that prompts questions about whether a similar pattern exists internally; and any M&A that brings new AI systems into the organization.

The audit doesn't need to be comprehensive every year. A rolling program that covers the highest-risk systems annually and lower-risk systems on a longer cycle is a practical approach for large organizations with many AI deployments. What it should never be is one-time. The AI landscape inside an organization changes faster than any other technology domain. A clean finding from two years ago is not evidence of a clean position today.
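
As a rough illustration of the inventory, ownership, and forensics questions above, the sketch below models an AI system inventory as a list of records and flags the consequential systems that would fail the audit. The `AISystem` fields and the example entries are hypothetical; a real inventory would carry far more detail (vendor, market, approval date, monitoring thresholds).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AISystem:
    name: str
    consequential: bool        # affects customers, employees, or revenue
    owner: Optional[str]       # a named individual, not a team or committee
    decision_trail: bool       # can a specific decision be reconstructed?


def governance_gaps(systems):
    """Return {system name: [issues]} for consequential systems only."""
    gaps = {}
    for system in systems:
        if not system.consequential:
            continue  # lower-stakes tools get lighter governance treatment
        issues = []
        if not system.owner:
            issues.append("no named owner")
        if not system.decision_trail:
            issues.append("cannot reconstruct a specific decision")
        if issues:
            gaps[system.name] = issues
    return gaps


# Hypothetical inventory entries for illustration.
inventory = [
    AISystem("email-draft-assistant", False, None, False),
    AISystem("loan-scoring-model", True, "J. Rivera", False),
    AISystem("resume-screener", True, None, True),
    AISystem("claims-triage", True, "A. Chen", True),
]

findings = governance_gaps(inventory)
```

With these entries, `findings` lists the loan scoring model (no decision trail) and the resume screener (no named owner), and skips both the drafting assistant and the fully governed triage model. The code itself is trivial; the hard part is that most organizations cannot populate the list.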


23 Jan, 2026
- AI Strategy

Regulatory Exposure Your Legal Team Hasn't Priced In Yet

Most enterprise legal teams have a mental model for new regulation built from the GDPR experience: wait for the law to come into force, watch the early enforcement actions to understand where the real exposure sits, then document accordingly. Move fast on the documentation, slow on the underlying change. Find out where the lines are before you invest in compliance infrastructure. That model worked reasonably well for GDPR — enforcement was slow, penalties in the early years were manageable, and the documentation-first approach bought time without serious consequences in most cases. It is the wrong model for the EU AI Act. The difference is structural. GDPR primarily required documentation of existing practices and some adjustments to data handling procedures. The EU AI Act, in its high-risk provisions, requires conformity assessment before deployment — not documentation of what you're already doing, but evidence that the system meets requirements before it goes live. Organizations that apply the GDPR mental model will find themselves with AI systems in production that haven't been through the required assessments, with no clean path to retroactive compliance, and with exposure that compounds with every decision the system makes while out of conformity. What the Act actually says The EU AI Act classifies AI systems into four tiers based on the risk of harm their deployment creates. Unacceptable risk systems are prohibited outright: social scoring by governments, real-time biometric identification in public spaces for law enforcement (with narrow exceptions), AI that exploits vulnerable groups, and systems that manipulate behavior through subliminal techniques. These are banned — no compliance path, no exemption. High-risk systems are the category most enterprises need to focus on. 
These require conformity assessment before deployment, ongoing monitoring in production, mandatory human oversight mechanisms, detailed technical documentation, and registration in an EU database before deployment. The high-risk categories include: AI systems used in hiring and employee management, credit scoring and credit access, insurance underwriting, educational admission and assessment, law enforcement support, migration and asylum processing, administration of justice, critical infrastructure management, and certain medical device software. Limited risk systems — primarily chatbots and AI-generated content — require transparency disclosures: users must be told they're interacting with AI. Minimal risk systems have no mandatory requirements under the Act, though voluntary codes of conduct may apply. The high-risk category is where most enterprise exposure sits, and it's larger than most legal teams initially assume when they scan the definition. Which enterprise use cases are actually high-risk The hiring dimension alone is significant. Any AI system used to sort, screen, rank, or make recommendations about job candidates or existing employees falls into high-risk. That includes resume screening tools, interview analysis software, performance management AI, and automated scheduling or task assignment systems that affect working conditions. Most large enterprises now use at least one tool in this category, often embedded in HR software they didn't build and may not have evaluated from an AI Act perspective. Credit and financial services exposure is equally broad. AI systems used in credit scoring, creditworthiness assessment, and insurance risk pricing are high-risk. This includes systems used by banks, insurers, and any firm offering financial products that involves AI-driven eligibility or pricing decisions. The medical device software provisions catch more organizations than expected. 
AI software that is a safety component of a medical device, or AI used to make clinical decisions affecting patient care, falls into high-risk — including software used by healthcare providers, not just device manufacturers. The critical infrastructure category covers AI used in the management of roads, railways, airports, water, gas, electricity, and certain digital infrastructure. This is relevant not just for utilities but for logistics companies, transportation operators, and cloud infrastructure providers. Why the GDPR parallel breaks down Under GDPR, the primary obligation is to document how you process personal data and to meet certain data subject rights requirements. The documentation has to be accurate, but the underlying processing can largely continue while you get the documentation in order. Under the EU AI Act, high-risk systems cannot be deployed until they have passed a conformity assessment. The assessment covers: risk management processes, data governance for training data, technical documentation of system design and performance, logging and monitoring capabilities, transparency and instruction requirements, human oversight mechanisms, and accuracy and robustness requirements. This isn't a documentation exercise that follows deployment. It's a pre-deployment gate. An organization that deploys a high-risk AI system without completing conformity assessment is not behind on compliance documentation — it is operating an illegal system. The penalty structure reflects this. Maximum penalties for prohibited systems are €35 million or 7% of global annual turnover. For high-risk system violations, the maximum is €15 million or 3% of global turnover. These are not late-documentation penalties — they're deployment penalties, applying to every period the non-compliant system was in operation. 
The extraterritorial reach The EU AI Act applies to any provider that places AI systems on the EU market or puts them into service in the EU, and to any deployer that uses AI systems in the EU — regardless of where the provider or deployer is based. A US company selling software that includes an AI hiring tool to European customers is a provider subject to the Act. A UK company using an AI credit scoring system for its European customers after Brexit is a deployer subject to the Act. A global company running an AI performance management system for its European employees is subject to the Act for those deployments. The extraterritorial scope is broader than most non-EU legal teams initially assume, and the relevant analysis is not "are we an EU company" but "are we deploying AI systems that affect people in the EU." The foundation model provisions The EU AI Act includes specific provisions for general-purpose AI models — what the Act calls GPAI models — which are relevant for any organization using large language model APIs from providers like OpenAI, Anthropic, Google, or Meta. GPAI providers have their own compliance obligations. But organizations that build applications on top of GPAI models are deployers of those capabilities, and the Act creates a liability chain. If you build a high-risk application on top of a GPAI model, your conformity assessment needs to address the GPAI component — you can't simply defer to the model provider's compliance documentation as if your deployment decisions are irrelevant. The practical implication: if you're using an LLM API to power a system that falls into a high-risk use case — an AI system that helps make hiring decisions, credit assessments, or medical triage — the fact that you're using a third-party model doesn't eliminate your conformity assessment obligation for the application layer you've built. What to do now vs. what can wait The Act has a phased implementation timeline. 
Prohibited system provisions are already in force. High-risk system requirements apply progressively based on system type, with most provisions applying from August 2026 onward, and some extended timelines for specific categories.

This creates a window, but it's narrower than it appears, because conformity assessment for complex AI systems takes time. Running the assessment, remediating gaps, completing documentation, and registering in the EU AI database is a 6–12 month process for a well-prepared organization. Organizations that start this process in mid-2026 may find themselves past the deadline.

What to do now: inventory every AI system used by your European employees or affecting your European customers. Classify each against the Act's risk tiers. For high-risk systems, begin the conformity assessment process. Flag the GPAI component for any application built on LLM APIs.

What can wait: the transparency and labeling requirements for limited-risk systems can be addressed in a second wave. The voluntary codes of conduct for minimal-risk systems are not urgent. The documentation maintenance requirements for compliant high-risk systems are ongoing, not front-loaded.

The organizations that will be in the most difficult position in 2026 are those that decided to monitor the situation rather than act on it. The monitoring strategy works when compliance is about documentation. It fails when compliance is a deployment gate, and for the EU AI Act's high-risk provisions, that's exactly what it is.
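
The timing argument above is simple date arithmetic, sketched here under stated assumptions. The deadline constant uses 2 August 2026 as an assumed application date for most high-risk requirements, the 6 and 12 month bounds come from the estimate in the text, and every function name is hypothetical. Results are rounded to the first day of the month.

```python
from datetime import date

# Assumption: application date for most high-risk requirements.
ASSUMED_DEADLINE = date(2026, 8, 2)


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that falls `months` after start's month."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, 1)


def assessment_outlook(start: date, fastest_months: int = 6,
                       slowest_months: int = 12,
                       deadline: date = ASSUMED_DEADLINE) -> dict:
    """Compare best-case and worst-case finish dates against the deadline."""
    fastest = add_months(start, fastest_months)
    slowest = add_months(start, slowest_months)
    return {
        "fastest_finish": fastest,
        "slowest_finish": slowest,
        "best_case_ok": fastest <= deadline,
        "worst_case_ok": slowest <= deadline,
    }


# Starting in mid-2026 misses the assumed deadline even in the best case.
print(assessment_outlook(date(2026, 7, 1)))
# Starting in September 2025 only works if the process runs at the fast end.
print(assessment_outlook(date(2025, 9, 1)))
```

Under these assumptions, an organization that wants the slow end of the estimate to fit would have needed to start roughly a year before the deadline.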


09 Jan, 2026
- Enterprise AI

When Your AI Vendor Fails: What Enterprise Continuity Planning Misses

Enterprise business continuity planning has matured significantly over the past decade. Most large organizations have detailed plans covering data center failure, key supplier disruption, network outages, and critical SaaS dependency. The plans are tested, updated annually, and presented to the board as evidence of operational resilience.

Almost none of them cover AI vendor failure.

This is a gap that is growing faster than organizations are closing it, because AI infrastructure has accumulated as a strategic dependency faster than any previous technology category, and the concentration of that infrastructure in a small number of providers creates a risk profile that most business continuity frameworks weren't designed to address.

The concentration reality

Most enterprise AI programs now run on infrastructure from three to five large providers. The foundational models (the large language models, the vision models, the embedding models) come from a handful of companies. The infrastructure to fine-tune, serve, and monitor those models runs predominantly on two or three cloud platforms. The tooling layer for MLOps, vector databases, and AI observability has its own concentrations.

This isn't a failure of procurement strategy. It reflects the current state of the market: the compute requirements for training large models are only viable at a few companies, and the organizational dependencies compound over time as internal teams build workflows, integrations, and institutional knowledge around specific providers.

The result is that a significant portion of enterprise AI capability now has a concentrated dependency structure that looks nothing like the diversified supply chains those same organizations maintain for physical goods or traditional software.

The failure scenarios that aren't hypothetical

None of the scenarios I'm going to describe are theoretical. All of them have happened to some organization or category of organization in the past few years.

Pricing changes. API pricing for foundational models has changed multiple times since commercial availability began. Organizations that built business cases on a specific cost-per-call structure have seen those economics shift materially. At low volumes this is manageable. At production scale, a significant pricing increase is a P&L event that has to be absorbed or responded to, and the response options are limited when the alternative is rebuilding on a different model.

API deprecation. Model versions get deprecated. The model version that a production system was built on, fine-tuned on, and evaluated against may be removed from availability on a timeline driven by the provider's product roadmap, not the client's operational needs. This forces either a migration under time pressure or an extended period of running on a deprecated version with no support and potential security exposure.

Performance degradation. Foundation model behavior changes with version updates, even when providers describe updates as improvements. A model that performed reliably on a specific task may behave differently after an update that was not communicated as a breaking change. For AI systems that have been through a conformity assessment or regulatory approval process based on specific model behavior, this creates a compliance problem as well as an operational one.

Regulatory shutdown. Regulators in multiple jurisdictions have shown willingness to restrict or suspend AI capabilities. This risk is higher for providers with significant regulatory exposure and for capabilities that sit in legally ambiguous territory.

Acquisition and pivot. The AI infrastructure market is consolidating. A provider that is an independent company today may be acquired by a larger platform company whose strategic priorities don't include the specific capabilities your organization depends on. Post-acquisition product decisions are made by the acquirer.

What business continuity plans currently miss

Standard business continuity frameworks are built around the concept of recovery time objective (RTO) and recovery point objective (RPO): how quickly can we restore service, and how much data can we afford to lose? These are the right questions for infrastructure failures, such as a data center going down or a primary database becoming unavailable.

They're the wrong questions for AI vendor risk, because the failure modes are different in character. A data center failure is abrupt and typically temporary. An AI vendor pricing change is gradual and potentially permanent. A model version deprecation has a defined timeline but requires significant engineering work to respond to. A performance shift may not be immediately detectable and may require revalidation of systems that were previously approved.

Business continuity planning for AI needs to address a different set of scenarios: what does the organization do if this model version is deprecated in 90 days? What does the organization do if this provider's pricing increases by 40%? What does the organization do if a capability we depend on is restricted by a regulator? These scenarios require capability-based responses, not infrastructure failover.

The portability question

The honest answer for most organizations is that they are significantly more locked in than they realize.

Fine-tuned models, when the fine-tuning has been done on a provider's infrastructure, may not be portable in a form that can be redeployed elsewhere without significant rework. Embeddings and vector indexes built against one model's embedding space are not compatible with a different model's embedding space without recalculation. Prompts engineered for one model's behavior may produce degraded outputs on a different model without re-optimization.

The integration layer (the application code, the data pipelines, the orchestration logic) is generally portable.
The model-specific components often aren't, and those are frequently where the most time and expertise have been invested.

Understanding your actual portability position requires a dependency mapping exercise that most organizations haven't done: for each production AI system, what would it take to migrate to an alternative model, and how long would that take with current team capability?

Contract provisions that actually matter

Standard enterprise software contracts are not adequate for AI vendor relationships. The provisions that should be in place, and often aren't:

SLA definitions for model behavior. Standard SLAs cover uptime and response time. They don't cover model performance consistency. If the provider can change model behavior without notification and without SLA consequence, the client has no contractual protection against the performance degradation scenario.

Data portability rights. Any fine-tuning data, output data, or evaluation data held by the provider should be contractually accessible and exportable. This matters both for migration and for regulatory compliance, since organizations may need to produce this data in response to a regulator.

Deprecation notice periods. Minimum notice periods before model version deprecation should be defined contractually. Ninety days is common in SaaS contracts for feature deprecation; AI model deprecation warrants at least the same, and often longer given migration complexity.

Audit rights. The client should have the right to audit model behavior, training data provenance (where relevant), and security practices. This matters for regulatory conformity, particularly for organizations subject to the EU AI Act's high-risk system requirements.

Successor model provisions. What obligations does the provider have when a model version is deprecated? Is there a migration support commitment? Are there protections against the replacement model failing to meet the performance specifications of the deprecated version?

The resilient procurement position

No procurement position eliminates AI vendor risk. But the gap between organizations that have thought about this and those that haven't is significant. The elements of a resilient position:

Multi-vendor architecture for critical systems. For AI capabilities that support critical business processes, architectural design should account for vendor substitution. This doesn't mean parallel deployment of everything; it means understanding which systems could be migrated within an acceptable timeframe and what would be required to do it.

Open-weight model capability as insurance. Maintaining the internal capability to run open-weight models (Llama, Mistral, and similar) creates an option that most organizations are currently not exercising. These models may not match proprietary model performance for all use cases, but for some use cases they're adequate, and the ability to fall back to self-hosted capability removes one category of vendor dependency.

Internal capability as a floor. The organization should maintain sufficient internal AI capability that it can evaluate vendor claims, understand what it's using, and make migration decisions based on technical judgment rather than pure vendor dependency. This doesn't require building everything internally. It requires not outsourcing understanding.

Documentation for migration. For each production AI system, maintaining documentation of what the system does, what inputs it requires, what performance benchmarks it meets, and what alternatives were considered creates a foundation for migration planning that doesn't require starting from scratch under pressure.

The business continuity plan that doesn't include AI vendor failure scenarios is a plan with a known gap. The question is whether the gap is closed before or after the scenario that reveals it.
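
One way to picture the vendor substitution idea is a thin routing layer that tries providers in priority order. The sketch below is illustrative only: `Provider`, `ProviderUnavailable`, `complete_with_fallback`, and both stub functions are hypothetical names, and the stubs stand in for real API clients and self-hosted models rather than calling any actual vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider adapter when it cannot serve the request."""


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # adapter: prompt in, text out


def complete_with_fallback(providers: Sequence[Provider], prompt: str) -> tuple[str, str]:
    """Try each provider in order; return (provider name, completion text)."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{provider.name}: {exc}")
    raise ProviderUnavailable("; ".join(errors) or "no providers configured")


# Hypothetical stubs standing in for real adapters.
def hosted_api(prompt: str) -> str:
    raise ProviderUnavailable("model version deprecated")


def self_hosted_open_weight(prompt: str) -> str:
    return f"[open-weight] response to: {prompt}"


chain = [
    Provider("hosted-api", hosted_api),
    Provider("self-hosted", self_hosted_open_weight),
]
name, text = complete_with_fallback(chain, "Summarize claim 123")
print(name, "->", text)
```

Routing like this covers only the integration layer. As the portability section notes, prompts, embeddings, and fine-tuned weights would still need re-validation against the fallback model before it could be trusted in production.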


19 Dec, 2025
- AI Strategy

The Liability Question Nobody Is Answering in the Boardroom

Every consequential AI deployment eventually produces a wrong decision. A loan gets denied that should have been approved. A job candidate is filtered out by a biased model. A medical summary contains an error that affects clinical judgment. A fraud detection system flags a legitimate transaction and freezes a customer's account.

None of these are failures in the sense of the system breaking down. The system functioned as designed. It produced an output. The output was wrong, and the output had consequences.

The question that follows, who is accountable for that wrong decision, is one that most boards and executive teams have not answered before the deployment, and have to scramble to answer after.

That sequence is backward. Accountability for AI-driven decisions is something that needs to be designed in, not litigated into existence. And the design work is a governance decision, not a technical one.

The legal gap

AI systems occupy a strange position in the existing liability framework. They make decisions (consequential ones, at scale), but they aren't legal entities. They can't be sued. They don't have directors or officers. When something goes wrong, liability has to land somewhere in the human and organizational structure around the system.

The current legal landscape draws from several established frameworks, none of which maps cleanly onto AI:

Product liability: the manufacturer of a defective product is liable for harm it causes. AI systems can be analogized to products, and there's active litigation testing this framework in multiple jurisdictions. The challenge is that AI systems produce outputs that vary based on input, and the boundary between a "defect" and an "intended behavior that produced an unintended outcome" is genuinely ambiguous.

Professional liability: practitioners in regulated professions (doctors, lawyers, financial advisors) are liable for negligent advice.
AI systems are increasingly used to support or substitute for that professional judgment. The question of whether the liability stays with the professional when AI was involved in the recommendation, and how the professional's duty of care applies to AI-assisted decisions, is being litigated and regulated in parallel.

Negligence: failing to exercise reasonable care in deploying or monitoring an AI system that then causes harm. This framework is probably the most applicable to enterprise AI in the near term, and it places significant weight on what processes the deploying organization had in place.

Sector-specific regulation: in financial services, healthcare, employment, and other regulated industries, AI decisions may trigger specific regulatory liability that operates independently of common law.

The proposed EU AI Liability Directive aimed to make it easier for individuals harmed by AI to access compensation, including through a presumption of causality that shifts the burden of proof in certain cases. The European Commission moved to withdraw that proposal in 2025, so its specific mechanisms should not be assumed, but the direction of travel in regulation is toward easier liability attribution, not harder.

Three scenarios, one question

Consider three concrete scenarios and what the liability analysis looks like for each under current frameworks.

A denied loan. A bank uses an AI model for credit decisions. A customer is denied a mortgage. The customer believes the denial was based on factors that correlate with protected characteristics. Under the US Equal Credit Opportunity Act (ECOA) and the EU AI Act's high-risk classification for credit AI, the bank has obligations to explain the decision and to demonstrate that the model doesn't produce discriminatory outcomes. Liability sits with the bank as the deploying organization. The model vendor's liability is limited unless the bank can show the vendor misrepresented the model's properties or failed to disclose known issues.

A flawed medical summary. A hospital uses an AI-generated clinical summary tool.
A physician relies on a summary that omits a critical finding. The patient experiences harm as a result. Liability analysis here is complex: the physician has a professional duty of care that doesn't disappear because AI was involved; the hospital may have organizational liability for deploying a system without adequate oversight; the AI vendor may have product liability exposure if the system was marketed as clinically reliable. The physician-patient relationship doesn't dissolve because there's an AI in the loop.

A biased hiring filter. A company uses an AI screening tool that disadvantages candidates from certain demographic groups. Multiple candidates who should have advanced are screened out. Under employment discrimination law, the company is liable for discriminatory outcomes regardless of whether the discrimination was intentional or was produced by an AI system. "We didn't know the model was biased" is not a defense that has worked in US employment discrimination cases, and the EU AI Act's high-risk classification for hiring AI creates additional conformity assessment obligations.

In all three cases, the deploying organization carries significant liability. In none of the three cases does "the AI did it" operate as a meaningful defense.

Why "the vendor is responsible" doesn't hold

The most common liability assumption I encounter in enterprise boardrooms is that the AI vendor carries the primary liability for wrong decisions. This assumption is wrong in most deployment contexts, and acting on it creates governance gaps that become expensive.

Vendor contracts are typically written to limit liability, often to the contract value or a multiple of it. This is standard commercial practice and not specific to AI, but the magnitude of AI-driven decisions can exceed contract liability caps by orders of magnitude.
A vendor that is liable up to the annual contract value for a model used in credit decisions that affect thousands of customers has not transferred the real economic exposure.

More fundamentally, courts and regulators in most jurisdictions have been clear that the organization deploying the AI system, not the vendor supplying the model, is responsible for the decisions made using it. The deploying organization chose to use the system, chose how to integrate it, chose what human oversight to maintain, and chose to act on its outputs. Those are deployment decisions, and they carry accountability.

There are narrow scenarios where vendor liability is real and significant: if the vendor misrepresented the system's capabilities, if the vendor knew of material deficiencies and didn't disclose them, or if the system had a defect that was not discoverable through reasonable testing. These are exceptions, not the general rule.

What good organizational liability design looks like

Liability for AI decisions can't be eliminated, but it can be managed through organizational design that makes the accountability chain clear, auditable, and defensible.

Named accountability for each production AI system. Someone in the organization should be designated as accountable for each AI system's decisions and performance. Not a team: a named individual in a named role. This person is responsible for monitoring performance, escalating anomalies, and making the decision to pause or decommission if performance degrades. Their accountability should be documented, and the documentation should be accessible if a decision is ever challenged.

Human-in-the-loop requirements for consequential decisions. For decisions with significant impact on individuals (credit, employment, medical), maintaining a human review stage creates both a check on AI outputs and a clearer accountability structure.
The human reviewer is accountable for the final decision in a way that's legally cleaner than pure AI automation. This doesn't require reviewing every decision; it requires designing the process so that consequential decisions involve human judgment at the point of decision.

Audit trails for individual decisions. The ability to reconstruct a specific decision (what inputs the model received, what output it produced, what threshold it applied, what human review occurred) is essential for responding to challenges. This is an engineering requirement that has to be designed in before deployment, not retrofitted after a complaint.

Documentation of known limitations. AI systems have known failure modes. Documenting these honestly, and documenting what controls are in place for each, creates a record of due diligence that is relevant in negligence analysis. An organization that knew a model performed poorly on a specific demographic and deployed it anyway is in a very different position from one that wasn't aware of the limitation.

The board's role

The liability question is a board-level question because the accountability structure for AI decisions is a governance design decision, not a management decision. Management designs the systems. The board should satisfy itself that the design is adequate: that accountability is named, that audit trails exist, that human oversight requirements are defined, and that the organization has a clear answer to the question of who is responsible when an AI decision is wrong.

This isn't a theoretical exercise. The litigation and regulatory enforcement that will define enterprise AI liability over the next decade are starting now, and the decisions organizations make today about how to structure accountability will determine whether their position in that litigation is defensible or not.
The board that has approved an AI investment and cannot answer "who is accountable if this system produces a wrong decision at scale" has a governance gap. Closing it is not a technical task. It's a decision the board needs to make, and it needs to make it before the scenario that tests it.
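
The audit trail requirement described above (inputs, output, threshold, human review) can be sketched as a per-decision record. This is a minimal illustration under stated assumptions, not a reference design: every field name, the "approve"/"refer" outcome labels, and the helper functions are hypothetical, and the SHA-256 digest only detects accidental or casual modification. A production system would also need append-only storage and access controls.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    system_id: str
    model_version: str
    inputs: dict            # must be JSON-serializable
    score: float            # raw model output
    threshold: float        # threshold applied at decision time
    outcome: str            # hypothetical labels: "approve" or "refer"
    reviewer: Optional[str]  # human reviewer, if any
    decided_at: str         # ISO 8601 timestamp


def build_record(system_id, model_version, inputs, score, threshold,
                 reviewer=None, decided_at=None) -> DecisionRecord:
    """Capture everything needed to reconstruct one decision later."""
    outcome = "approve" if score >= threshold else "refer"
    stamp = decided_at or datetime.now(timezone.utc).isoformat()
    return DecisionRecord(system_id, model_version, inputs, score,
                          threshold, outcome, reviewer, stamp)


def seal(record: DecisionRecord) -> dict:
    """Serialize the record deterministically and attach a digest."""
    payload = json.dumps(asdict(record), sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


def verify(sealed: dict) -> bool:
    """Check that the stored payload still matches its digest."""
    digest = hashlib.sha256(sealed["payload"].encode("utf-8")).hexdigest()
    return digest == sealed["sha256"]


record = build_record("credit-scoring", "v3", {"income": 52000}, 0.41, 0.5,
                      reviewer="analyst-17",
                      decided_at="2026-01-15T10:00:00+00:00")
sealed = seal(record)
print(record.outcome, verify(sealed))
```

Storing the threshold and model version alongside each decision matters because both change over time; without them, a decision made last year cannot be replayed against the rules that applied when it was made.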
