The Internal Data Access Problem That AI Makes Suddenly Visible

Access controls in most organizations work on a document-by-document basis. You have permission to read a file or you do not. The logic has been sufficient for most purposes because humans navigate information deliberately: they go looking for specific things and find what they have access to.

AI tools have broken that model without anyone changing any permissions. When an AI system with broad read access is asked a question, it does not navigate to a specific document. It queries across everything it can reach, synthesizes what is relevant, and produces an answer. The access controls determine what the system can read. They do not determine what combinations it can surface, what inferences it can draw, or what aggregated view of the organization's data it can present to the user.

The result is a category of access control failure that most organizations have not addressed, because the access controls themselves are technically correct and still inadequate.

The gap between technical access and intended visibility

The cleanest way to describe the problem: in most organizations, there is a meaningful difference between what an employee technically has access to and what they were intended to be able to see.

This gap exists because access management is messy in practice. Permissions accumulate over time as people join projects, take on new roles, and inherit access from reorganizations. Revocation processes lag behind changes. Distribution lists include people who should have rotated off. Shared drives created for one purpose get used for another. The intended access model and the actual permissions diverge, and in normal day-to-day work the gap is largely invisible because people go looking for things they need rather than systematically browsing everything they can reach.

AI tools systematically browse everything they can reach. That is their function.
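
One way to make the gap concrete is to treat it as a set difference between what a user can actually read and what their current role is meant to read. The sketch below is illustrative only: the role map, the resource names, and the `permission_drift` helper are hypothetical and do not describe any particular identity or document system.

```python
from dataclasses import dataclass

# Hypothetical intended-access model: what each role is meant to read.
INTENDED_BY_ROLE = {
    "engineer": {"eng-wiki", "eng-roadmap"},
    "hr_partner": {"hr-policies", "review-templates"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    actual_access: frozenset  # resources the user can technically read today


def permission_drift(user):
    """Return resources the user can read that their current role does not call for."""
    intended = INTENDED_BY_ROLE.get(user.role, set())
    return set(user.actual_access) - intended


# Assumed example: an engineer who kept access from an old cross-team project.
alice = User(
    name="alice",
    role="engineer",
    actual_access=frozenset(
        {"eng-wiki", "eng-roadmap", "org-design-archive", "review-templates"}
    ),
)

print(sorted(permission_drift(alice)))  # ['org-design-archive', 'review-templates']
```

Nothing in the drift set is a security violation in the usual sense. It is simply the material a person would rarely stumble on, and an AI assistant with the same permissions would find immediately.
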
An employee asking an AI assistant "what do we know about the performance review process for the engineering team" may receive an answer drawn from documents they technically have access to but were never intended to be the audience for: HR process documentation, individual feedback templates, comparative data that lives in a folder from an organizational design project two years ago that nobody cleaned up.

The employee has not circumvented any security control. But they have seen something the access model was not designed to permit.

The categories where this matters most

HR and compensation data. Salary information, performance ratings, disciplinary records, and individual feedback exist throughout organizations in documents with permissions that were set for a specific purpose and have often drifted since. AI systems connected to broad document repositories will find this material and surface it in response to queries that touch on it.

Legal and privileged material. Legal advice, litigation strategy, settlement terms, and attorney-client communications often exist in places that technically authorized users can access for one purpose but should not be able to aggregate for another. The privilege protection may be legally intact (the employee can read the document), but the ability to synthesize across years of legal communications is a different kind of access.

Financial data beyond role scope. Budget holders can typically access their own budget data. AI systems may surface aggregate financial data by drawing on individual documents, each of which was appropriately accessible, producing a consolidated view that nobody intended to give the employee.

Client and partner confidential information. Client files shared within engagement teams are accessible to all team members for legitimate work purposes.
An AI system that can search across all engagement files simultaneously may surface patterns about client relationships, deal economics, or strategic situations that no single team member was supposed to see in aggregate.

Why the standard response does not work

The first response most organizations reach for is tightening access controls. If AI is exposing the problem, fix the permissions.

This is not wrong, but it is not sufficient. The problem has two parts that require different responses.

The first part is genuine permission drift that should be corrected regardless of AI. Employees who have retained access to systems and documents they no longer need should have that access revoked. This is an overdue access hygiene exercise, and AI deployment is a reasonable forcing function for doing it.

The second part is structurally different. Even with clean, intentional permissions, an employee with access to many documents across an organization will technically have access to combinations of data that, when synthesized by an AI, reveal more than the permission model was designed to permit. You cannot solve this purely by tightening access, because the individual access grants may all be correct.

The solution to the second part requires building constraints into the AI system itself: what categories of data it can include in synthesis across user queries, what aggregation rules apply, and what escalation or approval processes apply to queries that touch the highest-sensitivity categories.

Building the right architecture

Three things need to happen in parallel, not sequentially.

Access control remediation. Run an access review scoped to the data sources the AI system will connect to. Look specifically for permissions that predate current roles, broad read access granted for historical projects that is no longer needed, and distribution list membership that has not been reviewed in over a year.
This will not solve the problem completely, but it reduces the surface area.

AI-specific access boundaries. Define, at the AI system configuration level, what categories of data the system can use for synthesis in response to user queries. HR data, compensation data, legal documents, and individual performance information may be categories where even technically authorized access should not be available to the AI synthesis function. These boundaries need to be implemented as technical constraints in the AI system, not just as policy guidance.

Query monitoring and anomaly detection. The AI system's query logs are, for the first time, making the access control problem visible. An employee who systematically queries for compensation data across a broad population, or who extracts patterns from legal files, shows up in the query logs in ways they would not show up in document access logs. This monitoring capability is new and should be used.

What the CIO needs to drive

The access control gap in AI deployments is fundamentally a CIO problem, not an AI team problem. The AI team can build a capable system. The CIO needs to ensure that the system's access to organizational data is deliberately configured rather than broadly permissive by default.

Broadly permissive by default is the path of least resistance. It makes the AI system more capable and easier to demonstrate. It also creates the access control failures described above, and the first incident involving inadvertent disclosure of HR or financial data through an AI tool is going to be a painful conversation.

The access architecture needs to be designed before the AI system goes live. The conversation about what categories of data the system should not be able to synthesize, even if individual documents in those categories are technically accessible, needs to happen with legal, HR leadership, and the CFO, not just the AI team.

What to take from this

Technical access controls determine what an AI system can read.
They do not determine what it will synthesize or surface. The gap between these is where the access control problem lives.

Run an access control remediation exercise scoped to the AI system's data access before deployment. Permission drift would need cleaning up even if the AI deployment were not happening; AI just makes the urgency visible.

Build AI-specific access boundaries into the system configuration. Some data categories should not be available for AI synthesis even if individual documents within them are technically accessible.

Use AI query logs as an access monitoring tool. The visibility into what the system is being asked to surface is new and valuable.

The CIO needs to own the access architecture decision, not delegate it to the AI team. The decisions about what data categories the AI should not aggregate require organizational input that the AI team is not positioned to provide alone.
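
The AI-specific access boundaries and query monitoring described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference design: it assumes every document already carries a sensitivity category label, and the names `BLOCKED_FOR_SYNTHESIS`, `synthesis_context`, and `QueryMonitor`, along with the threshold value, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: categories agreed with legal, HR, and finance as off-limits for synthesis.
BLOCKED_FOR_SYNTHESIS = {"hr", "compensation", "legal", "performance"}


@dataclass(frozen=True)
class Document:
    doc_id: str
    category: str       # assumed sensitivity label attached to each document
    readers: frozenset  # users with technical read access


def synthesis_context(user, candidates):
    """Split retrieved documents into those usable for synthesis and those withheld."""
    allowed, withheld = [], []
    for doc in candidates:
        if user not in doc.readers:
            continue  # ordinary access control: the user cannot read this at all
        if doc.category in BLOCKED_FOR_SYNTHESIS:
            withheld.append(doc)  # readable individually, but kept out of AI synthesis
        else:
            allowed.append(doc)
    return allowed, withheld


class QueryMonitor:
    """Counts, per user, the queries that touched a blocked category."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical review trigger
        self.sensitive_hits = Counter()

    def record(self, user, withheld):
        """Log one query; return True once the user has reached the threshold."""
        if withheld:
            self.sensitive_hits[user] += 1
        return self.sensitive_hits[user] >= self.threshold


docs = [
    Document("eng-roadmap", "engineering", frozenset({"alice"})),
    Document("salary-bands", "compensation", frozenset({"alice", "hr_lead"})),
    Document("board-minutes", "governance", frozenset({"ceo"})),
]

allowed, withheld = synthesis_context("alice", docs)
print([d.doc_id for d in allowed])   # ['eng-roadmap']
print([d.doc_id for d in withheld])  # ['salary-bands']

monitor = QueryMonitor(threshold=2)
print(monitor.record("alice", withheld))  # False
print(monitor.record("alice", withheld))  # True
```

The design choice worth noting is that the category check runs after the ordinary permission check, not instead of it. The user keeps whatever individual document access they already have. Only the synthesis path is narrowed, and each withheld document leaves a trace the monitoring side can count.
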
