TL;DR: Russia's Social Design Agency is building fake Wikipedia-style sites, think tanks, and media outlets at scale to contaminate the training data and retrieval pipelines of AI chatbots and search engines. Bloomberg reported the leaked files on June 23, 2026. For small teams, the risk is indirect: you are trusting AI tools whose training data provenance you cannot inspect. Six vendor questions can shrink that risk.
On June 23, 2026, Bloomberg published an investigation based on 73 leaked files from Russia's Social Design Agency (SDA), a Moscow-based organization sanctioned by the United States, the United Kingdom, and the European Union. The files, covering activity from May 2023 through April 2026, revealed an operation that goes beyond traditional social media disinformation. The SDA is not just creating fake X accounts or Reddit bots. It is building the reference infrastructure that AI systems trust.
The operation is called Project 2026. And if you rely on AI tools to research regulatory questions, analyze vendor contracts, draft policy documents, or answer compliance questions, the details of this leak have a direct bearing on how much you should trust the outputs those tools produce.
What Project 2026 Actually Does
Project 2026's core strategy is to contaminate the information layer upstream of AI retrieval. Social media posts are noisy, flagged quickly, and removed. Reference pages, encyclopedias, think tank reports, and media coverage are different. They get indexed deeply, persist for years, and are treated as authoritative by both search engines and AI training pipelines.
The SDA's approach involves three components:
Fake encyclopedia platforms. The leaked files describe building Wikipedia-style reference sites in multiple languages, designed to rank in search results and get scraped by AI training pipelines. These are not crude spam sites. The Germany-focused operation had already produced more than 200,000 pages of coordinated content as of the report date.
Fake think tanks and media outlets. The program creates organizations that look like legitimate policy institutes or regional news services. These publish "research" and "analysis" that supports Kremlin-aligned narratives on AI governance, international security, and geopolitics. Because they present as institutions rather than obvious propaganda, AI retrieval systems treat their content as credible source material.
Coordinated geographic operations. The leaked files include proposals for operations targeting Armenia (a Wikipedia-style platform pushing pro-Kremlin narratives) and Germany (a "self-filling knowledge base" with a pipeline for continuous content generation). The operation is multi-language and designed to persist rather than spike.
The critical point is persistence: content seeded into a training corpus is far harder to remove than a social media post. Once an AI model has been trained on manipulated content, the bias is embedded in the model weights. Retraining on cleaned data requires identifying exactly which content was manipulated, and that is only practical when the manipulation is recognizable as such. The SDA's approach is specifically designed to avoid detection.
Why This Is a Governance Risk, Not Just a Geopolitics Story
The instinct when reading about Russian disinformation operations is to frame it as a geopolitics problem: something for governments, intelligence agencies, and large tech companies to sort out. That framing is wrong for teams using AI tools day-to-day.
When your team asks an AI assistant "what does the EU AI Act require for high-risk systems?" or "what happened in the FTC's AI enforcement actions this year?", the answer that comes back is shaped by that model's training data and by any documents the tool retrieves at query time. If either includes manipulated reference pages that misrepresent regulatory requirements, your compliance team is working from contaminated information.
The risk is not that the SDA is targeting your team specifically. The risk is that the AI tools you trust have training pipelines you cannot inspect, and those pipelines are exactly what Project 2026 is designed to exploit.
For compliance and legal teams, the practical impact is:
Regulatory research using AI. If an AI tool trained on SDA-poisoned content incorrectly describes the scope of a regulation, a team relying on that output for compliance decisions is working from corrupted source material without knowing it.
Vendor risk assessments. If an AI tool is used to research the reputation, regulatory standing, or security history of a vendor, contaminated content about that vendor could produce a misleadingly clean result.
Policy drafting. AI-assisted policy documents that draw on manipulated "best practice" guidelines from fake think tanks embed those positions in your governance framework.
None of these scenarios is far-fetched. They are the kinds of AI-assisted research that Project 2026 is designed to influence.
The Vendor Provenance Problem
The deeper issue is that most AI vendor contracts do not give you visibility into training data provenance. When you evaluate an AI tool for vendor risk, you look at SOC 2 certification, data retention policies, subprocessor lists, and contractual data use limitations. Almost no vendor will tell you exactly which domains were included in training data, let alone whether those domains have been compromised.
This is not a new problem. What Project 2026 does is make the existing opacity more consequential. An AI tool with opaque training data was a theoretical risk before. It is a known, actively exploited attack surface now.
The technical distinction that matters here is between weight-trained knowledge and Retrieval-Augmented Generation (RAG). In a pure weight-trained model, the model's knowledge is baked in during training. You cannot inspect what sources were used, and you cannot remove specific content without retraining. In a RAG-based system, the model retrieves documents at query time and bases its response on those retrieved documents. You can see which documents it retrieved. You can curate the retrieval corpus. You can build citation-based outputs.
For teams doing research on live regulatory topics, RAG-based tools with transparent source citation are significantly more defensible from a governance perspective than opaque weight-trained responses.
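As a rough illustration of what curating a retrieval corpus and citing sources can look like, the sketch below filters retrieved documents against an allowlist of primary-source domains and numbers the survivors so an answer can cite them. The domain list and the names RetrievedDoc, is_trusted, curate, and cited_context are assumptions made for illustration; they are not part of any vendor's API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of primary-source domains; replace with your own.
TRUSTED_DOMAINS = frozenset({"eur-lex.europa.eu", "ftc.gov"})


@dataclass(frozen=True)
class RetrievedDoc:
    url: str
    text: str


def is_trusted(url: str, trusted: frozenset[str] = TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


def curate(docs: list[RetrievedDoc]) -> list[RetrievedDoc]:
    """Keep only documents from trusted domains before they reach the model."""
    return [d for d in docs if is_trusted(d.url)]


def cited_context(docs: list[RetrievedDoc]) -> str:
    """Number each passage so an answer can cite [1], [2], ... by source URL."""
    return "\n".join(f"[{i}] {d.url}\n{d.text}" for i, d in enumerate(docs, 1))
```

Matching on the parsed hostname rather than on a substring of the URL means a lookalike such as ftc.gov.example.org is rejected. An allowlist only narrows exposure; it does not verify that a trusted page says what the model claims it says, so a human check of the cited source still applies for high-stakes decisions.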
6 Vendor Questions to Add to Your Vendor Risk Policy
The SDA disclosure is a concrete, documented reason to add training data provenance questions to your AI vendor due diligence process. Here is a checklist you can add directly to your vendor vetting workflow:
1. Can the vendor document their primary training data sources? This is a baseline. Major vendors like Anthropic and OpenAI publish some information about training data composition. Smaller vendors often provide nothing. If a vendor cannot describe what their model was trained on at a high level, that is an unresolved risk, not a cleared one.
2. Does the vendor have a process for removing identified disinformation or manipulated content from training data or retrieval corpora? This is the Project 2026-specific question. Ask whether the vendor monitors for newly identified disinformation campaigns and has a mechanism to update their systems in response. "We rely on our data partners to maintain quality" is not an adequate answer.
3. Does the tool use Retrieval-Augmented Generation, and if so, what sources are in the retrieval corpus? If the vendor uses RAG, ask for a description of the retrieval corpus. A RAG corpus limited to your own documents or a curated set of authoritative sources (official government sites, primary legal databases) is far less exposed than one that crawls the open web.
4. Does the vendor publish transparency reports or AI safety documentation covering training data? Vendors with serious data provenance practices will have documentation. Anthropic's model cards, Google's model transparency documentation, and similar resources give you something to assess. Absence of documentation is a governance signal.
5. Can outputs be grounded to cited, inspectable sources? For regulatory research and compliance work specifically, require that the tool can provide source citations for factual claims. An AI tool that asserts "the EU AI Act requires X" without providing a citable source is usable for drafting assistance but not for compliance verification.
6. Does the vendor conduct adversarial red teaming specifically for disinformation or factual manipulation? AI red teaming increasingly covers factual integrity alongside jailbreaks and safety refusals. Ask whether the vendor tests for manipulation scenarios, not just harmful output scenarios.
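One way to record answers to the six questions is as a small scored form. The sketch below is a minimal example under two assumptions: an unanswered question counts as a failure, and a failure on any of questions 1, 2, or 5 flags the vendor for elevated review. The names QUESTIONS, CRITICAL, Assessment, and assess are hypothetical.

```python
from dataclasses import dataclass

# Short labels for the six questions above, keyed by question number.
QUESTIONS = {
    1: "documents primary training data sources",
    2: "has a process for removing identified disinformation",
    3: "describes the RAG retrieval corpus",
    4: "publishes transparency or safety documentation",
    5: "grounds outputs in cited, inspectable sources",
    6: "red-teams for disinformation and factual manipulation",
}

# Assumption: failing any one of these triggers elevated review.
CRITICAL = frozenset({1, 2, 5})


@dataclass(frozen=True)
class Assessment:
    score: int             # number of questions answered satisfactorily
    elevated_risk: bool    # True if any critical question failed
    gaps: tuple[str, ...]  # labels of the failed questions


def assess(answers: dict[int, bool]) -> Assessment:
    """Score a vendor; questions missing from `answers` count as failures."""
    failed = [q for q in QUESTIONS if not answers.get(q, False)]
    return Assessment(
        score=len(QUESTIONS) - len(failed),
        elevated_risk=any(q in CRITICAL for q in failed),
        gaps=tuple(QUESTIONS[q] for q in failed),
    )
```

Treating a missing answer as a failure mirrors the point made under question 1: a vendor that cannot describe its training data is an unresolved risk, not a cleared one.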
What to Update in Your Existing Policies
If you already have an AI vendor due diligence checklist, the training data provenance section likely has a placeholder or nothing at all. The Bloomberg disclosure is a concrete trigger to formalize it.
Three updates to make:
Add training data provenance to your AI tool assessment form. The six questions above can be structured as a scored section. Vendors who cannot answer any one of questions 1, 2, or 5 receive an elevated risk rating that requires additional review or sign-off from your legal or security team.
Add a RAG preference for research-grade use cases. For any use case where your team is using AI to research regulatory requirements, vendor standing, or legal facts, your acceptable use policy should specify that the tool must provide source citations and, where possible, use a RAG architecture rather than opaque weight-based knowledge.
Tie AI output verification requirements to the sensitivity of the decision. A draft email is low stakes. A compliance determination or a vendor risk assessment is high stakes. The Project 2026 disclosure is a reason to require secondary verification for any AI-produced research that informs a material decision: either a human checks the primary source, or the AI tool provides an auditable citation trail.
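The third update can be expressed as a simple gate. The sketch below assumes just two sensitivity tiers and encodes the rule stated above: high-stakes research may be relied on only if a human checked the primary source or the tool supplied an auditable citation trail. The names Stakes and may_rely_on are hypothetical, and a real policy would likely need more tiers.

```python
from enum import Enum


class Stakes(Enum):
    LOW = "low"    # e.g. a draft email
    HIGH = "high"  # e.g. a compliance determination or vendor risk assessment


def may_rely_on(stakes: Stakes,
                human_checked_primary_source: bool,
                has_citation_trail: bool) -> bool:
    """Return True if AI-produced research may inform the decision as-is."""
    if stakes is Stakes.LOW:
        return True
    # High stakes: either a human checked the primary source,
    # or the tool supplied an auditable citation trail.
    return human_checked_primary_source or has_citation_trail
```

For example, may_rely_on(Stakes.HIGH, False, False) returns False, which would route the output back for secondary verification before it informs a material decision.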
Pair these updates with the AI vendor contract red flags checklist if you have not already reviewed your AI vendor agreements for data use and subprocessor provisions.
The Broader Supply-Chain Context
Project 2026 fits within a larger pattern of AI supply-chain attacks documented in 2025 and 2026. The Miasma npm worm demonstrated that malicious code can embed itself in AI coding tool outputs. Typosquatting and fake app attacks on AI tools have exploited the trust users place in search rankings. Project 2026 adds a new layer: the training data behind those outputs is itself being systematically manipulated.
Vetting AI tools for fake apps and malware covers the download-time verification problem. Project 2026 is the pre-training-time problem. Both are part of the same supply chain, and neither can be fully controlled by the end user. What you can control is how much weight you place on AI-produced research for sensitive decisions, and whether the tools you use can provide an auditable source trail.
The AI red teaming and security testing requirements guidance covers how to build adversarial testing into your AI governance program. Factual integrity testing, specifically testing whether your AI tools produce verifiable outputs for regulatory questions, is the category Project 2026 makes urgent.
What the SDA Disclosure Does Not Mean
It does not mean AI tools are useless for regulatory research or compliance work. It means the governance controls around those tools need to match the stakes of the decisions they inform. AI-assisted drafting, with human verification of factual claims against primary sources, remains a high-leverage workflow. Accepting AI-produced compliance determinations without verification is the risk profile Project 2026 targets.
The leaked files describe a program that is specifically designed to be undetectable in outputs. The SDA is not trying to make AI tools produce obvious propaganda. It is trying to make AI tools produce subtly incorrect information about policy, governance, and regulatory requirements in a way that looks authoritative. The mitigation is not to stop using AI tools. It is to require source citation for factual claims, use RAG-based tools where possible for research, and verify primary sources for any decision that matters.
Related Reading
- Vetting AI tools: how to avoid fake apps and malware in 2026
- AI vendor due diligence checklist 2026
- AI red teaming and security testing requirements 2026
- MCP server security governance checklist 2026
- TypeScript AI agent security incident response playbook
- AI vendor contract red flags: 12 clauses that create liability
- GenAI vendor risk assessment framework 2026
- DeepSeek and Chinese AI models: GDPR data transfer risk
- Miasma npm worm: supply chain attack governance
