TL;DR: Most major AI vendors at the business or enterprise tier do not use your data for training. The exceptions matter: consumer-tier ChatGPT and Gemini (opt-out available), GitHub Copilot Individual (opt-out available), and Atlassian, which is changing its policy on August 17, 2026 to train on Confluence page content and Jira issues unless your team actively opts out. If employees use personal accounts for business tasks, data may already be in training pipelines.
In June 2026, a Reddit thread hit the front page of r/artificial with a blunt headline: "So now scraping data without permission is bad for AI training all of sudden?" The sarcasm was pointed at a familiar dynamic: AI companies spent years building their models by scraping the internet, and now those same companies are objecting when others scrape their content. But underneath the irony sat a practical question that thousands of compliance teams are still working out: which of the AI tools your team uses right now is training on your data?
The answer varies by vendor, plan tier, and opt-out status. And as of June 2026, one major change is coming that most teams have not noticed: Atlassian is updating its terms to begin training on Confluence and Jira data starting August 17, 2026, unless customers actively opt out before the deadline.
This guide covers the training data policies for 11 AI tools that are in active use on most small and mid-sized teams, along with what you can do before August 17.
Why This Question Is Harder Than It Looks
The simple version of the question is: "does this AI vendor use my data to train their model?" But in practice, there are at least three different things that could mean:
Training the base model. Using your data to improve the underlying large language model that the product is built on. This is the highest-risk scenario: your confidential data could influence what the model "knows" and potentially surface in responses to other users.
Fine-tuning for the product. Using your usage data to fine-tune the product layer on top of the base model (adjusting tone, task-specific behavior, etc.). Lower risk than base model training, but still involves your data leaving your environment.
Inference-time subprocessing. Sending your prompts and data to a third-party LLM provider to generate a response, without using that data for training. This is what most AI products do and is the least risky form of "vendor sees your data."
Most vendor statements about "not using your data for training" refer specifically to base model training. The same vendor may still send your data to a subprocessor (OpenAI, Google Vertex, AWS Bedrock) for inference. That subprocessing is normal and is disclosed in vendor DPA documents, but it is a separate question from training.
The table below covers base model and fine-tuning training, not subprocessing.
11-Vendor Comparison: Training Data Policies (June 2026)
| Vendor / Plan | Trains on Your Data? | Opt-Out? | Notes |
|---|---|---|---|
| ChatGPT Free/Plus | Yes, by default | Yes (Settings > Data Controls) | Consumer tier; personal account data is used unless opted out |
| ChatGPT Enterprise/Business/Team | No | N/A | Business tiers explicitly excluded from training |
| Claude (all plans) | No | N/A | No opt-out needed; Anthropic does not use Claude conversations for training |
| Gemini (free / Google.com) | Yes, reviewers may read | Yes (Activity controls) | Consumer tier; human review allowed by default |
| Gemini Workspace (Business/Enterprise) | No | N/A | Admin and user data excluded from model training by policy |
| Microsoft Copilot (free) | May be used for improvement | Limited | Consumer tier; check Microsoft Privacy Dashboard |
| Microsoft Copilot Business (M365 Business) | No | N/A | Tenant data not used for foundation model training |
| GitHub Copilot Individual | Yes (code snippets) | Yes (GitHub settings) | Individual tier; opt out in GitHub > Settings > Copilot |
| GitHub Copilot Business/Enterprise | No | N/A | Code and prompts excluded from training by contract |
| Atlassian Rovo/Confluence AI | Yes (from Aug 17 2026) | Yes, until Aug 17 | All plans can opt out of in-app data; metadata opt-out is Enterprise only |
| Notion AI | No | N/A | No opt-out needed; optional LEAP program is strictly opt-in |
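
For teams that keep an internal inventory of AI tools, the table above can be encoded as data and filtered for the rows that need attention. The following is a minimal Python sketch, not a vendor-supplied format: the `ToolPolicy` class, the string values for `trains` and `opt_out`, and the `needs_action` helper are hypothetical names introduced here, and the rows simply restate the table as of June 2026.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """One row of the comparison table (hypothetical structure)."""
    name: str
    trains: str   # "yes", "no", or "maybe", as stated in the table
    opt_out: str  # "yes", "limited", or "n/a"


# Rows restate the table above; re-check vendor terms before relying on them.
POLICIES = [
    ToolPolicy("ChatGPT Free/Plus", "yes", "yes"),
    ToolPolicy("ChatGPT Enterprise/Business/Team", "no", "n/a"),
    ToolPolicy("Claude (all plans)", "no", "n/a"),
    ToolPolicy("Gemini (free / Google.com)", "yes", "yes"),
    ToolPolicy("Gemini Workspace (Business/Enterprise)", "no", "n/a"),
    ToolPolicy("Microsoft Copilot (free)", "maybe", "limited"),
    ToolPolicy("Microsoft Copilot Business (M365 Business)", "no", "n/a"),
    ToolPolicy("GitHub Copilot Individual", "yes", "yes"),
    ToolPolicy("GitHub Copilot Business/Enterprise", "no", "n/a"),
    ToolPolicy("Atlassian Rovo/Confluence AI", "yes", "yes"),  # from Aug 17, 2026
    ToolPolicy("Notion AI", "no", "n/a"),
]


def needs_action(policies):
    """Return names of tools that train, or may train, on customer data."""
    return [p.name for p in policies if p.trains != "no"]


if __name__ == "__main__":
    for name in needs_action(POLICIES):
        print(name)
```

Treating "maybe" the same as "yes" is a deliberately conservative choice: a tool whose policy is ambiguous gets reviewed rather than assumed safe.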
The Atlassian August 17 Deadline
This is the most time-sensitive item in the table. Atlassian announced in mid-2026 that starting August 17, 2026, it will use in-app content from Confluence and Jira to train its AI models, including Rovo.
What counts as in-app content: Confluence page titles and page bodies; Jira issue titles, descriptions, and comments; and custom emoji and workflow names. If your team stores business strategy documents, product roadmaps, customer data, or legal notes in Confluence, that content will be used for AI training starting August 17 unless you opt out.
The policy affects all Atlassian Cloud customers, roughly 300,000 organizations globally.
How to opt out:
- Go to Atlassian Administration (admin.atlassian.com)
- Navigate to Security > Data contribution
- Disable in-app data collection before August 17, 2026
Important limitation: only Enterprise plan customers can opt out of metadata collection (story points, SLA metrics, search behavior). For non-Enterprise plans, metadata opt-out is not available.
For teams with EU users: Atlassian sends data to US-based subprocessors, including OpenAI, Google Vertex AI, AWS Bedrock, and Databricks, for AI processing. This creates data residency considerations for EU-regulated data. If EU employees' data is stored in Confluence and you do not opt out, the August 17 change adds a new cross-border transfer for AI training purposes.
If you have not already reviewed Atlassian's updated data terms and verified your opt-out status, do so before the end of July.
The Account Tier Problem
The most common error teams make is assuming that because they have an enterprise contract for their main AI tools, all AI tool usage in the organization is covered. It is not.
Employees routinely use personal-tier accounts for business tasks:
- A developer with a personal GitHub account uses Copilot Individual (trains on code)
- A manager pastes meeting notes into ChatGPT free via their personal email (trains on data unless opted out)
- An analyst uses a personal Google account to run Gemini queries on business data (reviewers may see it)
The risk is concentrated at the boundary between personal accounts and company work. An acceptable use policy that requires employees to use company-provisioned accounts for business AI use cases is the control that addresses this most directly.
For teams that cannot enforce company-provisioned accounts everywhere, a tiered approach works: require enterprise accounts for any use case involving confidential data (customer information, legal documents, financial data, HR records), and allow personal accounts only for low-sensitivity tasks where training exposure is acceptable.
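The tiered rule can be written down as a small check that an internal request form or review script might call. This is a minimal sketch under stated assumptions: the category labels and the `account_allowed` function are hypothetical names, chosen here to mirror the four confidential examples in the paragraph above.

```python
# Hypothetical labels for the confidential categories named in the text.
CONFIDENTIAL_CATEGORIES = frozenset({
    "customer_information",
    "legal_documents",
    "financial_data",
    "hr_records",
})


def account_allowed(data_category: str, account_type: str) -> bool:
    """Apply the tiered rule: confidential data requires an enterprise account."""
    if account_type not in ("enterprise", "personal"):
        raise ValueError(f"unknown account type: {account_type!r}")
    if account_type == "enterprise":
        return True
    # Personal accounts are allowed only for non-confidential data.
    return data_category not in CONFIDENTIAL_CATEGORIES


if __name__ == "__main__":
    print(account_allowed("hr_records", "personal"))
    print(account_allowed("public_blog_draft", "personal"))
```

Note that this sketch treats any category not on the confidential list as low-sensitivity. A stricter policy would invert the default and allow personal accounts only for an explicit list of approved low-sensitivity categories.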
The Open-Source Alternative
For teams with strict data handling requirements, there is a category the table above does not include: self-hosted open-weight models.
Models like Meta Llama 3, Mistral, and the openly released DeepSeek models can be deployed on your own infrastructure. When you run inference on your own servers, data never leaves your environment. There is no vendor to train on your data because there is no vendor processing it.
The tradeoff is operational overhead: you need to manage hosting, updates, and infrastructure. For regulated industries (healthcare under HIPAA, finance under SOX or GLBA, EU companies processing personal data under GDPR), the control over data residency and training exposure that comes from self-hosting may be worth the cost.
The privacy-first AI APIs guide covers which commercial API providers have verifiable no-training commitments and how to document them for compliance purposes.
How to Update Your AI Acceptable Use Policy
If your organization has an AI acceptable use policy, the training data question should be addressed explicitly. Three provisions to add or update:
Personal account restriction. Specify that employees must use company-provisioned AI accounts for any task involving confidential company information, customer data, or regulated personal data. Define what "confidential" means in your context. Personal accounts are acceptable for learning and low-stakes experimentation.
Training data disclosure requirement. When evaluating new AI tools, require the vendor to disclose its training data policy in writing as part of procurement. Ask specifically: (a) does the vendor train base or fine-tuned models on customer data; (b) is the policy the same across all plan tiers; and (c) what is the process for requesting deletion of data previously used for training?
Opt-out verification. For any AI tool where training is the default and opt-out is available (Atlassian being the current example), add opt-out verification to your AI tool registration and approval process. Confirm opt-out status annually at minimum.
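The annual opt-out check can be tracked in whatever register the team already uses for approved tools. As a rough sketch, assuming a simple in-memory register, the code below lists tools whose opt-out has never been verified or was last verified more than a year ago. `RegisteredTool`, `overdue_for_verification`, and the 365-day default are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RegisteredTool:
    """An approved AI tool in a hypothetical internal register."""
    name: str
    trains_by_default: bool
    opt_out_verified_on: Optional[date]  # None means never verified


def overdue_for_verification(
    tools: List[RegisteredTool], today: date, max_age_days: int = 365
) -> List[str]:
    """Return names of train-by-default tools whose opt-out check is missing or stale."""
    overdue = []
    for tool in tools:
        if not tool.trains_by_default:
            continue  # nothing to opt out of
        last = tool.opt_out_verified_on
        if last is None or (today - last).days > max_age_days:
            overdue.append(tool.name)
    return overdue


if __name__ == "__main__":
    register = [
        RegisteredTool("Atlassian Rovo/Confluence AI", True, None),
        RegisteredTool("Notion AI", False, None),
    ]
    print(overdue_for_verification(register, today=date.today()))
```

Passing `today` explicitly, rather than reading the clock inside the function, keeps the check easy to test and to run against a planned audit date.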
The Regulatory Dimension
For teams operating under GDPR, training on personal data has additional implications. Using an employee's Jira comments or Confluence pages to train an AI model may constitute processing of personal data for a new purpose (AI training) that was not disclosed at the time of collection.
GDPR Article 5(1)(b), the purpose limitation principle, requires that data collected for one purpose (project management, collaboration) is not used for a different purpose (AI model training) without either a compatible justification or fresh consent. Atlassian's approach of enrolling customers in training by default, with a deadline to opt out, is a notice-and-opt-out model rather than opt-in consent, so teams in EU-regulated industries should verify whether their DPO or legal team considers it adequate under their specific GDPR obligations.
The AI data privacy for small teams guide covers the GDPR analysis for common AI tool use cases, including the purpose limitation question.
Vendor Comparison: What to Ask Before You Sign
If you are in procurement for a new AI tool, ask these questions before signing:
- At which plan tier does the vendor stop training on customer data?
- Is there a lag between processing data and using it for training? (Can you delete it before it enters training?)
- Does the vendor have a data deletion process for previously trained models? (Almost no vendor offers model retraining to remove specific customer data, which is worth knowing.)
- Which subprocessors receive your data for inference, and what are their training policies?
- If the vendor changes their training data policy after contract signing, what is the notification period and can you exit the contract?
The last question is the Atlassian scenario. A vendor that can change its training policy with 45 days' notice and no exit right presents a different risk profile from a vendor that must put material policy changes through a contract amendment.
Related Reading
- Privacy-first AI APIs with no training on your data: 2026 guide
- AI data privacy for small teams: GDPR and CCPA compliance
- DeepSeek and Chinese AI models: GDPR data transfer risk
- GDPR AI fines 2026: enforcement cases and what small teams must know
- AI meeting transcription apps: data risks and compliance 2026
- Anthropic vs OpenAI: GDPR compliance differences 2026
- AI vendor due diligence checklist 2026
- Enterprise AI privacy pages: direct links to vendor documentation
- Russia's Project 2026 targets AI training data: 6-point vendor risk checklist
