California is home to the world's fifth-largest economy and a significant fraction of US tech consumers. If you sell an AI product in California and train your own models, California AB 2013 applies to you. It has been in effect since January 1, 2026.
The law is not complicated. It requires a training data disclosure page on your website. Most developers who train models can comply in a few hours.
TL;DR: AB 2013 requires developers of generative AI products available to California residents to post a training data disclosure on their website. Required: data source categories, whether personal data was used, approximate volume, synthetic data disclosure, and date range. Update when training data materially changes. No private right of action; 30-day cure period before AG can impose penalties.
Who AB 2013 applies to
The law applies to you if both of the following are true:
- You are a developer of a generative AI system (you train or fine-tune a model, not just use someone else's API)
- The system is available to California residents (via API, app, website, or any other distribution channel)
"Generative AI" under AB 2013 means AI systems that can generate synthetic content including text, images, audio, video, or code. This includes large language models, image generation models, audio synthesis models, and multimodal models.
The law does not apply to companies that:
- Only use commercial AI APIs (OpenAI, Anthropic, Google) without training their own models
- Use pre-trained models without any fine-tuning
- Develop AI systems that are not generative (pure classification, regression, or prediction models)
If you fine-tune a commercial model on your own data, you are a developer for the purposes of AB 2013. Fine-tuning counts as training.
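The applicability test above comes down to a few yes/no questions. Here is a minimal sketch in Python that encodes them as described in this article. The class and field names are illustrative assumptions, not statutory terms, and the result is a first-pass screen only.

```python
from dataclasses import dataclass


@dataclass
class AISystem:
    # All field names are hypothetical, chosen for this example.
    generates_content: bool        # text, images, audio, video, or code
    trained_in_house: bool         # you pretrained the model yourself
    fine_tuned_in_house: bool      # you fine-tuned someone else's model
    available_in_california: bool  # API, app, website, or other channel


def ab2013_applies(system: AISystem) -> bool:
    """Return True when the system is generative, you are its developer,
    and it is available to California residents."""
    # Fine-tuning counts as training, so either flag makes you a developer.
    is_developer = system.trained_in_house or system.fine_tuned_in_house
    return (
        system.generates_content
        and is_developer
        and system.available_in_california
    )


# A fine-tuned chatbot offered to California users is in scope.
fine_tuned_bot = AISystem(
    generates_content=True,
    trained_in_house=False,
    fine_tuned_in_house=True,
    available_in_california=True,
)

# An app that only calls a commercial API (no fine-tuning) is not.
api_only_app = AISystem(
    generates_content=True,
    trained_in_house=False,
    fine_tuned_in_house=False,
    available_in_california=True,
)
```

Calling `ab2013_applies(fine_tuned_bot)` returns `True` and `ab2013_applies(api_only_app)` returns `False`, matching the two lists above.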
What to disclose
AB 2013 requires five categories of information in your training data disclosure:
1. Data sources (categories)
List the categories of data used to train your model. You do not need to name specific datasets. Categories include:
- Web-scraped data (publicly available web content)
- Licensed third-party data (data purchased or licensed from data providers)
- User-generated content (content created by users of your product or predecessor products)
- Synthetic data (AI-generated training data)
- Proprietary internal data (data your company collected or created)
- Public domain data (government publications, openly licensed academic datasets)
If your model was trained in stages (pretraining on broad data, then fine-tuning on domain-specific data), disclose the categories for each stage separately.
2. Whether personal data was used
Disclose whether any personal information was included in your training data. "Personal information" under AB 2013 aligns with the CCPA definition: information that identifies, relates to, or could reasonably be linked to a California consumer.
If personal data was used: describe the categories of personal data (names, email addresses, browsing history, etc.) and whether that data was obtained with consent or from public sources.
3. Approximate volume
Disclose the approximate volume of training data. Exact numbers are not required. Acceptable formats: "approximately 1 trillion tokens," "several hundred billion tokens," "approximately 100,000 image-caption pairs."
4. Whether synthetic data was used
If any portion of your training data was AI-generated (synthetic), disclose this. Note the approximate proportion if significant.
5. Date range of data collection
Disclose the approximate period over which training data was collected. Example: "Web-scraped data collected between January 2020 and December 2024."
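Before drafting the page, it can help to keep the five items in a structured inventory, one record per training stage. The sketch below is one possible shape, assuming a hypothetical `TrainingStage` schema. The field names are made up for this example, and it needs Python 3.9 or later for the `list[str]` annotations.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingStage:
    # Hypothetical schema; field names are assumptions for illustration.
    name: str                          # e.g. "pretraining", "fine-tuning"
    source_categories: list[str]       # item 1: categories, not dataset names
    personal_data_categories: list[str] = field(default_factory=list)  # item 2
    approx_volume: str = ""            # item 3: e.g. "approximately 1 trillion tokens"
    synthetic_fraction: float = 0.0    # item 4: share of AI-generated data, 0.0 to 1.0
    collected_from: str = ""           # item 5: e.g. "January 2020"
    collected_to: str = ""             # item 5: e.g. "December 2024"

    def missing_items(self) -> list[str]:
        """List the disclosure items that are still blank for this stage.

        Items 2 and 4 default to "none used", so they cannot be detected
        as missing here and need a manual check.
        """
        missing = []
        if not self.source_categories:
            missing.append("data sources")
        if not self.approx_volume:
            missing.append("approximate volume")
        if not (self.collected_from and self.collected_to):
            missing.append("date range")
        return missing


# One record per stage, since each stage is disclosed separately.
fine_tune = TrainingStage(
    name="fine-tuning",
    source_categories=["Proprietary internal data"],
    approx_volume="approximately 100,000 image-caption pairs",
)
gaps = fine_tune.missing_items()
```

Here `gaps` is `["date range"]`, because the collection period has not been filled in yet.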
Sample training data disclosure page
Copy this template to your website (typically at /ai-transparency or as a section of your privacy policy):
AI Training Data Disclosure
Last updated: [DATE]
This disclosure is provided pursuant to California AB 2013 (Artificial Intelligence Training Data Transparency Act).
About our AI systems
[Company Name] develops and operates [briefly describe the AI product, e.g., "a generative AI writing assistant" or "an AI image generation platform"].
Training data sources
Our AI systems were trained on data from the following categories of sources:
- Web-scraped data: Publicly available text content from websites, collected using automated crawling. We respect robots.txt opt-out signals and honor publisher opt-out requests.
- Licensed data: [If applicable] Data licensed from third-party data providers under commercial agreements.
- Synthetic data: [If applicable] AI-generated data created to supplement training data in specific domains.
- [Add additional categories relevant to your training data]
Personal information
[Choose one of the following:]
Option A (no personal data): Our training data does not include personal information as defined by the California Consumer Privacy Act.
Option B (personal data used): Our training data includes publicly available personal information, including [list categories: e.g., names and professional information scraped from public web pages]. This data was collected from publicly accessible sources. We do not use this data to identify specific individuals.
Data volume
Our training dataset contains approximately [describe volume, e.g., "X billion tokens of text data" or "X million image-caption pairs"].
Synthetic data
[If applicable] Approximately [X]% of our training data is synthetically generated.
Data collection period
Our training data was collected from approximately [start date] to [end date]. [Note if ongoing or refreshed.]
Updates to this disclosure
We will update this disclosure when we make material changes to our training data. Changes in model versions, significant new data sources, or changes in our use of personal information will trigger an update.
Contact
For questions about our AI training data practices, contact: [EMAIL OR CONTACT FORM LINK]
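If you keep the disclosure values in code or config, you can generate parts of the template instead of editing them by hand. The sketch below uses Python's standard `string.Template` on three sentences copied from the template above. The `SECTIONS` mapping, the `render_disclosure` helper, and the placeholder names are assumptions made for this example.

```python
from string import Template

# Sentences taken from the template above; $-placeholders replace the brackets.
SECTIONS = {
    "Data volume": Template(
        "Our training dataset contains approximately $volume."
    ),
    "Synthetic data": Template(
        "Approximately $synthetic_pct% of our training data is synthetically generated."
    ),
    "Data collection period": Template(
        "Our training data was collected from approximately $start to $end."
    ),
}


def render_disclosure(values: dict[str, str]) -> str:
    """Fill in each section. A missing value raises KeyError, so an
    unfilled placeholder cannot be published by accident."""
    lines = [
        "AI Training Data Disclosure",
        f"Last updated: {values['last_updated']}",
    ]
    for heading, body in SECTIONS.items():
        lines.append(heading)
        lines.append(body.substitute(values))
    return "\n".join(lines)


# Example values are placeholders, not real figures.
example_values = {
    "last_updated": "2026-01-15",
    "volume": "2 billion tokens of text data",
    "synthetic_pct": "5",
    "start": "January 2020",
    "end": "December 2024",
}
page = render_disclosure(example_values)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing value fails loudly instead of leaving a stray placeholder on a public page.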
Interaction with other transparency requirements
EU AI Act GPAI Code of Practice: The GPAI Code of Practice (published July 2025) requires a similar training data summary from GPAI providers. If you already comply with the GPAI transparency template, your AB 2013 disclosure covers most of the same ground. Align your disclosures rather than maintaining separate documents.
FTC guidance: The FTC has guidance on AI disclosure that includes training data transparency as a component of honest dealing. AB 2013 compliance also supports FTC compliance.
AI model cards: Several AI governance frameworks (Google, Hugging Face) recommend model cards that include training data documentation. If you already publish model cards, extend them to cover the AB 2013 categories.
What compliance looks like in practice
For a startup with a fine-tuned LLM: Your base model's training data disclosure is the responsibility of the base model provider (OpenAI, Anthropic, etc.). Your disclosure covers the fine-tuning data you added. If you fine-tuned on your own customer data, disclose the categories of that data.
For a company with a proprietary pre-trained model: Full disclosure required across all training stages.
For a company using RAG (retrieval-augmented generation) without fine-tuning: If you are only using a commercial API with RAG and no fine-tuning, you are not a developer under AB 2013. The disclosure requirement does not apply to you.
For a company that fine-tuned a model for a specific customer (custom enterprise model): If the fine-tuned model is only available to that customer and not offered broadly, check whether that customer's employees are California residents. If yes, the disclosure requirement likely applies.
Compliance checklist
Run through these steps to achieve AB 2013 compliance:
- Determine whether your company trains or fine-tunes any generative AI models
- Determine whether any such models are available to California residents (directly or through any distribution channel)
- If both are true: inventory your training data by the five disclosure categories (sources, personal data, volume, synthetic data, date range)
- Draft your training data disclosure using the template above
- Publish it at a publicly accessible URL on your website (a dedicated page is cleaner than embedding in a privacy policy)
- Link to the disclosure from your product's main page or footer
- Set a review calendar reminder for when you expect to materially update your training data
- Update the disclosure when you make material training data changes; update the "Last updated" date
Keeping the disclosure current
AB 2013 requires updating the disclosure when training data materially changes. What counts as material:
- Adding a new category of data source (for example, adding licensed data when you previously used only web-scraped data)
- Starting to use personal information in training data after previously disclosing no personal information
- A significant volume change (for example, adding a large proprietary dataset that substantially changes the mix)
- Starting to use synthetic data
Version releases of a model that use the same underlying training data distribution do not require a disclosure update. The disclosure tracks training data, not model versions.
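The materiality triggers listed above can be checked mechanically by comparing two snapshots of your training data inventory. This is a rough sketch, assuming a hypothetical `DataSnapshot` record. The 25% volume threshold is an arbitrary assumption, since nothing in this article defines "significant" numerically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSnapshot:
    # Hypothetical summary of one training data inventory.
    source_categories: frozenset[str]
    uses_personal_info: bool
    uses_synthetic_data: bool
    approx_tokens: int


# Assumption: a 25% change in volume counts as "significant".
# Pick your own threshold; this number does not come from the law.
VOLUME_CHANGE_THRESHOLD = 0.25


def material_changes(old: DataSnapshot, new: DataSnapshot) -> list[str]:
    """Return the reasons, if any, that the disclosure needs an update."""
    reasons = []
    added = new.source_categories - old.source_categories
    if added:
        reasons.append("new source categories: " + ", ".join(sorted(added)))
    if new.uses_personal_info and not old.uses_personal_info:
        reasons.append("personal information now used")
    if new.uses_synthetic_data and not old.uses_synthetic_data:
        reasons.append("synthetic data now used")
    if old.approx_tokens > 0:
        change = abs(new.approx_tokens - old.approx_tokens) / old.approx_tokens
        if change >= VOLUME_CHANGE_THRESHOLD:
            reasons.append("significant volume change")
    return reasons


published = DataSnapshot(frozenset({"web-scraped"}), False, False, 1_000_000_000)
retrained = DataSnapshot(
    frozenset({"web-scraped", "licensed"}), False, True, 1_100_000_000
)
reasons = material_changes(published, retrained)
```

In this example `reasons` is `["new source categories: licensed", "synthetic data now used"]`. The 10% volume increase stays under the assumed threshold, and comparing a snapshot with itself returns an empty list, which matches the point that a new model version on the same data needs no update.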
A practical approach: treat training data disclosure updates the same as privacy policy updates. When you make a material change, post the update, note the date, and (if your users have accounts) consider notifying them.
AB 2013 vs. CCPA and GDPR
AB 2013 is a transparency law, not a privacy law. It does not create rights for individuals to access, correct, or delete their data from training sets. Those rights come from CCPA (for California residents) and GDPR (for EU residents).
The overlap: if your training data included personal information of California residents, you may have CCPA obligations around that data collection in addition to the AB 2013 disclosure requirement.
If your training data included personal information of EU residents, GDPR requirements apply (lawful basis for processing, data minimization, records of processing activities). The EU AI Act GPAI Code of Practice also requires a copyright compliance process for EU-facing models.
Complying with AB 2013 disclosure is simpler than CCPA or GDPR data rights compliance, but it does not substitute for those obligations.
