California AB 2013 Compliance Checklist 2026 — AI Training Data Transparency
California AB 2013 (the Generative AI Training Data Transparency Act) took effect January 1, 2026. If your company offers a generative AI system to California users and has not published a training data disclosure page, you are already out of compliance.
At a glance:
| Element | Detail |
|---|---|
| Effective date | January 1, 2026 |
| Who it covers | Developers offering GenAI systems to Californians |
| Core obligation | Publish training data disclosure on website |
| Update trigger | Substantial modification to the AI system |
| Enforcement | California UCL — Attorney General, local prosecutors, and private plaintiffs |
| Applies to systems released from | January 1, 2022 onward |
| Format required | None prescribed — must be publicly accessible |
Step 1: Determine if you are in scope
AB 2013 covers developers of generative AI systems made available to Californians. You are likely in scope if the system was released or substantially modified on or after January 1, 2022 and you:
- Offer a text, image, audio, video, or code generation model to the public
- Make a generative AI API available to third-party developers
- Operate a product built on a fine-tuned or custom-trained generative model
Out of scope:
- AI systems used solely for internal business operations with no consumer-facing output
- AI used for national security or defense applications
- AI systems for safety and security purposes (fraud detection, spam filtering) that do not generate open-ended content
- Pure wrappers around third-party APIs where you have not conducted any training or fine-tuning (though best practice is to disclose what you know about the underlying model)
Gray zone: If you fine-tune an existing model on your own data and deploy it to users, you are likely in scope for the fine-tuning dataset, even if the base model disclosure is handled by the upstream provider.
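The scope test above can be sketched as a screening function. This is an illustration of the checklist's logic, not legal advice; the type and field names are our own assumptions, and gray-zone cases (such as fine-tuned wrappers) still need counsel review.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class GenAISystem:
    reaches_californians: bool
    generates_open_ended_content: bool  # text/image/audio/video/code output
    internal_only: bool                 # no consumer-facing output
    security_or_defense_only: bool      # natsec, fraud detection, spam filtering
    did_any_training: bool              # pretraining or fine-tuning you conducted
    release_or_mod_date: date           # release or substantial modification

CUTOFF = date(2022, 1, 1)

def in_scope(s: GenAISystem) -> bool:
    """Rough screen encoding the in-scope / out-of-scope bullets above."""
    if not s.reaches_californians or not s.generates_open_ended_content:
        return False
    if s.internal_only or s.security_or_defense_only:
        return False
    if not s.did_any_training:  # pure wrapper around a third-party API
        return False
    return s.release_or_mod_date >= CUTOFF
```

A system that fails this screen may still warrant voluntary disclosure, per the wrapper best-practice note above.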
Step 2: What must be published
The disclosure must appear on your publicly accessible website before the system (or a substantial modification to it) is made available; for systems already on the market, the statutory deadline was January 1, 2026. No prescribed format: a dedicated webpage, a model card, or a PDF linked from your product page all work.
Required fields:
| Field | What to include |
|---|---|
| Dataset names and sources | Name each dataset (or category of dataset); identify whether it came from web scrape, licensed data purchase, first-party data, or other source |
| Data types | Specify the modalities: text, images, audio, video, code, structured data, synthetic data |
| Copyrighted material | State whether copyrighted content is included; if yes, describe the licensing basis (licensed, fair use claim, rights-reserved opt-out honored) |
| Personal information | State whether personal data was included; if yes, describe the category (public biographical data, user-consented data, etc.) |
| Synthetic data | Disclose whether any training data was synthetically generated and how it was produced |
| Data processing | Describe filtering, deduplication, quality scoring, or other processing applied |
| Intended purpose of datasets | Describe what capability each dataset was intended to develop |
What you do NOT have to disclose:
- Proprietary details of your training pipeline
- Full dataset contents or samples
- Specific file counts or token counts (though including these helps)
- Vendor contracts or pricing
Step 3: Where to publish it
The law requires the disclosure to be on your "internet website." Accepted formats:
- Dedicated `/training-data` or `/model-transparency` page on your product website
- Model card hosted on Hugging Face, GitHub, or your own documentation site with a link from your main site
- Section within your existing Terms of Service or Privacy Policy (not recommended — harder to find and update)
- Structured data card following a published schema (e.g., Hugging Face dataset card format, Croissant format)
The disclosure should be findable from your product's main page within 1-2 clicks. Burying it in a footer link to a legal page satisfies the letter but creates litigation risk.
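Since AB 2013 prescribes no format, keeping the disclosure as a machine-readable record and rendering the webpage from it makes updates auditable. A minimal sketch, loosely modeled on the required-fields table in Step 2 — the field names and product name are our own assumptions, not a prescribed schema:

```python
import json

# Hypothetical structured disclosure record; adapt field names to the
# schema you adopt (e.g. a Hugging Face dataset card or Croissant file).
disclosure = {
    "system": "ExampleWriter v3",   # hypothetical product name
    "last_updated": "2026-01-01",
    "datasets": [
        {
            "name": "Common Crawl (filtered)",
            "source": "web scrape",
            "data_types": ["text"],
            "copyrighted_content": {"included": True, "basis": "fair use claim"},
            "personal_information": {"included": False},
            "synthetic": False,
            "processing": ["deduplication", "quality filtering"],
            "intended_purpose": "general language understanding",
        },
    ],
}

# Emit the record as JSON; a site generator can render this to HTML.
print(json.dumps(disclosure, indent=2))
```

One record per dataset keeps the Step 4 update workflow simple: a new data source is a new list entry plus a new `last_updated` value.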
Step 4: Update triggers
You must update the disclosure when there is a substantial modification to the AI system. Treat these events as triggers:
| Event | Triggers update? |
|---|---|
| New base model version (e.g., switching from GPT-4 to GPT-4o) | Yes — if you conducted any fine-tuning |
| Fine-tuning on a new proprietary dataset | Yes |
| Adding a new data source to the training pipeline | Yes |
| Prompt engineering or system prompt changes only | No |
| Inference parameter changes (temperature, context window) | No |
| RAG pipeline updates (new knowledge base) | Likely yes — disclose the new retrieval corpus |
| Bug fixes or safety filter updates | No |
Build the update into your release process: every model release checklist should include a disclosure review step.
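The trigger table can be wired directly into a release gate. A sketch under our own assumptions (the event names are invented; map them to your pipeline's change categories), failing closed on anything unclassified:

```python
# Whether each release event type triggers a disclosure update,
# per the trigger table above. Event names are hypothetical.
TRIGGERS = {
    "new_base_model_with_finetuning": True,
    "finetune_new_dataset": True,
    "new_training_data_source": True,
    "prompt_change_only": False,
    "inference_param_change": False,
    "rag_corpus_update": True,  # "likely yes" per the table: disclose conservatively
    "bug_or_safety_filter_fix": False,
}

def needs_disclosure_update(events: list[str]) -> bool:
    """Return True if any event in this release requires a disclosure review."""
    unknown = [e for e in events if e not in TRIGGERS]
    if unknown:
        # Fail closed: unclassified changes get a human disclosure review.
        return True
    return any(TRIGGERS[e] for e in events)
```

Calling this from CI on each release candidate turns the checklist step into a hard gate rather than a reminder.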
Step 5: Enforcement and litigation risk
AB 2013 is enforced via the California Unfair Competition Law (UCL), which makes non-compliance an unlawful business practice. Unlike many state AI laws that limit enforcement to the Attorney General, the UCL opens three avenues:
| Enforcement actor | What they can do |
|---|---|
| California Attorney General | Civil action, injunction, civil penalties |
| Local district attorneys and city attorneys | Same powers as AG within their jurisdiction |
| Private plaintiffs (representative action) | Sue in the public interest under UCL § 17204 |
The private lawsuit risk is real. Since Proposition 64 (2004), private UCL plaintiffs must show injury in fact and loss of money or property to establish standing. But once standing is established, a plaintiff suing under UCL § 17204 can seek injunctive relief compelling compliance without quantifying individual damages, and the prospect of attorney-fee awards further incentivizes plaintiff firms to target non-compliant AI developers.
What typically triggers a lawsuit: public disclosure failures that are easy to document. A product launched without any disclosure page, or a model updated on a new training dataset with no corresponding disclosure update.
Step 6: Interaction with other laws
AB 2013 operates alongside other data transparency obligations:
| Law | Training data relevance |
|---|---|
| AB 2013 (California) | Disclosure of training datasets, publicly on website |
| EU AI Act GPAI (Chapter V) | Training data summary, published using EU AI Office template, for EU market |
| CCPA / CPRA | Training on California residents' personal data may trigger notice and opt-out obligations |
| Copyright law | Training on rights-reserved content is subject to ongoing litigation — disclosure of copyrighted training data does not grant a license |
If you are targeting both California and EU markets, the GPAI training data summary requirement and AB 2013 are structurally similar — you can produce one document that satisfies both with minor additions for each.
Minimum viable disclosure page — template structure
# [Product Name] — AI Model Training Data Disclosure
Last updated: [date]
This page describes the data used to train [product name]'s AI model,
as required by California AB 2013.
## Training Datasets
| Dataset | Source | Data type | Copyrighted content | Personal data | Synthetic data |
|---|---|---|---|---|---|
| Common Crawl (filtered) | Web scrape | Text | Yes — fair use basis | No | No |
| [Dataset 2] | ... | ... | ... | ... | ... |
## Data Processing
[Describe: deduplication approach, quality filtering, safety filtering,
PII removal if applicable]
## Intended Purpose
[Describe what each dataset was used to develop — e.g., "general
language understanding", "code generation", "instruction following"]
## Updates
This disclosure was last updated on [date] following [description of
substantial modification]. Prior versions are available at [link].
## Contact
Questions about this disclosure: [email]
Practical next steps for small teams
- Audit your GenAI systems today — list every model you train, fine-tune, or substantially configure that reaches California users
- Create the disclosure page — one page per system; link from the product homepage and footer
- Add disclosure review to your release checklist — before every model update, check whether it constitutes a substantial modification
- Document your training data internally — you cannot disclose what you haven't recorded; build data provenance tracking from the start
- Check the GPAI overlap — if you are also an EU market AI provider, a combined AB 2013/GPAI disclosure saves time
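The "document your training data internally" step is easiest if provenance is logged at ingestion time, so the disclosure page can later be generated from internal records instead of reconstructed from memory. A minimal append-only log sketch; the field names are illustrative assumptions:

```python
import csv
from dataclasses import dataclass, asdict

@dataclass
class DatasetRecord:
    name: str
    source: str            # web scrape / licensed / first-party / synthetic
    data_type: str         # text, images, audio, video, code, ...
    contains_copyrighted: bool
    contains_personal: bool
    ingested_on: str       # ISO date

def append_record(path: str, rec: DatasetRecord) -> None:
    """Append one provenance row, writing the header on first use."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(rec)))
        if f.tell() == 0:  # empty file: emit the header row first
            writer.writeheader()
        writer.writerow(asdict(rec))
```

A flat CSV is deliberately low-tech: the point is that every dataset touching the training pipeline leaves a row behind that maps onto the Step 2 required fields.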
Related reading
- Texas TRAIGA compliance checklist — another in-force state AI law with documentation requirements
- GPAI enforcement August 2026 — EU parallel to AB 2013's training data summary obligation
- Federal AI preemption vs state laws 2026 — why AB 2013 applies even if you expect federal preemption
References
- California Legislature — AB 2013 Full Text
- Crowell & Moring — California's AB 2013 Requires Generative AI Data Disclosure by January 1, 2026
- Goodwin Law — California's AB 2013: Generative AI Developers Must Show Their Data
- TrustArc — California SB 942 & AB 2013: AI transparency compliance guide
