The dataset used to train an AI model. The quality, composition, and provenance of training data fundamentally shape what a model can do and how it behaves. Training data issues — including biases in the dataset, inclusion of copyrighted material, or presence of personal data — can create downstream legal and ethical liability for AI developers. The EU AI Act requires GPAI model providers to document their training datasets, including sources and any copyright opt-out processes used.
Why this matters for your team
If you don't know what your AI vendor's model was trained on, you don't know its biases, limitations, or legal exposure. Make 'what training data was used?' a standard question in vendor due diligence — reputable vendors will answer it.
Example: a company discovers its AI hiring tool was trained partly on historical hiring data that underrepresented women in technical roles. That bias in the training data translates directly into discriminatory recommendations.
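The kind of bias in the example above can sometimes be caught before training with a simple representation audit of the dataset. The sketch below is a minimal illustration under assumed conditions: the record schema, field names, toy data, and the 30% parity threshold are all hypothetical, not from any real hiring system.

```python
# Illustrative audit: measure each group's share of a training set
# before using it to train a hiring model. Records and threshold
# are assumptions for demonstration purposes only.
from collections import Counter

def representation(records, field):
    """Return each group's share of the dataset for a given field."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Toy historical hiring records (assumed schema)
records = [
    {"gender": "male", "role": "engineer"},
    {"gender": "male", "role": "engineer"},
    {"gender": "male", "role": "engineer"},
    {"gender": "female", "role": "engineer"},
]

shares = representation(records, "gender")
# Flag any group whose share falls below an assumed 30% floor
flagged = [group for group, share in shares.items() if share < 0.3]
print(shares)   # {'male': 0.75, 'female': 0.25}
print(flagged)  # ['female']
```

A check like this only surfaces raw underrepresentation; it does not prove a trained model will discriminate, but a flagged group is a signal to investigate further before training or to raise with a vendor during due diligence.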