The dataset used to train an AI model. The quality, composition, and provenance of training data fundamentally shape what a model can do and how it behaves. Training data issues — including biases in the dataset, inclusion of copyrighted material, or presence of personal data — can create downstream legal and ethical liability for AI developers. The EU AI Act requires GPAI model providers to document their training datasets, including sources and any copyright opt-out processes used.
Why this matters for your team
If you don't know what your AI vendor's model was trained on, you don't know its biases, limitations, or legal exposure. Make 'what training data was used?' a standard question in vendor due diligence — reputable vendors will answer it.
Example: a company discovers its AI hiring tool was trained partly on historical hiring data that underrepresented women in technical roles. That bias in the training data translates directly into discriminatory recommendations.
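The kind of bias in the example above can sometimes be caught before training with a simple representation audit of the dataset. The sketch below is a minimal illustration under assumed conditions: the record schema, field names, toy data, and the 30% parity threshold are all hypothetical, not from any real hiring system.

```python
# Illustrative audit: measure each group's share of a training set
# before using it to train a hiring model. Records and threshold
# are assumptions for demonstration purposes only.
from collections import Counter

def representation(records, field):
    """Return each group's share of the dataset for a given field."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Toy historical hiring records (assumed schema)
records = [
    {"gender": "male", "role": "engineer"},
    {"gender": "male", "role": "engineer"},
    {"gender": "male", "role": "engineer"},
    {"gender": "female", "role": "engineer"},
]

shares = representation(records, "gender")
# Flag any group whose share falls below an assumed 30% floor
flagged = [group for group, share in shares.items() if share < 0.3]
print(shares)   # {'male': 0.75, 'female': 0.25}
print(flagged)  # ['female']
```

A check like this only surfaces raw underrepresentation; it does not prove a trained model will discriminate, but a flagged group is a signal to investigate further before training or to raise with a vendor during due diligence.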