EU AI Act: obligations for companies fine-tuning LLMs
What the EU AI Act requires from companies fine-tuning LLMs on their own data: documentation, risk classification, obligations for high-risk systems.
The EU AI Act and LLM fine-tuning on proprietary data: what you must document
The EU AI Act (Regulation (EU) 2024/1689) entered into force on 1 August 2024, with phased application running through 2027. For companies fine-tuning LLMs on their own data (a common use case among Romanian clients in the legal, fiscal, medical and financial sectors), the Act introduces concrete obligations on documentation, risk classification, and transparency.
This article clarifies what the Regulation actually requires in practice from a company customising an open-weight or frontier model via fine-tuning. It is not legal advice; it is an operational synthesis for CTOs and DPOs.
TL;DR
- Fine-tuning a model does not automatically turn you into a „provider” under the AI Act, but it can if the changes are substantial.
- AI systems classified „high-risk” (Annex III: HR, justice, critical infrastructure, education, etc.) carry extensive obligations: risk management system, data governance, transparency, human oversight, robustness, post-market monitoring.
- General-Purpose AI Models (GPAI) carry separate obligations: technical documentation, copyright policy, summary of training data. These apply to the upstream provider, but the downstream user must receive the documentation.
- Fine-tuning on sensitive data (legal, medical, financial) requires documented data governance: provenance, biases, licensing, retention.
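- Maximum sanctions: up to 35M EUR or 7% of global annual turnover, whichever is higher, for breaches of the Article 5 prohibitions; up to 15M EUR or 3% for breaches of most other obligations, including those for high-risk systems.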
Who the Regulation applies to
The EU AI Act applies to:
- Providers: entities that develop an AI system or a GPAI model and place it on the EU market or put it into service under their own name.
- Deployers: entities that use an AI system under their own authority (called „users” in the Commission's original proposal).
- Importers and distributors: entities that bring AI systems onto the EU market or make them available there.
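The critical question for fine-tuning is: when does a deployer become a provider? Under Article 25(1), a deployer is treated as the provider of a high-risk system if it puts its own name or trademark on the system, makes a „substantial modification” to it, or changes the intended purpose of a system so that it becomes high-risk. Any one of these conditions is enough, and each triggers full provider obligations for the derivative system.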
In practice:
- Light fine-tuning (instruction-tuning on a few hundred examples, no capability change) → you remain a deployer.
- Major fine-tuning (continued pretraining on a large corpus, capability changes, new purposes) → potentially a provider for the derivative system.
- LoRA / adapter on a frontier model for a specific use case → typically a deployer, but it depends on the degree of change.
Risk classification
The AI Act classifies AI systems into four categories:
- Prohibited (Article 5): social scoring, exploitation of vulnerabilities, biometric categorisation by race/religion, etc. Total ban.
- High-risk (Art. 6): the Annex III use cases (HR, justice, asylum, education, critical infrastructure, etc.) and AI in products covered by the Annex I legislation (e.g., medical devices).
- Limited risk: chatbots, deepfakes — minimum transparency obligations (user information).
- Minimal/no risk: everything else.
For a Romanian legal assistant fine-tuned on legislation, the classification depends on use:
- Internal assistant for lawyers (research tool): not high-risk per se.
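- A system that automatically evaluates whether a case is worth pursuing in court: potentially high-risk (access to justice). Annex III, point 8(a) targets systems used by or on behalf of a judicial authority, so the outcome depends on who relies on the output.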
- A system that automatically decides legal solutions without human oversight: high-risk + potentially prohibited (depending on implementation).
Obligations for high-risk systems
If your fine-tuned system falls under high-risk, Articles 9–17 require:
Risk management system (Art. 9): a continuous process of risk identification, evaluation, and mitigation. Documented. Updated at every significant change.
Data governance (Art. 10): for training/validation/test data:
- relevance, representativeness, freedom from errors, completeness
- bias examination processes
- gap and deficiency management
- processes for sensitive data (special categories under GDPR)
For fine-tuning, this means full documentation of the corpus: sources, licensing, anonymisation, dedup, identified biases.
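A minimal sketch of what such corpus documentation could look like in code, assuming a hypothetical `CorpusSource` record and exact-duplicate removal by content hash. The field names are illustrative and not prescribed by the Regulation; near-duplicate detection and bias analysis would need dedicated tooling.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class CorpusSource:
    """One entry in a hypothetical data card; field names are illustrative."""
    name: str
    origin: str        # where the documents came from
    licence: str       # licence or legal basis for use
    anonymised: bool   # whether personal data was removed before training
    known_biases: str  # free-text note on identified biases or gaps


def content_hash(text: str) -> str:
    """SHA-256 of lower-cased, whitespace-normalised text."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing by content hash."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = content_hash(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def build_data_card(source: CorpusSource, docs: list[str]) -> str:
    """Serialise the source metadata plus before/after dedup counts as JSON."""
    card = {
        **asdict(source),
        "documents_raw": len(docs),
        "documents_after_dedup": len(deduplicate(docs)),
    }
    return json.dumps(card, indent=2, ensure_ascii=False)
```

Storing the resulting JSON next to each training run keeps the provenance record tied to the exact corpus version that was used.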
Technical documentation (Art. 11 + Annex IV): a detailed document covering:
- general system description
- design and development elements
- data used (provenance, characteristics, labelling)
- capabilities and limitations
- monitoring and control
- human oversight detail
Record-keeping (Art. 12): automatic logging of relevant events throughout the lifecycle.
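As an illustration of automatic event logging, the sketch below writes one JSON object per event using Python's standard `logging` module. The logger name `ai_audit` and the fields `model_version` and `user_id` are assumptions made for this example; which events and fields are „relevant” has to be decided per system.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (JSON Lines)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # record.created is the Unix timestamp set by the logging module
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "event": record.getMessage(),
            # These fields arrive through the `extra=` argument; names are illustrative
            "model_version": getattr(record, "model_version", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(entry, ensure_ascii=False)


def make_audit_logger(path: str) -> logging.Logger:
    """Return a logger that appends JSON lines to the file at `path`."""
    logger = logging.getLogger("ai_audit")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A call such as `make_audit_logger("audit_log.jsonl").info("inference", extra={"model_version": "legal-ft-0.3", "user_id": "u42"})` would then append one timestamped line per event.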
Transparency (Art. 13): clear information to the deployer regarding capabilities, limitations, performance metrics, instructions for use.
Human oversight (Art. 14): technical and organisational measures enabling effective human oversight.
Accuracy, robustness, cybersecurity (Art. 15): adequate accuracy + robustness against adversarial inputs + cybersecurity.
Post-market monitoring (Art. 72): a documented plan for collecting and analysing performance data after deployment, with serious incidents reported to the authorities under Art. 73.
GPAI provider obligations
Articles 53–55 introduce a new category: General-Purpose AI Models. These have:
Standard obligations (Art. 53):
- technical documentation (Annex XI)
- documentation for downstream providers (Annex XII)
- copyright policy (compliance with rights under Directive 2019/790)
- public summary of training data
Additional obligations for systemic-risk GPAI (Art. 55):
- model evaluation + adversarial testing
- evaluation and mitigation of systemic risks
- serious incident reporting
- cybersecurity
A GPAI model is presumed to pose „systemic risk” when its cumulative training compute exceeds 10^25 FLOPs (Art. 51(2)); in 2026 this catches frontier-scale models.
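To get a feel for the scale of 10^25 FLOPs, the sketch below uses the common rule of thumb that dense training costs roughly 6 FLOPs per parameter per token. That approximation is an assumption, not something the Regulation defines, and it fits full-parameter training better than LoRA. The inputs are the 14B-parameter model and 30B-token corpus from the legal-assistant case in this article. The result covers only the compute added by fine-tuning; the cumulative figure also includes the base model's pretraining.

```python
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # threshold figure quoted in the text


def estimate_training_flops(parameters: float, tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6.0 * parameters * tokens


# Figures from the legal-assistant case: 14B parameters, 30B tokens
fine_tune_flops = estimate_training_flops(parameters=14e9, tokens=30e9)

# Share of the threshold consumed by the fine-tuning run alone
share_of_threshold = fine_tune_flops / SYSTEMIC_RISK_THRESHOLD_FLOPS

print(f"Fine-tuning compute: {fine_tune_flops:.2e} FLOPs")
print(f"Share of threshold:  {share_of_threshold:.2e}")
```

Under these assumptions the fine-tuning run adds about 2.5 × 10^21 FLOPs, several thousand times less than the threshold, so whether the derivative model is above it depends almost entirely on the base model.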
If you are fine-tuning an external GPAI model, the provider must make the Annex XII downstream documentation available to you (Art. 53(1)(b)). Request it, and make its delivery a contractual requirement.
Practical case: a Romanian legal assistant
Consider a typical case: a law firm fine-tunes an open-weight 14B model on a Romanian legal corpus with 30B tokens, for internal use by its lawyers.
Classification: probably NOT high-risk (internal research use), but watch for borderline cases (automated case decisions).
Minimum obligations:
- Transparency to users: clarification that this is generative AI, mandatory citation grounding.
- Copyright policy: documentation showing the corpus respects licensing and copyright.
- Internal risk assessment: not mandatory, but best practice.
- Data governance: documented corpus provenance, jurisprudence anonymisation, retention.
Additional obligations if you become a provider:
- Full technical documentation (Annex IV-style)
- Conformity assessment before market placement
- CE marking
- Registration in the EU database
Practical case: a GPAI model fine-tuned with LoRA
A fintech fine-tunes a frontier model with LoRA on 50K transactions for a fraud-detection assistant.
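Classification: fraud detection is expressly carved out of Annex III, point 5(b), which covers creditworthiness evaluation and credit scoring of natural persons. The system becomes high-risk if its outputs are also used to score creditworthiness or otherwise fall under an Annex III use case.
If it is high-risk, full obligations apply as above, plus integration with sectoral obligations (PSD2, EBA, etc.).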
In practice: a documented risk management system, data governance for the 50K transactions (anonymisation, biases, provenance), human oversight mechanism, continuous monitoring with incident reporting.
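One possible shape for the human oversight mechanism is a routing gate: the model never blocks a transaction on its own, and anything above a review threshold goes to a human analyst. The sketch below is illustrative only; `route_transaction`, the `Decision` record, and the 0.5 default threshold are hypothetical names and values, not requirements taken from the Regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Outcome of the routing step; kept immutable so it can be logged as-is."""
    transaction_id: str
    action: str         # "allow" or "human_review"
    fraud_score: float
    reason: str


def route_transaction(
    transaction_id: str,
    fraud_score: float,
    review_threshold: float = 0.5,  # illustrative value, to be calibrated
) -> Decision:
    """Send suspicious transactions to a human; the model alone never blocks."""
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must be between 0 and 1")
    if fraud_score >= review_threshold:
        return Decision(
            transaction_id,
            "human_review",
            fraud_score,
            f"score {fraud_score:.2f} >= threshold {review_threshold:.2f}",
        )
    return Decision(
        transaction_id,
        "allow",
        fraud_score,
        f"score {fraud_score:.2f} < threshold {review_threshold:.2f}",
    )
```

Recording each `Decision` in the audit log gives reviewers both the model score and the reason the case was escalated.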
Application calendar
- 2 February 2025: Prohibitions (Art. 5) and AI literacy (Art. 4) apply.
- 2 August 2025: GPAI rules (Art. 51–56) and governance.
- 2 August 2026: most obligations (high-risk, transparency).
- 2 August 2027: Annex I high-risk (e.g., medical devices).
For companies planning fine-tuning in 2026, the preparation window for the main obligations is now.
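The calendar above can be encoded as a small lookup so that a project plan can check which milestones are already in application on a given date. This is a sketch; the labels are shortened from the list above and the function name `applicable_on` is made up for the example.

```python
from datetime import date

# Milestones copied from the application calendar above
MILESTONES = [
    (date(2025, 2, 2), "Prohibitions (Art. 5) and AI literacy (Art. 4)"),
    (date(2025, 8, 2), "GPAI rules (Art. 51-56) and governance"),
    (date(2026, 8, 2), "Most obligations (high-risk, transparency)"),
    (date(2027, 8, 2), "Annex I high-risk (e.g. medical devices)"),
]


def applicable_on(day: date) -> list[str]:
    """Return the milestones whose application date is on or before `day`."""
    return [label for start, label in MILESTONES if day >= start]


for label in applicable_on(date(2026, 1, 15)):
    print(label)
```

For a go-live date of 15 January 2026, the first two milestones already apply and the main high-risk obligations follow on 2 August 2026.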
Minimum documentation checklist
For any fine-tuning project in 2026, we recommend you have documented:
- Model card: base model, modifications applied, hyperparameters, hardware, evaluation.
- Data card: corpus sources, licensing, anonymisation, dedup, identified biases, retention policy.
- Risk assessment: use cases, identified risks, mitigation measures.
- Use policy: who may use, for what purposes, what is forbidden, incident reporting mechanism.
- Audit log: deployments, evaluations, incidents.
This documentation is not paperwork. It is the first line of defence in an audit or incident.
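The checklist can be enforced mechanically, for example as a step that fails a release when a document is missing. The sketch below assumes a hypothetical dossier folder with one file per checklist item; the file names are placeholders to adapt to your repository.

```python
from pathlib import Path

# Hypothetical file names, one per item in the checklist above
REQUIRED_DOCS = {
    "model_card.md": "Model card",
    "data_card.md": "Data card",
    "risk_assessment.md": "Risk assessment",
    "use_policy.md": "Use policy",
    "audit_log.jsonl": "Audit log",
}


def missing_documents(present_files: set[str]) -> list[str]:
    """Return checklist items with no matching file, in checklist order."""
    return [
        label
        for filename, label in REQUIRED_DOCS.items()
        if filename not in present_files
    ]


def scan_dossier(dossier_dir: Path) -> list[str]:
    """List missing items in a folder; empty files count as missing."""
    present = {
        p.name
        for p in dossier_dir.iterdir()
        if p.is_file() and p.stat().st_size > 0
    }
    return missing_documents(present)
```

Calling `scan_dossier(Path("compliance"))` before a deployment and stopping when the list is non-empty keeps the dossier from drifting behind the model.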
Common traps
„We are only a deployer.” Check the degree of change. Continued pretraining on 30B tokens probably makes you a provider for the derivative system.
„The base model is GPAI, that covers us.” GPAI provider documentation covers the base model, not your fine-tuned system.
„The data is anonymous.” Anonymisation must meet the GDPR standard: individuals must not be identifiable by any means reasonably likely to be used. Pseudonymised data is still personal data.
„We only serve non-EU clients.” The AI Act applies if the system is placed on the EU market, if the deployer is established in the EU, or if the system's output is used in the EU (Art. 2).
Decision diagram
Fine-tuning planned?
├── Substantial change (large CPT, new capabilities)?
│   ├── Yes → You are a provider; full obligations
│   └── No → Probably a deployer; deployer obligations
│
├── System falls under Annex III (high-risk)?
│   ├── Yes → Art. 9–17 obligations + post-market monitoring
│   └── No → Transparency obligations + best practices
│
└── Using an external GPAI?
    └── Request Annex XII documentation from the provider
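The same three questions can be written as a small triage function, which is convenient for intake forms or project templates. This is a sketch of the diagram only: the answers to the three questions still need a legal assessment, and the function name `triage` is illustrative.

```python
def triage(
    substantial_change: bool,
    annex_iii_use: bool,
    external_gpai: bool,
) -> list[str]:
    """Map the three diagram questions to the outcomes listed in the diagram."""
    findings: list[str] = []

    # Question 1: role under the Act
    if substantial_change:
        findings.append("You are a provider; full obligations")
    else:
        findings.append("Probably a deployer; deployer obligations")

    # Question 2: risk class of the resulting system
    if annex_iii_use:
        findings.append("Art. 9-17 obligations + post-market monitoring")
    else:
        findings.append("Transparency obligations + best practices")

    # Question 3: upstream documentation
    if external_gpai:
        findings.append("Request Annex XII documentation from the provider")

    return findings


for line in triage(substantial_change=False, annex_iii_use=True, external_gpai=True):
    print(line)
```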
Operational conclusion
The EU AI Act is not a blocker for fine-tuning. It is a set of documentation and governance requirements that, when applied from day one, become a natural part of the ML pipeline. Companies that begin with documentation discipline avoid expensive remediation in 2026–2027.
For CAI Technology clients in regulated verticals (legal, medical, financial), we offer combined technical + compliance consulting to prepare an AI Act-ready dossier before deployment.
Related articles
- Pillar Leta — Romanian legal assistant
- Anti-hallucination for legal chatbots: 2.8M Romanian documents
- Fine-tuning LLMs on Romanian corpora
External sources
- EU Regulation 2024/1689 (EU AI Act) — consolidated text
- European Commission, „AI Act Q&A”
- ENISA, „Multilayer Framework for Good Cybersecurity Practices for AI”
- NIST, AI Risk Management Framework 1.0
Next step
For an analysis of AI Act obligations applicable to your fine-tuning project, we offer a 30-minute session with our DPO and ML engineer.