Germany: The EU AI Act and general-purpose AI
The EU AI Act is about to cross the finish line. “General-purpose AI” models, also referred to as foundation models in the debate, have been a focal point of the negotiations over the last months. Machine-learning based models such as GPT-4 are trained on broad data at scale and may form the basis for a range of downstream systems. This creates a particular responsibility along the AI value chain. Accordingly, the AI Act subjects these models to heightened obligations.
General-purpose AI (“GPAI”) models have been on everyone’s radar as of late. Whether it is the hype around GPT-4, promising developments in healthcare and life sciences such as improved diagnostics and personalized medicine, or simply online services better tailored to the needs of users, GPAI seems to be the future. Consequently, addressing GPAI has probably been the most controversial aspect of the new AI Act. It was so hotly debated that it almost brought the negotiations between EU member states to a standstill. Big players such as Germany and Italy opposed stricter regulation when the AI Act compromise was reached in December 2023, and France even tried to block the regulation. Now, the EU has a final agreement that includes harmonized rules for GPAI models. Below we highlight the most important aspects of these rules.
What are GPAI models and how can they be classified?
GPAI models are “AI models that display significant generality, are capable of competently performing a wide range of distinct tasks and that can be integrated into a variety of downstream systems or applications”. Prominent examples of such GPAI models are GPT-4, DALL-E, Google BERT or Midjourney 5.1.
The AI Act provides a tiered risk classification for GPAI models. This loosely mirrors the risk categories for AI systems (see here) and differentiates between standard “GPAI models”, “free and open-source licensed GPAI models” and “GPAI models with systemic risks”. The EU Commission is tasked with assessing whether a GPAI model is considered to pose systemic risks. This is presumed to be the case, in particular, where the model reaches a certain technical threshold of computational resources used for training (the Act sets this at 10^25 floating point operations) or where it has foreseeable significant negative effects.
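For illustration only, the tiered logic can be sketched roughly as follows. This is a simplification of the legal test, which involves more criteria than training compute alone; the function, threshold constant and example figures are our own assumptions:

    # Illustrative simplification; the legal test involves more criteria than training compute.
    SYSTEMIC_RISK_FLOPS = 1e25  # compute threshold triggering the systemic-risk presumption

    def classify_gpai_model(training_flops: float, open_source: bool) -> str:
        """Rough triage of a GPAI model against the AI Act's tiers (simplified)."""
        if training_flops > SYSTEMIC_RISK_FLOPS:
            # Systemic-risk models carry the full set of obligations,
            # regardless of how they are licensed.
            return "GPAI model with systemic risk"
        if open_source:
            # Free and open-source models without systemic risk benefit
            # from reduced documentation obligations.
            return "free and open-source GPAI model (reduced obligations)"
        return "standard GPAI model"

    # Hypothetical example:
    print(classify_gpai_model(training_flops=3e25, open_source=False))
    # -> "GPAI model with systemic risk"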
Who is in scope of GPAI related rules in the AI Act?
The AI Act addresses “providers” of GPAI. Hence, the focus is on companies that develop or “integrate” GPAI and place it on the EU market. Note that there remains a risk of falling within the scope of the AI Act’s obligations whenever you fine-tune pre-existing models such as GPT-4 by modifying or adapting data sources. Such fine-tuning could be considered an independent development.
What are the new obligations for GPAI in the AI Act?
In general, providers of GPAI models must create detailed technical documentation of the model and provide it to the supervisory authority upon request, covering, for example, the data used for training and validation, the computational resources utilized and the known or estimated energy consumption. Further, providers must enable downstream users of their GPAI model to understand its capabilities and limitations, and must draw up and make publicly available a sufficiently detailed summary of the content used for training.
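Purely by way of illustration, such documentation could be kept in a structured record along the following lines; the field names and values are our own simplification and not a template prescribed by the AI Act:

    # Hypothetical, simplified documentation record; the AI Act and its annexes
    # specify the actually required content in far more detail.
    gpai_model_documentation = {
        "model_name": "ExampleGPT",  # hypothetical model
        "training_data_sources": ["licensed corpora", "public web data"],
        "validation_data_sources": ["held-out split of the training corpora"],
        "training_compute_flops": 5e24,   # known or estimated
        "energy_consumption_mwh": 1200,   # known or estimated
        "capabilities_and_limitations": "Summary for downstream providers ...",
        "public_training_content_summary": "Published summary of training content ...",
    }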
Providers of GPAI models posing systemic risks must additionally maintain adequate cybersecurity protection, conduct model evaluations, including adversarial testing (so-called red teaming), and assess and mitigate possible risks. Incidents have to be documented and reported.
Providers of free and open-source licensed GPAI models, i.e. models that are openly shared and freely accessible, do not have to meet the detailed documentation standard above, provided the models do not pose systemic risks.
How and when will these obligations be enforced?
The AI Act will likely be adopted in the coming months and will generally apply 24 months later. However, the rules on GPAI models, governance and penalties will already apply 12 months after entry into force. Enforcement and monitoring in respect of providers of GPAI models will be carried out by the “AI Office”, a new regulator created within the EU Commission. In addition, a scientific panel of independent experts will be established that may issue alerts to the AI Office. The EU Commission may impose fines of up to 3 % of a provider’s total worldwide turnover in the preceding financial year or 15 million EUR, whichever is higher, if providers of GPAI models intentionally or negligently infringe relevant provisions of the AI Act. As with other EU regulations in the digital space, it will be interesting to see how quickly a newly formed regulator is actually equipped to handle enforcement.
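To make the fine mechanism concrete, a minimal sketch of the “whichever is higher” cap; the turnover figure is purely hypothetical:

    def max_gpai_fine_eur(annual_worldwide_turnover_eur: float) -> float:
        """Upper limit of a fine for GPAI-related infringements (simplified):
        3 % of total worldwide annual turnover or EUR 15 million, whichever is higher."""
        return max(0.03 * annual_worldwide_turnover_eur, 15_000_000)

    # Hypothetical provider with EUR 2 billion annual turnover:
    print(max_gpai_fine_eur(2_000_000_000))  # -> 60000000.0, i.e. a cap of EUR 60 million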
What other laws are relevant for GPAI?
Regulation of digital markets and services is high on the EU’s agenda. Many other laws interact with and are relevant to the development and deployment of GPAI. In the EU, the most discussed topics with regard to GPAI are currently privacy, copyright and cybersecurity. In particular, the legal basis under the GDPR for training and for the later use of data within a trained model, as well as the principles of transparency, data minimization and purpose limitation, are hotly debated topics. These have already been addressed in detail in guidelines from some regulators (see e.g. UK, France, Bavaria, Baden-Württemberg and Hamburg).
Aside from that, training GPAI models and generating output via GPAI may conflict with IP laws. Under EU copyright law, there is a “text and data mining” exception. This allows copyright-protected material to be used for GPAI training purposes as well, unless rightsholders have opted out. Many rightsholders are doing so by including appropriate reservations and notices in machine-readable format. This exception, however, does not exist in the same form in other relevant jurisdictions, such as the UK (see guidance by the ICO here), and is not fully comparable to the position in the US (see information from the Copyright Office here). Outside the EU, there are also already cases such as the litigation of the NY Times against OpenAI (New York Times Co v. Microsoft Corp et al) or the Getty case (Getty Images (US), Inc. v. Stability AI, Inc.) regarding the training of machine-learning based models. Hence, it will be very interesting to see how things develop both within and outside the EU on this matter.
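For illustration, one commonly discussed machine-readable reservation mechanism is a website’s robots.txt file. The sketch below shows how a crawler operator might check such a signal before collecting training data; the domain and crawler name are hypothetical, and robots.txt is only one of several possible ways to express an opt-out:

    from urllib.robotparser import RobotFileParser

    # Hypothetical publisher domain and crawler name. robots.txt is one common,
    # machine-readable way for rightsholders to signal a reservation of rights,
    # but not the only one.
    robots = RobotFileParser()
    robots.set_url("https://example.com/robots.txt")
    robots.read()

    if robots.can_fetch("ExampleAIBot", "https://example.com/articles/some-article"):
        print("No machine-readable reservation found for this crawler and URL.")
    else:
        print("Rightsholder has opted out; exclude this content from training data.")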
Finally, there are security requirements and risks to consider. Risks range from attempts to deceive the model and induce incorrect outputs (so-called adversarial attacks) to misinformation such as GPAI “hallucinations” (i.e. output that appears plausible but is not supported by the input or underlying data).