
Understanding the EU AI Act's Transparency Requirements for General-Purpose AI

The EU AI Act mandates transparency about AI training data, balancing innovation and accountability.



In an era where artificial intelligence (AI) is rapidly evolving and increasingly influencing our daily lives, the European Union has taken a significant step towards regulating this transformative technology. The EU AI Act (AIA), signed into law on June 13, 2024 [1], introduces a new paradigm for AI governance, with a particular focus on transparency in the development of general-purpose AI (GPAI) models. This article aims to demystify the Act's transparency requirements and explore their implications for AI developers, researchers, and society at large.

The EU AI Act: A Brief Overview

The EU AI Act is a comprehensive legislative framework designed to ensure the safe and ethical development and deployment of AI systems within the European Union. One of its key provisions, outlined in Article 53(1)(d), mandates that providers of general-purpose AI models must "draw up and make publicly available a sufficiently detailed summary of the content used for training of the general-purpose AI model." [2]

This requirement is a response to growing concerns about the opacity of AI systems, particularly large language models and other general-purpose AI technologies that have wide-ranging applications across various sectors.

The Importance of Training Data Transparency

The training data used in AI models is crucial in determining their behavior, capabilities, and potential biases. By requiring transparency in this area, the EU AI Act aims to address several key issues:

  1. Copyright Protection: Many AI models are trained on copyrighted materials. Transparency allows copyright holders to verify if their works have been used and to exercise their rights accordingly.
  2. Privacy and Data Protection: Personal data may be included in training datasets. Transparency enables individuals to understand if and how their data might have been used.
  3. Scientific Scrutiny: Researchers need access to information about training data to evaluate AI systems, identify potential biases, and validate scientific claims.
  4. Non-Discrimination: Transparency in training data can help identify and mitigate potential biases that could lead to discriminatory outcomes.
  5. Fair Competition: By reducing information asymmetry, transparency can help level the playing field in the AI industry.

What Information Must Be Disclosed?

Proposals such as Open Future's blueprint (discussed below) suggest a comprehensive template for the "sufficiently detailed summary" required by the Act. Key elements include:

  1. General Information:
    • Total size of the training data
    • Details on any ethical review processes conducted
  2. Data Sources and Datasets:
    • Information on data collection methods (e.g., web scraping, public repositories, proprietary databases)
    • Date ranges of the training data
    • Legal basis for data collection and processing
    • Information on data anonymization techniques
    • List of datasets used, including their relative proportions in the training data
  3. Data Diversity:
    • Proportions of data across relevant categories (e.g., languages, regions)
    • Steps taken to ensure diversity and representativeness
  4. Data Processing:
    • Methodology for data annotation and labeling
    • Preprocessing steps applied to the data
    • Information on data sampling methods
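The template above is, in essence, a structured record. As a purely illustrative sketch, the fields could be captured in a machine-readable form like the following; the class and field names here are assumptions, since neither the Act nor the proposed templates prescribe a schema:

```python
from dataclasses import dataclass, field

# Hypothetical schema for a training-data summary. Field names are
# illustrative assumptions mapped from the template elements above;
# the Act does not mandate any machine-readable format.

@dataclass
class DatasetEntry:
    name: str
    collection_method: str  # e.g. "web scraping", "public repository"
    date_range: str         # e.g. "2019-01 to 2023-06"
    legal_basis: str        # e.g. "licensed", "text-and-data-mining exception"
    proportion: float       # share of the total training data (0.0 to 1.0)

@dataclass
class TrainingDataSummary:
    total_size: str                      # e.g. "4.5 TB of text"
    ethical_review: str                  # ethical review processes conducted
    datasets: list[DatasetEntry] = field(default_factory=list)
    language_proportions: dict[str, float] = field(default_factory=dict)

    def validate_proportions(self) -> bool:
        """Check that dataset proportions sum to (approximately) 1."""
        return abs(sum(d.proportion for d in self.datasets) - 1.0) < 1e-6

# Example summary with made-up figures:
summary = TrainingDataSummary(
    total_size="4.5 TB of text",
    ethical_review="Internal review board, Q1 2024",
    datasets=[
        DatasetEntry("Web crawl subset", "web scraping",
                     "2019-01 to 2023-06", "TDM exception", 0.8),
        DatasetEntry("Licensed news corpus", "proprietary database",
                     "2015-01 to 2023-12", "licensed", 0.2),
    ],
    language_proportions={"en": 0.7, "de": 0.2, "fr": 0.1},
)
print(summary.validate_proportions())
```

A structured record like this would also make the diversity and proportion figures in the template easy to aggregate and audit, rather than leaving them buried in free-text documentation.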

Implications for AI Developers

For AI developers, particularly those working on general-purpose AI models, this transparency requirement presents both challenges and opportunities:

  1. Documentation Practices: Developers will need to implement robust documentation practices throughout the data collection and model training process.
  2. Ethical Considerations: The requirement for ethical reviews may necessitate the integration of ethical considerations earlier in the development process.
  3. Data Management: More stringent data management practices will be required to track the sources, characteristics, and processing of training data.
  4. Competitive Considerations: While increased transparency may reveal some aspects of a company's AI development process, it also creates opportunities for collaboration and improvement across the industry.
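The documentation and data-management points above amount to tracking provenance at ingestion time. A minimal sketch of what that could look like, assuming a simple CSV provenance log (the helper name and columns are hypothetical, not anything the Act specifies):

```python
import csv
import hashlib
import io
from datetime import datetime, timezone

# Hypothetical helper: log each ingested dataset so that a disclosure
# summary can be assembled later. Column names are illustrative.

def record_provenance(writer, name, source, license_info, num_records):
    """Append one provenance row, fingerprinted for later auditing."""
    fingerprint = hashlib.sha256(f"{name}:{source}".encode()).hexdigest()[:12]
    writer.writerow({
        "dataset": name,
        "source": source,
        "license": license_info,
        "records": num_records,
        "fingerprint": fingerprint,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    })

# Demo against an in-memory buffer instead of a real file:
buf = io.StringIO()
fields = ["dataset", "source", "license", "records", "fingerprint", "ingested_at"]
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
record_provenance(writer, "news-corpus-v2", "proprietary database",
                  "licensed", 1_250_000)
print(buf.getvalue().splitlines()[0])
```

The point of logging at ingestion rather than reconstructing provenance later is that the sources, date ranges, and legal bases the template asks for are cheap to capture up front and expensive to recover afterwards.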

Open Future has published a blueprint template for the summary of content used to train general-purpose AI models.

Benefits for Society

The transparency requirements of the EU AI Act offer several potential benefits for society:

  1. Increased Accountability: By making information about AI training data public, the Act enables greater scrutiny and accountability in AI development.
  2. Enhanced Trust: Transparency can help build public trust in AI systems by demystifying their development process.
  3. Improved AI Systems: With more information available, researchers and developers can work towards addressing biases and improving the quality of AI systems.
  4. Protection of Rights: Individuals and organizations can better protect their rights related to copyright, privacy, and non-discrimination.

Challenges and Considerations

While the transparency requirements offer many benefits, there are also challenges to consider:

  1. Trade Secrets: Companies may be concerned about revealing proprietary information through these disclosures.
  2. Complexity: The sheer volume and complexity of data used in modern AI systems may make comprehensive documentation challenging.
  3. Interpretation: There may be difficulties in interpreting the disclosed information, particularly for non-experts.
  4. Implementation: Companies will need to develop new processes and potentially tools to comply with these requirements.


Conclusion

The EU AI Act's transparency requirements for general-purpose AI represent a significant step towards more accountable and trustworthy AI systems. By mandating detailed disclosures about training data, the Act aims to balance the innovative potential of AI with the need for oversight and protection of individual rights.

For AI developers, while these requirements may initially seem daunting, they also present an opportunity to build more robust, ethical, and trustworthy AI systems. For society at large, these measures promise to shed light on the previously opaque world of AI development, potentially leading to better, fairer, and more beneficial AI technologies.

As we move forward into this new era of AI governance, it will be crucial for all stakeholders - developers, researchers, policymakers, and the public - to engage in ongoing dialogue and collaboration to ensure that the promise of these transparency measures is fully realized.


[1] - The AIA will enter into force 20 days after publication in the Official Journal of the European Union, which is expected to happen sometime in July. The GPAI rules will take effect 12 months thereafter. As a result, providers of GPAI models will be required to publish data summaries starting in mid-2025.

[2] -

Image Source: Created with assistance from ChatGPT, powered by OpenAI

Disclaimer -