A few weeks ago, the Washington Post spotlighted a striking image from “Project Panama”: a warehouse filled with hundreds of thousands of books waiting to be scanned and then destroyed. The project is part of Anthropic’s internal effort to train its artificial intelligence models, a costly endeavor in which millions were spent digitizing these works without the explicit consent of their authors.
This isn’t an isolated case; numerous AI firms have also “borrowed” copyrighted content for training purposes. The European Parliament has now taken a clear position: it is time for these companies to stop using protected content without proper licensing.
Europe’s Standpoint: Pay the Authors
The contrast between past and present is telling. In the early 2000s, the entertainment industry, together with regulators, ran anti-piracy campaigns that labeled any form of media copying as theft: “You wouldn’t steal a car; don’t steal a movie.” Two decades later, the same industry has fallen silent as big tech firms engage in similar practices to train their AI models, as reported by the Washington Post. Giants like Meta, Google, and OpenAI have also been caught up in the race to acquire data at scale.
Examples abound: Meta alone has amassed 81.7 TB of copyrighted books, while OpenAI has faced backlash for incorporating animations from various studios in its training data. This situation has prompted the European Parliament to act.
Legislative Measures Ahead
In response, the European Parliament is advocating a package of measures, set out in a non-binding report that urges the European Commission to draft rules ensuring that AI companies meet minimum standards for their use of copyrighted content. Its message is simple: if AI firms use protected works for training, they must secure licenses and fairly compensate the original creators.
“Generative AI should not operate outside the rule of law.”
The report outlines several key demands:
- Require transparent and remunerated use of protected content for training AI.
- Obligate AI vendors to acknowledge the copyrighted works they use and compensate their rights holders.
- Implement measures allowing rights holders to exclude their works from AI training.
As members of the European Parliament (MEPs) have emphasized, creators deserve transparency, legal certainty, and fair compensation when their works are used to train AI systems.
AI’s Pushback
However, AI companies are not backing down easily. The Computer and Communications Industry Association (CCIA) has criticized the initiative, dubbing it a “compliance tax” that hinders progress. It argues that the measure unfairly burdens smaller companies, complicates negotiations with publishers, and undermines Europe’s ability to compete globally.

Despite these objections, the initiative represents a crucial step toward addressing the unregulated use of copyrighted materials in AI training. Although the report is non-binding, it signals the European Parliament’s intentions for future regulation and underscores how urgent the copyright question has become as AI continues to evolve rapidly.
The tension between innovation and the rights of creators is becoming harder to ignore, and it highlights the need for coherent legislation.

