Artificial intelligence trained on pirated books. This is how Meta trained its AI

2025-04-08 08:10
The greatest theft of works in human history, or how Meta trained its artificial intelligence on pirated books. Legal options were considered "irrationally expensive" and an "extremely slow" process.


Artificial intelligence has been making headlines for years, although only now are we encountering it more and more often in everyday life. A major problem in training artificial intelligence, however, turned out to be the need for a huge number of high-quality texts.
Books are one of the most readily available sources of such text. Unlike web pages whose content is optimized for search engines (SEO), they are a high-quality source. This matters because feeding AI with AI-generated content lowers its quality, and a great deal of content has already been generated by artificial intelligence. Books published before 2020 therefore offer a guarantee that they contain no AI-generated content.
As "The Atlantic" revealed, Mark Zuckerberg, the CEO of Meta, approved the use of Libgen (short for "Library Genesis"), an illegal file repository that served as a source of content for training Meta's AI model. Libgen was created around 2008 in Russia and holds over 7.5 million books and 81 million research papers. It is one of the largest pirate libraries on the internet, and it is useful to many students (and, as it turns out, not only to them).
According to "The Atlantic", Meta employees talked to many companies about buying licenses to use their books in AI work, but the results were unsatisfactory. "It seems irrationally expensive to me," one employee wrote in the company's internal chat. A senior manager on the team working on Llama 3 added that it would also be an "extremely slow" process.
After the case came to light, creative communities around the world want to sue Meta. French publishers and authors announced such a plan in March 2025. At a press conference, Vincent Montagne, president of the National Association of Publishers, accused Meta of "non-compliance with copyright and parasitism". In Poland, the Literary Union (an association defending authors' rights) is also encouraging authors to check the database published by "The Atlantic".
– We are dealing with the greatest theft of works in human history. A theft that governments allow. Even the European Union, despite the efforts of creators' organizations such as the European Writers' Council, is reacting below expectations. The power of so-called Big Tech is, for now, overwhelming. And yet piracy is theft, and it should be punished – says Grażyna Plebanek, a writer involved in the work of the Literary Union, quoted by "Gazeta Wyborcza".
Ed. Jm
