And Zuckerberg personally approved the piracy, according to these documents.
Free Loaders
Newly unredacted court documents allege that Meta, formerly Facebook, knowingly used pirated books obtained from the online archive Library Genesis to train its AI models, Wired reports.
Submitted in an ongoing lawsuit filed against the platform by a group of authors including Ta-Nehisi Coates and comedian Sarah Silverman, the documents were finally released in full after a judge shot down Meta's attempts to keep portions of them sealed.
The judge found, per Wired, that Meta had fought for the redactions merely to "avoid negative publicity," citing a damning internal quote from one of its employees.
"If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues," the employee wrote.
Lie-brary
Library Genesis, or LibGen, is a "shadow library" that provides free access to millions of books, academic articles, and magazines.
That a multibillion-dollar corporation like Meta would tap into LibGen's store of pirated content is the latest sign of the impunity with which tech companies have trained their large language models, vacuuming up copyrighted content en masse with seemingly no regard for the law. Nor, apparently, did one of the world's most valuable companies have the decency to buy a single copy of each volume it used to power its AI.
Meta and other AI leaders argue that using books and other data scraped from the web constitutes "fair use," but it will ultimately be up to legal battles like this one to determine if that is the case.
Fair use or not, some of the exchanges exposed in the newly unredacted documents suggest that Meta employees knew that what they were doing was legally and ethically dicey.
With a grinning emoji, one engineer wrote: "Torrenting from a [Meta-owned] corporate laptop doesn't feel right," as quoted by Wired.
And it goes all the way to the top. A cited memo allegedly shows that after employee discussions about using LibGen were escalated to "MZ" — Meta CEO Mark Zuckerberg — the AI team was "approved to use" material from the database.
High Seas
The plaintiffs argue that this shatters any plausible deniability Meta may try to maintain.
"Meta has treated the so-called 'public availability' of shadow datasets as a get-out-of-jail-free card, notwithstanding that internal Meta records show every relevant decision-maker at Meta, up to and including its CEO, Mark Zuckerberg, knew LibGen was 'a dataset we know to be pirated,'" they wrote in the most recent motion.
Moreover, the authors point to testimony from a Meta corporate representative as an admission that the company also helped disseminate the pirated books by "seeding" their corresponding torrents, that is, uploading portions of the material so that other users could download them.