AI | 5 min read | Source: Ars Technica AI

Authors' lucky break in court may help class action over Meta torrenting

By the Pixelift editorial team (Redakcja Pixelift)

Image credit: Rob Wilkinson | iStock / Getty Images Plus

Eighty terabytes of pirated content used to train AI models have put Meta in a difficult legal position, and the company hopes a recent Supreme Court ruling will offer a way out. Mark Zuckerberg's company is seeking to dismiss claims of contributory copyright infringement, arguing that merely using the BitTorrent protocol does not make it responsible for the distribution of illegal files. The key turning point is a judge's decision allowing the authors seeking class-action status in *Kadrey v. Meta* to add claims that Meta facilitated piracy. Such claims are significantly easier to prove than direct copyright infringement, which would require showing that Meta shared specific works in their entirety. Meta bases its defense on precedents involving Internet Service Providers (ISPs), claiming it cannot be held liable for simply providing technology that can be used for infringement, as long as it did not actively encourage that infringement.

For creators and users of creative tools, the outcome of this dispute is of fundamental importance: it will determine whether AI giants can use "distributed" data sources, such as torrents, with impunity by hiding behind the technical specifics of P2P networks. If the courts accept the authors' arguments, tech companies may have to radically change how they acquire training sets, which could force a shift to fully licensed data.

The BitTorrent Protocol as a Legal Trap

The main point of contention is that Meta, while building its language models, used massive datasets (estimated at up to **80 terabytes**) obtained via the BitTorrent protocol. The authors previously tried to prove a so-called "distribution claim," which concerns the direct distribution of protected works. The problem is that BitTorrent splits each file into fragments that are exchanged among many users at once (the so-called "swarm"), so any one participant may send or receive only some of the fragments. Proving that Meta downloaded and shared a specific book in its entirety is therefore extremely difficult, both technically and procedurally.
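To make the fragmentation point concrete, here is a minimal Python sketch of how a file can be cut into fixed-size pieces, each with its own SHA-1 hash, which is the general approach of the original BitTorrent design. Everything named here is an assumption made for illustration: the tiny `PIECE_LENGTH`, the stand-in `book` bytes, and the round-robin `pieces_from_peer` helper are hypothetical and do not describe how any real client, or Meta, actually behaved.

```python
import hashlib

# Hypothetical tiny piece size for illustration; real torrents use far larger pieces.
PIECE_LENGTH = 16


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Cut a byte string into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces: list[bytes]) -> list[str]:
    """Compute one SHA-1 digest per piece so each piece can be verified on its own."""
    return [hashlib.sha1(piece).hexdigest() for piece in pieces]


def pieces_from_peer(total_pieces: int, peer_index: int, peer_count: int) -> list[int]:
    """Hypothetical round-robin split: which piece indices one peer supplied."""
    return [i for i in range(total_pieces) if i % peer_count == peer_index]


# Stand-in bytes for one book file (assumption, not real data).
book = b"A stand-in for the bytes of one book file, nothing more."
pieces = split_into_pieces(book)
hashes = piece_hashes(pieces)

# With three peers in this toy swarm, no single peer supplies the whole file.
from_first_peer = pieces_from_peer(len(pieces), peer_index=0, peer_count=3)
```

In this toy model each peer contributes only a subset of piece indices, which is why tying one participant to the transfer of a complete work is hard to show from the transfer pattern alone.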

However, District Judge **Vince Chhabria** allowed the claim of **contributory infringement** (secondary liability for facilitating infringement by others) to proceed. This is a fundamental shift, because the standard is much easier to meet. Instead of proving the transmission of entire files, the plaintiffs only need to show that Meta, by using torrents, facilitated copyright infringement by third parties. The mechanics of BitTorrent are unforgiving here: while downloading, a client typically also uploads the pieces it already holds to other peers. Thus, by downloading training data, Meta became an active link in a piracy network.
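The upload-while-downloading behavior described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not a real BitTorrent implementation: the `Peer` class and `simulate` function are hypothetical, one other peer is assumed to request each piece right after it arrives, and real clients can be configured to limit or restrict uploading.

```python
class Peer:
    """Hypothetical model of one client in a swarm (not a real BitTorrent API)."""

    def __init__(self, total_pieces: int) -> None:
        self.total_pieces = total_pieces
        self.have: set[int] = set()  # indices of pieces already downloaded
        self.uploaded = 0            # number of pieces served to other peers

    @property
    def complete(self) -> bool:
        """True once every piece of the file has been downloaded."""
        return len(self.have) == self.total_pieces

    def receive(self, index: int) -> None:
        """Record that a piece has been downloaded."""
        self.have.add(index)

    def serve(self, index: int) -> bool:
        """Upload a piece to another peer, but only if we already hold it."""
        if index in self.have:
            self.uploaded += 1
            return True
        return False


def simulate(total_pieces: int) -> int:
    """Count pieces uploaded while the download is still incomplete."""
    downloader = Peer(total_pieces)
    for index in range(total_pieces):
        downloader.receive(index)
        # Assumption: another peer asks for the piece we just received.
        if not downloader.complete:
            downloader.serve(index)
    return downloader.uploaded


uploads_mid_download = simulate(4)
```

In this model the downloader has already served pieces to others before its own copy is complete, which is the behavior the contributory infringement theory relies on.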

[Image: courtroom and symbols of justice in a technological context. Caption: The battle over AI training data is entering a phase where the technical aspects of file transfers determine millions in damages.]

Defense Strategy Based on a Supreme Court Ruling

Meta, however, does not intend to back down easily and is building its defense around a recent Supreme Court ruling in the **Cox** case. That decision concerned Internet Service Providers (ISPs) and limited their liability for piracy committed by users of their networks. Meta argues that secondary liability standards should be interpreted narrowly. According to the company, the plaintiff (here the company **Entrepreneur Media**, which brought a parallel lawsuit) did not identify specific individuals with whom Meta allegedly shared data, nor did it show that the corporation had "actual knowledge of specific acts of infringement."

Meta's lawyers claim that merely using torrent technology does not automatically imply an intent to support piracy. In its legal filings, the company goes a step further, suggesting it cannot be proven that Meta employees even understood that downloading files via BitTorrent involves uploading them at the same time. This line of defense has stirred considerable controversy in the tech community, given that Meta is one of the most advanced engineering companies in the world.

Judicial Reprimand and the Authors' "Stroke of Luck"

Although the decision to expand the claims favors the authors, Judge Chhabria did not hide his irritation with the conduct of their legal representatives, specifically the firm **Boies Schiller**. The judge described their arguments as a "lame excuse" and accused them of trying to cover up their own procedural errors by constantly "bashing Meta" instead of focusing on the merits. Chhabria noted that the lawyers could have added the contributory infringement claim as early as November 2024, and called their current explanations about Meta's delays in producing evidence "doubletalk."

[Image: modern office and server room in the context of Big Tech. Caption: For Meta, the stakes are not just damages, but the legality of the data acquisition processes on which its AI models are based.]

So why did the judge grant the motion if he considered it late? The authors simply got lucky. Meta itself had previously asked to synchronize the case schedule with the lawsuit filed by Entrepreneur Media. Since the **contributory infringement** claim already existed in that proceeding, the judge reasoned that Meta would have to face it anyway, so adding it to the authors' class action would cause neither additional delay nor procedural unfairness. At the same time, Chhabria emphasized that denying the motion would harm the interests of thousands of authors, who might lose their chance at justice because of their representatives' negligence.

Risk Analysis for the AI Sector

This situation sheds light on a broader problem in the artificial intelligence industry: the origin of data. If courts find that using torrent networks to build datasets such as **The Pile**, or the corpora used to train **Llama**, automatically entails liability for contributory infringement, the business model based on free "scraping" of the internet could be ruined.

  • Lower evidentiary threshold: Demonstrating the facilitation of piracy is much simpler than proving the distribution of specific works.
  • Discovery risk: If the case moves to the next stage, Meta may be forced to reveal internal communications, which could show whether engineers knowingly ignored the legal aspects of torrenting.
  • Cox precedent: The interpretation of the Supreme Court ruling will be key, namely whether Meta is treated as a "neutral service provider" or as an active entity "inducing infringement."

From an industry perspective, Meta is on the defensive. Although Judge Chhabria stayed discovery until the motion to dismiss is resolved, allowing the contributory infringement claim substantially improves the authors' chances of obtaining a favorable verdict or a high settlement. If Meta fails to convince the court that the Cox ruling provides a protective shield, we may witness one of the most important decisions defining the boundaries of "ethical" data acquisition for AI. The game is no longer just about what models can do, but how they learned to do it. By claiming it did not know how torrents work, Meta is staking its technological credibility on procedural survival. It is a risky gamble, and its outcome will shape the creative technology market for years to come.
