Updated 14 hours ago
xAI Trained Its Coding Models on Claude Outputs for Months Before Getting Cut Off

Elon Musk's xAI spent months distilling Anthropic's Claude to train its own coding models, continuing through personal accounts even after Anthropic revoked official access in January 2026. The revelation, reported by The Information, raises fresh questions about model distillation and the data supply chain behind AI coding tools.

The Distillation Pipeline: How xAI Used Claude to Train Grok

Elon Musk's xAI spent months training its coding models on outputs from Anthropic's Claude, using a technique called model distillation, The Decoder reported, citing The Information [2]. Distillation involves training a less capable model on the outputs of a stronger one, in effect teaching it to mimic the frontier model's behavior without needing the same training budget or data.
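To make the idea concrete, here is a minimal toy sketch of distillation in Python. It is not xAI's pipeline and it does not call Claude or any real API: the `teacher` function is a hypothetical stand-in for a stronger model, and the "student" is a two-parameter logistic model fitted to the teacher's recorded outputs. In this toy the student has the same capacity as the teacher, which is a simplifying assumption so that the fit can be checked easily; every name and number here is illustrative.

```python
import math
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a stronger model: returns P(label=1 | x)."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x - 1.0)))


def predict(w: float, b: float, x: float) -> float:
    """Student model: a logistic function with learnable weight and bias."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def distill(num_samples: int = 200, epochs: int = 2000,
            lr: float = 0.5, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)

    # Step 1: query the teacher and record its outputs.
    # The student never sees ground-truth labels, only these soft outputs.
    xs = [rng.uniform(-2.0, 2.0) for _ in range(num_samples)]
    soft_labels = [teacher(x) for x in xs]

    # Step 2: fit the student to the recorded outputs with gradient descent
    # on the cross-entropy between student and teacher probabilities.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, target in zip(xs, soft_labels):
            error = predict(w, b, x) - target
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / num_samples
        b -= lr * grad_b / num_samples
    return w, b


if __name__ == "__main__":
    w, b = distill()
    print(f"student parameters: w={w:.3f}, b={b:.3f}")
    print(f"teacher(0.5)={teacher(0.5):.3f}, student(0.5)={predict(w, b, 0.5):.3f}")
```

The point of the sketch is the data flow rather than the model: the only training signal is what the teacher returns, which is why access to a frontier model's outputs at scale is valuable to a competitor.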

Anthropic revoked xAI's official access to Claude in January 2026, but xAI engineers kept going, routing requests through personal Claude accounts and an intermediary service called Blackbox AI, according to The Information. Musk previously admitted in court that xAI "partially" used OpenAI models to train Grok, calling it industry standard practice.

Anthropic's War on Distillation: A Broader Pattern

Anthropic has been fighting distillation attacks for months. In a post earlier this year, the company said it had detected "industrial‑scale distillation attacks" by Chinese labs DeepSeek, Moonshot AI, and MiniMax, which involved over 24,000 fraudulent accounts that generated more than 16 million exchanges with Claude [3].

The xAI case is different. It's not a Chinese lab scraping through fraudulent accounts — it's a well‑funded American competitor with direct ties to the current administration. Musk's political proximity to President Trump adds a layer of complexity: xAI is simultaneously positioning itself as a national champion in AI while using a competitor's models as training data.

The Irony: xAI Is Now Renting Its Compute to Anthropic

While xAI was distilling Claude's outputs, Musk's broader compute empire took a different path. The GPUs Musk famously stockpiled are now being rented out to Anthropic via SpaceX's Colossus‑1 data center — a deal that provides 220,000 GPUs for Claude training, The Decoder reported separately. Google is also paying SpaceX $920 million a month for AI compute from the same infrastructure.

Internally, xAI's own model development appears troubled. The pretraining team shrank to fewer than five people. Four Grok code leads left within months, joining a wave of co‑founder departures tied to safety concerns and frustration over Grok's failure to close the gap with frontier models, The Information reported. One employee accidentally deleted critical training data, costing two to three weeks of work.

What This Means for AI Builders

Model distillation sits in a gray zone — legally, technically, and ethically. It's not theft in the traditional sense (the models aren't copied, they're trained on outputs), but it does transfer capability from the frontier lab that invested billions in training to a competitor that invested a fraction of that. Anthropic's terms of service prohibit using Claude to train competing models, giving them a contractual claim even where copyright law is unclear.

For builders choosing between AI coding tools — Claude Code, GitHub Copilot, Cursor, Codex, or Grok — the training data provenance matters. If Grok's coding capabilities were partly built on Claude's outputs, the tools are less differentiated than the branding suggests. It also raises questions about what happens when the source model (Claude) improves: does the distilled model (Grok) inherit those improvements, or does it get left behind?

The broader industry signal is clear. Frontier labs are treating their model outputs as proprietary training data worth protecting. The distillation wars — Anthropic vs. Chinese labs, and now Anthropic vs. xAI — suggest that the next front in AI competition won't just be about who has the most GPUs. It'll be about who controls the data supply chain.

The Legal Question: Is Distillation Theft or Fair Game?

The legal framework around model distillation is unsettled. In the US, AI labs commonly argue that training on publicly available data is protected by the fair use doctrine, the same argument OpenAI and Anthropic rely on when training on web content. But distillation differs in two ways: the training data is the output of a specific, identifiable competitor's product, and that product's terms of service explicitly prohibit the practice.

Anthropic could pursue breach of contract claims against xAI. The company has already shown willingness to enforce its terms — it cut off DeepSeek, Moonshot AI, and MiniMax after detecting their distillation operations. Whether it takes the same action against Musk's company, given the political complications, remains an open question.

For the AI industry, the xAI revelation makes explicit what many have suspected: model distillation is widespread, and the line between learning from public outputs and copying a competitor's capabilities is blurry at best.

Sources

  1. The Decoder (the-decoder.com)
  2. The Information (theinformation.com)
  3. Anthropic (anthropic.com)
  4. The Decoder (the-decoder.com)
