Researchers extract 96% of Harry Potter word-for-word from leading AI models

The study, titled “Extracting Books from Production Language Models,” used a technique known as a “model extraction attack”: researchers issued iterative prompts, starting with the first line of the book and asking the AI to “continue exactly,” to bypass safety filters and force the models to reveal “memorized” training data.

Which Models Remember the Most?

The extraction rates varied significantly between commercial “frontier” models and open-weight systems. The research showed that while companies claim to have “unlearned” copyrighted data, the text remains deeply embedded in the models’ parameters.

| AI Model | Extraction Rate (Book 1) | Key Finding |
| --- | --- | --- |
| Claude 3.7 Sonnet | 95.8% | Almost the entire book was retrieved using “jailbreak” prompts. |
| Gemini 2.5 Pro | 76.8% | Reproduced huge chunks without any jailbreaking required. |
| Grok 3 | 70.3% | High recall of verbatim text, primarily from the first half of the book. |
| GPT-4.1 | 4.0% | The most resistant; typically refused to continue after short excerpts. |

How the Researchers Did It

The methodology was designed to be “conservative,” counting only long, contiguous runs of near-exact text toward the extraction rate.
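
To make that counting rule concrete, here is a minimal sketch of the scoring idea in Python. The 50-word run threshold, the use of difflib, and the exact-match simplification (the paper credits near-exact text) are assumptions for illustration, not the study’s actual procedure.

```python
# Sketch: credit only long, contiguous runs of overlap between the
# model's output and the source text. Exact word matching is a
# simplification; the paper's "near-exact" criterion is looser.
from difflib import SequenceMatcher

MIN_RUN = 50  # assumed minimum run length, in words

def extraction_rate(book: str, output: str) -> float:
    book_words, out_words = book.split(), output.split()
    matcher = SequenceMatcher(None, book_words, out_words, autojunk=False)
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= MIN_RUN  # ignore short, incidental overlaps
    )
    return covered / len(book_words)  # fraction of the book recovered
```

On a definition like this, a model that merely paraphrases scores near zero even if it reproduces the whole plot, which is what makes extraction rates above 90% so striking.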

  1. Phase 1: The Door Check: Researchers provided a real opening sentence (e.g., “Mr. and Mrs. Dursley, of number four, Privet Drive…”) and commanded the AI to continue word-for-word.
  2. Phase 2: The Loop: If the AI complied, the researchers repeatedly asked it to “continue” until it reached the end of its response limit or hit a safety refusal.
  3. The “Best-of-N” Jailbreak: For models that refused (like Claude and GPT), researchers tried hundreds of slightly altered prompts, using different symbols or wordings, until one bypassed the “safety guardrail” (see the sketch below).
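
Putting the three steps together, the control flow looks roughly like the sketch below, written against an OpenAI-style chat-completions client as a stand-in. The prompt wording, model name, refusal heuristic, and perturbation list are illustrative assumptions, not the paper’s actual artifacts.

```python
# Sketch of the door check, the continuation loop, and a best-of-N retry.
# Assumes the OpenAI Python SDK as a stand-in client; the study targeted
# several providers and used its own prompt variants.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(messages: list[dict]) -> str:
    resp = client.chat.completions.create(model="gpt-4.1", messages=messages)
    return resp.choices[0].message.content or ""

def refused(text: str) -> bool:
    # Crude refusal heuristic; the paper's detection was more careful.
    return not text or any(s in text.lower() for s in ("i can't", "i cannot", "copyright"))

def perturb(prompt: str) -> str:
    # Best-of-N style variation: tweak symbols/wording between attempts.
    decorations = ("", "### ", ">>> ", "[verbatim] ")
    return random.choice(decorations) + prompt

def extract(seed: str, rounds: int = 40, n_retries: int = 100) -> str:
    base = f"Continue this text exactly, word for word:\n\n{seed}"
    # Phase 1 (the door check): retry perturbed prompts until one is accepted.
    for _ in range(n_retries):
        messages = [{"role": "user", "content": perturb(base)}]
        reply = ask(messages)
        if not refused(reply):
            break
    else:
        return ""  # every variant was refused
    # Phase 2 (the loop): keep asking for continuations until it stops.
    transcript = reply
    for _ in range(rounds):
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": "Continue exactly."})
        reply = ask(messages)
        if refused(reply):
            break
        transcript += " " + reply
    return transcript
```

In the study the best-of-N search ran to hundreds of variants; the sketch only shows the shape of the loop: retry the door check until a variant is accepted, then request continuations until the model stops or refuses.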

Legal and Ethical Firestorm

The findings have landed like a bombshell in the legal world, handing a potential “smoking gun” to authors and publishers currently suing AI firms for copyright infringement.

  • The “Copy” Argument: Legal scholars argue that if a model can reproduce a book at 96% fidelity, the model itself is not just “inspired” by the text; it is effectively a compressed, illegal copy of the work.
  • Fair Use Defense: AI companies like OpenAI and Anthropic have long argued their training is “transformative” and protected by Fair Use. However, verbatim regurgitation of thousands of words is often seen as the opposite of transformation.
  • The “Unlearning” Myth: The study suggests that “unlearning” techniques, where models are told to forget specific topics, are often just superficial layers that can be stripped away with clever prompting.

A Privacy Warning

The researchers warned that this isn’t just about wizards and magic. If an AI can memorize a book because it saw it multiple times on the web, it could also memorize sensitive personal data, private documents, or medical records if they were accidentally included in the massive datasets used for training.
