A recent study has found that large language models (LLMs) lose reasoning skills when continually pre-trained on large volumes of low-quality, high-engagement content from X. The researchers dub the effect “brain rot” for LLMs.
What the researchers found
- The team ran controlled experiments on data from X, defining two “junk data” categories: one based on engagement (short posts with high like/retweet counts) and one based on low semantic quality.
- As the proportion of junk data in the training mix rose, performance on reasoning benchmarks dropped sharply: accuracy on ARC-Challenge fell from ~74.9% to ~57.2% when going from 0% to 100% junk data (a minimal sketch of this mixing design follows the list).
- The dominant error mode was “thought-skipping”: the models increasingly skipped intermediate reasoning steps or failed to chain logic correctly.
- Attempts to recover lost reasoning by adding clean data or instruction tuning helped somewhat but did not restore full baseline performance.
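To make the experimental design concrete, here is a minimal sketch of how such a dose-response mix might be assembled. Everything in it is illustrative: the corpora, the document budget, and the ratio sweep are assumptions for exposition, not the paper’s actual pipeline.

```python
import random

def build_training_mix(junk_posts, clean_posts, junk_ratio, total_size, seed=0):
    """Assemble a continual-pretraining corpus with a fixed junk proportion.

    Illustrative only: the same document budget at every ratio, with
    junk_ratio swept from 0.0 (all clean) to 1.0 (all junk), mirroring the
    dose-response design described above.
    """
    rng = random.Random(seed)
    n_junk = int(total_size * junk_ratio)
    n_clean = total_size - n_junk
    mix = [rng.choice(junk_posts) for _ in range(n_junk)]
    mix += [rng.choice(clean_posts) for _ in range(n_clean)]
    rng.shuffle(mix)
    return mix

# Toy stand-ins for the two corpora; the real study drew its splits from X posts.
junk_posts = ["you won't BELIEVE this", "huge if true", "ratio."]
clean_posts = ["a step-by-step explanation of photosynthesis ...",
               "lecture notes on the central limit theorem ..."]

for ratio in (0.0, 0.2, 0.5, 0.8, 1.0):
    corpus = build_training_mix(junk_posts, clean_posts, ratio, total_size=1_000)
    # Continual pre-training on `corpus` and evaluation on a reasoning benchmark
    # (e.g. ARC-Challenge) would go here; both steps are omitted placeholders.
    print(f"junk_ratio={ratio:.1f} -> {len(corpus)} documents")
```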
Why the data from X is especially problematic
- The “junk” category covered posts that were short, highly engaged (lots of likes and retweets), or clickbait-style, patterns typical of X’s engagement dynamics (one way to operationalise such a filter is sketched after this list).
- Such posts optimise for engagement rather than depth, producing noisy, semantically thin text.
- Because LLMs ingest huge volumes of scraped web and social-media data, this drift in content quality has broad implications for training pipelines.
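One plausible way to operationalise the engagement-based junk definition is a simple pre-filter like the sketch below. The field names (`likes`, `retweets`), thresholds, and clickbait markers are assumptions made for illustration, not the study’s exact criteria.

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    likes: int
    retweets: int

# Heuristic markers; purely illustrative, not the study's filtering rules.
CLICKBAIT_MARKERS = ("you won't believe", "shocking", "this one trick", "wow")

def looks_like_junk(post: Post,
                    max_words: int = 30,
                    engagement_threshold: int = 500) -> bool:
    """Flag short, highly engaged, or clickbait-style posts as 'junk'."""
    short = len(post.text.split()) <= max_words
    viral = (post.likes + post.retweets) >= engagement_threshold
    baity = any(marker in post.text.lower() for marker in CLICKBAIT_MARKERS)
    return (short and viral) or baity

posts = [
    Post("You won't believe what this model just did", likes=12_000, retweets=4_000),
    Post("A careful walk-through of the attention update, with caveats.", likes=40, retweets=3),
]
clean_corpus = [p.text for p in posts if not looks_like_junk(p)]
print(clean_corpus)
```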
Broader context: LLM reasoning was fragile anyway
Prior to this study, other research already flagged that LLMs’ reasoning abilities were weaker than assumed. For instance:
- A study from MIT CSAIL found that LLMs performed much worse when tasks were slightly modified or placed out-of-domain.
- Other research shows that LLMs may rely more on pattern-matching than deep logical inference.
The X-data study adds a causal data-quality dimension to the reasoning-loss narrative.
Implications: Why This Matters
For AI developers & model training
- Data curation is critical: training heavily on high-engagement social-media content can degrade performance on reasoning tasks.
- Continual pre-training matters: as models are updated with fresh data, injecting low-quality data can gradually erode performance, a cumulative harm.
- Monitoring model “cognitive health”: the study suggests periodically testing deployed LLMs for reasoning decline, especially if training data shifts over time (a minimal monitoring sketch follows this list).
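As a sketch of what such a “cognitive health” check could look like in practice, the snippet below compares a checkpoint’s score on a fixed reasoning suite against a stored baseline and flags regressions. The tolerance, the accuracy numbers, and the surrounding eval harness are all assumed for illustration.

```python
def reasoning_health_ok(current_accuracy: float,
                        baseline_accuracy: float,
                        max_drop: float = 0.05) -> bool:
    """Return True if the latest checkpoint is within tolerance of its baseline.

    `current_accuracy` is assumed to come from re-running a fixed reasoning
    suite (e.g. an ARC-style eval) after each continual-pretraining update;
    the eval harness itself is not shown.
    """
    return (baseline_accuracy - current_accuracy) <= max_drop

# Example: baseline ~0.749 (ARC-Challenge-like), latest checkpoint scores 0.62.
if not reasoning_health_ok(current_accuracy=0.62, baseline_accuracy=0.749):
    print("Reasoning regression detected: audit the latest training mix before shipping.")
```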
For applications / users
- Models doing logic-heavy tasks (legal reasoning, scientific inference, long-context understanding) may be more at risk of failure if trained on noisy social-media-style corpora.
- This raises caution for high-stakes domains: even if an LLM appears fluent, its reasoning chains may be brittle or missing.
For the broader AI ecosystem
- Encourages a rethink of the “dump everything” web-scrape model: not all data is equally useful — in fact, some may harm capabilities.
- Raises questions about ethics & safety: reasoning failure modes are not just accuracy issues—they could lead to flawed decisions.
- Points to long-term structural risks: if models drift downward because of low-quality data, future improvements may require more than just scaling.
What’s Next: Open Questions & Research Directions
- How reversible is the decline? The study showed partial recovery, but full restoration seemed elusive. What methods will fully repair reasoning skills?
- Which kinds of “junk” data are worst? The study used engagement-driven and low-semantic-value texts. Are there other categories (misinformation, highly opinionated posts, etc.) that cause more harm?
- How universal is the effect? The experiments were controlled and limited in scale. Will very large models (100B+ parameters) be immune or less affected?
- What architecture or training changes mitigate this? Better filtering, fine-tuning strategies, or curriculum learning? (A naive curriculum sketch follows this list.)
- Impact on multilingual and domain-specific models: does reasoning decay differ across languages, domains (medical, legal), or specialist models?
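To illustrate the curriculum-learning idea raised in the list above, here is a deliberately naive sketch that orders documents by an assumed quality score before training. The scoring source and whether this ordering helps at all are open assumptions, not results from the study.

```python
def quality_curriculum(documents, quality_scores):
    """Order training documents from highest to lowest quality score.

    A naive quality-first curriculum: assumes each document already carries a
    quality score (e.g. from a classifier or heuristic filter). Whether such
    ordering actually mitigates the decay is one of the open questions above.
    """
    ranked = sorted(zip(documents, quality_scores),
                    key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked]

docs = ["long-form explainer", "medium-quality blog post", "viral one-liner"]
scores = [0.92, 0.55, 0.10]
print(quality_curriculum(docs, scores))
```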
Final Thoughts
The research shows that, at least in controlled experiments, engagement-driven content from X can cause LLMs to lose reasoning skills. It’s a stark reminder that in AI, more data isn’t always better data. As models continue being trained and updated, the quality, relevance and structure of training data become crucial to maintaining reasoning robustness. For anyone building or deploying LLMs, monitoring for reasoning decline should be part of the process.



