Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, claims it has solved one of the most difficult problems in artificial intelligence: LLM nondeterminism. This issue occurs when large language models (LLMs) produce different answers to the same prompt, making them unreliable in sensitive use cases like healthcare, law, and finance. The lab’s new approach, which focuses on batch invariance during inference, could provide a practical LLM nondeterminism fix for AI developers worldwide.
What Is LLM Nondeterminism?
LLM nondeterminism refers to the inconsistent outputs generated by models such as GPT or Qwen when they process the same input multiple times, even with identical settings and sampling randomness switched off. This happens due to:
- Floating-point arithmetic that is not associative, so the same sum can round differently depending on operation order (and on the GPU hardware running it)
- Parallel processing and batch ordering variations that change that order from run to run
- Nondeterministic kernel implementations (such as atomic operations) in deep learning frameworks
For end users, this means the same question may yield slightly different or even conflicting answers—a serious obstacle for building trustworthy AI applications.
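The floating-point cause is easy to see in isolation: addition of floating-point numbers is not associative, so summing the same values in a different order can round to a different result. Here is a minimal, pure-Python illustration (the specific values are arbitrary, chosen only to make the rounding visible):

```python
# Floating-point addition is not associative: the same numbers summed in a
# different order can round to a different result. GPU kernels hit the same
# effect whenever the reduction order changes with batch size or scheduling.
a, b, c = 0.1, 1e16, -1e16

left_to_right = (a + b) + c   # 0.1 is absorbed into 1e16, then cancelled away
right_to_left = a + (b + c)   # the large terms cancel first, so 0.1 survives

print(left_to_right)  # 0.0
print(right_to_left)  # 0.1
```

Inside a transformer layer the same effect plays out at scale: matrix multiplications and attention reductions sum millions of terms, so any change in how that work is split across threads changes the summation order and, potentially, the rounded result.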
The Breakthrough: Batch Invariance
Thinking Machines Lab believes the core issue lies in the lack of batch invariance.
- Normally, LLMs process data in “batches.” During inference, a request is grouped with whatever other requests the server is handling at that moment, so batch size and ordering vary from run to run, and the mathematical operations inside the model can then produce slightly different results.
- These tiny numerical differences then amplify layer by layer, leading to visible output variations.
- The new solution enforces strict determinism in these kernel operations, ensuring that results remain identical no matter how data is batched or ordered, as the sketch below illustrates.
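To make the idea concrete, here is a toy NumPy sketch rather than the lab's actual kernels: the hypothetical rowwise_sum_batch_dependent function splits each row's reduction based on the total workload in the batch (a common throughput optimization), so its rounding depends on how many requests happen to be batched together, while the hypothetical rowwise_sum_batch_invariant function always splits a row the same way, so each request's result is independent of its batch mates.

```python
import numpy as np

def rowwise_sum_batch_dependent(x, target_elems=4096):
    # Hypothetical "fast" kernel: picks the number of reduction chunks from
    # the total batch workload, so the accumulation order for any given row
    # changes with batch size, and so can the rounded float32 result.
    n_chunks = max(1, x.size // target_elems)
    acc = np.zeros(x.shape[:-1], dtype=np.float32)
    for part in np.array_split(x, n_chunks, axis=-1):
        acc += part.sum(axis=-1, dtype=np.float32)
    return acc

def rowwise_sum_batch_invariant(x, chunk=128):
    # Batch-invariant version: each row is always reduced in fixed-size
    # chunks, so its accumulation order never depends on the batch size.
    n_chunks = max(1, x.shape[-1] // chunk)
    acc = np.zeros(x.shape[:-1], dtype=np.float32)
    for part in np.array_split(x, n_chunks, axis=-1):
        acc += part.sum(axis=-1, dtype=np.float32)
    return acc

rng = np.random.default_rng(0)
row = rng.standard_normal(4096).astype(np.float32)
alone = row[None, :]               # the same request served by itself...
crowded = np.tile(row, (64, 1))    # ...or batched with 63 other requests

# The batch-dependent reduction may give the same row a slightly different
# sum depending on batch size; the batch-invariant one cannot.
print(rowwise_sum_batch_dependent(alone)[0] == rowwise_sum_batch_dependent(crowded)[0])   # often False
print(rowwise_sum_batch_invariant(alone)[0] == rowwise_sum_batch_invariant(crowded)[0])   # True
```

A real fix has to apply the same discipline to every reduction inside the model, which is why it lives at the kernel level rather than in application code.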
In tests on Qwen3-235B, the lab reported fully reproducible results: repeated runs on the same input produced identical outputs, a level of consistency that standard inference stacks have not delivered at this scale.
Why This Matters
A working LLM nondeterminism fix could reshape AI development in several ways:
- Reproducibility for Research – Scientists could reliably compare models and reproduce results without run-to-run noise.
- Reliability in Production – Industries like finance, medicine, and law require consistent outputs for compliance and safety.
- Better Model Training – Reinforcement learning and alignment methods depend on stable feedback signals; reducing randomness can improve efficiency.
- User Trust – People expect the same question to yield the same answer. Deterministic AI could improve adoption rates.
Remaining Questions
While the announcement is promising, several concerns remain:
- Performance trade-offs: Enforcing determinism might reduce inference speed.
- Generalization: It’s unclear if the method works across all model sizes, architectures, and hardware setups.
- Random sampling: If users enable “temperature” or probabilistic sampling, outputs are random by design and will still vary between runs unless the sampler is seeded (see the sketch after this list).
- Peer review needed: Independent researchers must validate the claims.
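On the sampling point: temperature-based decoding is random by design, so deterministic kernels alone do not make it repeatable; what they can ensure is that greedy (temperature 0) decoding is stable and that seeded sampling replays exactly, because the probabilities being sampled from no longer drift. The toy decoder step below (a hypothetical sample_token helper, not any particular library's API) shows the distinction:

```python
import random

def sample_token(probs, temperature=0.0, rng=None):
    # Toy single-step decoder over a tiny vocabulary.
    if temperature == 0.0:
        # Greedy decoding: always pick the most likely token, so the choice is
        # deterministic as long as `probs` itself is computed deterministically.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Temperature sampling: sharpen or flatten the distribution, then draw.
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    rng = rng or random
    return rng.choices(range(len(probs)), weights=[s / total for s in scaled])[0]

probs = [0.6, 0.3, 0.1]
print(sample_token(probs))                          # always token 0
print(sample_token(probs, 0.8, random.Random(42)))  # repeatable, because the seed is fixed
print(sample_token(probs, 0.8))                     # may vary from run to run
```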
Industry Context
- AI companies like OpenAI, Anthropic, and Google DeepMind have all acknowledged nondeterminism as a persistent problem in scaling LLMs.
- Regulators in the EU and U.S. have also raised concerns about AI reproducibility and auditability.
- If Thinking Machines Lab’s approach holds up, it could become a new industry standard for safe AI deployment.
Conclusion
Thinking Machines Lab’s proposed LLM nondeterminism fix represents a potentially historic step in artificial intelligence development. By enforcing batch invariance in inference kernels and demonstrating reproducible results, the lab may have solved a problem that has plagued LLMs for years. If validated, this breakthrough could unlock safer, more reliable AI systems for critical industries worldwide.