The Zero-Temperature Lie: Why Your Deterministic LLM is Still Hallucinating Randomness

Deterministic LLM inference sounds simple: set temperature to zero and get consistent answers. But reality is messier, and asking the same question a thousand times might yield dozens of different responses.

While GPU parallelism and floating-point arithmetic play a role, a study by Thinking Machines Lab identifies the real culprit as… other users. Concurrent requests to the same server change the batch size, which silently changes the order in which the transformer's floating-point operations are accumulated. Because floating-point addition is not associative, the same prompt can then produce slightly different results, and the output looks like uncontrollable randomness.
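The effect of accumulation order can be seen in a minimal sketch in plain Python. This is only an illustration, not how any particular inference engine is implemented: the helper names and the `chunk_size` parameter are hypothetical, with `chunk_size` standing in for the way a kernel might split a reduction differently depending on batch size.

```python
def left_to_right_sum(values):
    """Add floats strictly left to right (no compensated summation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Reduce fixed-size chunks first, then add the partial sums.

    chunk_size is a hypothetical stand-in for a batch-dependent
    reduction strategy; it is an assumption made for illustration.
    """
    partials = [
        left_to_right_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return left_to_right_sum(partials)


# The same four numbers, reduced in two different groupings.
values = [1e17, 1.0, -1e17, 1.0]

split_in_two = chunked_sum(values, 2)  # (1e17 + 1.0) + (-1e17 + 1.0)
single_pass = chunked_sum(values, 4)   # ((1e17 + 1.0) - 1e17) + 1.0

print(split_in_two, single_pass)
print(split_in_two == single_pass)
```

The two groupings disagree because each 1.0 added directly to a value of magnitude 1e17 is lost to rounding, while the single pass cancels the large terms before adding the final 1.0. In a real model the differences are far smaller, but they can be enough to change which token has the highest score.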

This talk will debunk the common myths about LLM nondeterminism, walk through the actual mechanics of inference engines, and explain what it would take to achieve true reproducibility.

Whether they call an external API or run a self-hosted inference stack, attendees will leave with a clear understanding of why this happens, which strategies can help address it, and how to think about reproducibility in AI systems.

Delivered at PyConIT 2026

Valeria Zuccoli

Data Scientist

Milan, Italy
