Temperature is not a remedy
A reflexive objection from practitioners familiar with LLM configuration holds that raising the sampling temperature would attenuate these distributional biases by flattening the probability distribution from which characters are drawn. Irregular's empirical results refute this intuition unambiguously: testing at temperature 1.0, the maximum setting for Claude, produced no statistically meaningful improvement in effective entropy, while at temperature 0.0 Claude produces the identical string on every invocation. The character-position biases are encoded in the model weights, not in the sampling parameters, and temperature modulation operates downstream of those weight-instantiated distributions.
Separately, Alexey Antonov, Kaspersky's Data Science Team Lead, conducted a complementary investigation analyzing 1,000 passwords generated by ChatGPT, Meta's Llama, and DeepSeek. The character-frequency histograms disclosed pronounced non-uniformity across all three models: ChatGPT exhibits a systematic preference for the characters x, p, and L; Llama for the hash symbol and the letter p; DeepSeek for t and w. These findings hold across different model families and measurement methodologies, corroborating the structural rather than incidental nature of the vulnerability.
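The kind of analysis Antonov describes can be sketched in a few lines: tally character frequencies over a sample of generated passwords, compare the observed Shannon entropy against the roughly 6.55 bits per character a uniform draw from 94 printable characters would yield, and flag overrepresented characters. The sample strings below are invented stand-ins, not actual model output; a real analysis would run over the harvested generations.

```python
from collections import Counter
from math import log2

# Hypothetical sample standing in for LLM-generated passwords;
# a real analysis would use harvested model output (e.g. 1,000 generations).
samples = [
    "xP7Lx!pQxL2pXw9#",
    "pX4x#LQpx8Lw!t2P",
    "Lx9pPx#Qt5xwL!pX",
]

ALPHABET = 94  # printable ASCII characters, excluding space

counts = Counter(ch for pw in samples for ch in pw)
total = sum(counts.values())

# Shannon entropy per character of the observed distribution,
# versus log2(94) ~ 6.55 bits/char for a uniform draw.
observed_bits = -sum((n / total) * log2(n / total) for n in counts.values())
uniform_bits = log2(ALPHABET)
print(f"observed: {observed_bits:.2f} bits/char, uniform: {uniform_bits:.2f}")

# Characters appearing far more often than the uniform expectation
expected = total / ALPHABET
overrepresented = sorted(
    (ch for ch, n in counts.items() if n > 3 * expected),
    key=lambda ch: -counts[ch],
)
print("overrepresented:", overrepresented)
```

On biased samples like these, the observed per-character entropy lands well below the uniform ceiling, which is exactly the histogram-level signal the Kaspersky team reported.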
The practical corollary is that an adversary who has identified the LLM used to generate a target credential need not brute-force a 94^16 keyspace exhaustively. Such an adversary can instead construct a model-specific attack dictionary, ordering candidates by their empirical generation frequency, and execute a probabilistically optimized search over a keyspace several orders of magnitude smaller. In Kaspersky's cracking tests, 88 percent of DeepSeek passwords and 87 percent of Llama passwords failed to withstand such a targeted attack, as did 33 percent of ChatGPT passwords, all on standard GPU hardware.
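The keyspace collapse can be made concrete with a toy calculation. The per-position frequencies below are invented for illustration, not the measured distributions of any of the models above: when each of 16 positions is heavily biased toward a handful of characters, the effective search space (the product of per-position perplexities, i.e. 2 raised to the summed entropies) shrinks from roughly 2^105 to a few dozen bits, and candidates can be enumerated most-likely-first.

```python
from collections import Counter
from math import log2

full_keyspace = 94 ** 16  # naive brute-force space for a 16-char password
print(f"naive keyspace: 2^{log2(full_keyspace):.1f}")

# Hypothetical per-position character frequencies, as might be harvested
# from many generations of the target model (toy numbers for illustration).
position_freqs = [Counter({"x": 50, "p": 30, "L": 15, "q": 5})] * 16

# Effective keyspace in bits: sum of per-position Shannon entropies.
effective_bits = 0.0
for freqs in position_freqs:
    total = sum(freqs.values())
    effective_bits += -sum((n / total) * log2(n / total) for n in freqs.values())
print(f"effective keyspace: ~2^{effective_bits:.1f}")

def candidates(position_freqs, limit=5):
    """Yield password guesses, most-probable characters first.

    Greedy sketch: start from the modal character at every position,
    then vary one position at a time. A real cracking tool would
    enumerate in strict joint-probability order (e.g. a priority queue).
    """
    ordered = [[ch for ch, _ in freqs.most_common()] for freqs in position_freqs]
    base = [chars[0] for chars in ordered]
    yield "".join(base)
    count = 1
    for pos in range(len(ordered) - 1, -1, -1):
        for ch in ordered[pos][1:]:
            if count >= limit:
                return
            trial = base.copy()
            trial[pos] = ch
            yield "".join(trial)
            count += 1

for pw in candidates(position_freqs):
    print(pw)
```

With these toy frequencies the effective keyspace is about 2^26 against a naive 2^105, which is the "several orders of magnitude smaller" search that makes GPU-speed dictionary attacks viable.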
