AI systems exhibit predictable and systematic biases when judging people, according to a recent study from Hebrew University published in the Proceedings of the Royal Society. The research analyzed over 43,000 simulated decisions alongside approximately 1,000 human participants across five scenarios, revealing significant differences between human and AI evaluations.
The scenarios included financial decisions like lending money to a small business owner, as well as social judgments such as assessing a babysitter or deciding how much to donate to a nonprofit founder. The findings indicate that while both humans and AI favored individuals perceived as competent, honest, and well-intentioned, machines operate using rigid evaluation criteria rather than forming holistic impressions.
Prof. Yaniv Dover remarked, “AI is not making random decisions. It captures something real about how humans evaluate one another.” However, the study noted that humans tend to create general impressions based on multiple traits, while AI assesses traits such as competence and integrity in a more segmented manner. Valeria Lerman further explained, “People in our study are messy and holistic in how they judge others. AI is cleaner, more systematic, and that can lead to very different outcomes.”
Differences in evaluation outcomes appeared even when the models were given identical details about a subject. In financial scenarios, AI systems showed a consistent bias in favor of older individuals and were also swayed by attributes such as religion and gender. “Humans have biases, of course,” said Dover. “But what surprised us is that AI’s biases can be more systematic, more predictable, and sometimes stronger.”
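The comparison the researchers describe, presenting profiles that are identical except for a single attribute and checking whether the judgment shifts, can be sketched as a simple audit loop. The study’s actual prompts and models are not public here; `make_vignette`, `score_applicant`, and the injected age bias below are all hypothetical stand-ins for illustration.

```python
# Hypothetical sketch of a paired-vignette bias audit: give a model two
# profiles that differ only in one attribute and compare the scores.
# `score_applicant` stands in for a real LLM call; it is a stub with a
# deliberate age bias so the audit has something to detect.

def make_vignette(age, occupation="small-business owner"):
    """Build a loan-request description that varies only the applicant's age."""
    return (f"A {age}-year-old {occupation} with steady revenue "
            f"asks to borrow $5,000 to expand their shop.")

def score_applicant(vignette):
    """Stub for a model call; returns a 0-1 'willingness to lend' score.
    The fake bias: +0.2 whenever the text mentions a 60-year-old."""
    base = 0.5
    if "60-year-old" in vignette:
        base += 0.2
    return base

def attribute_gap(values, attribute_to_vignette, scorer):
    """Score each variant and return the scores plus the max-min spread."""
    scores = {v: scorer(attribute_to_vignette(v)) for v in values}
    return scores, max(scores.values()) - min(scores.values())

scores, gap = attribute_gap([30, 60], make_vignette, score_applicant)
# A nonzero gap flags that the attribute alone shifted the judgment.
```

Because every detail except the audited attribute is held fixed, any spread in the scores can only come from that attribute, which is the logic behind the study’s finding that AI biases are systematic and repeatable.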
The study revealed that varied AI models often produce different judgments about the same individual. This variability underscores the importance of AI model selection in determining real-world outcomes. “Which model you use really matters,” Lerman noted. Currently, large language models are employed for job candidate screening, creditworthiness assessment, and other decision-making roles.
While AI mimics human judgment processes, the researchers found it less nuanced and more rigid, often with biases that are harder to identify. “These systems are powerful,” Dover emphasized. “They can model aspects of human reasoning in a consistent way. But they are not human, and we should not assume they see people the way we do.”
The researchers advocate for a deeper understanding of AI’s evaluation processes as these tools evolve from assistants into decision-makers. The goal, they stress, is not to discourage the use of AI but to raise awareness of how these systems judge the people they evaluate. “The question is no longer whether we trust machines; it is whether we understand how they trust us,” the study concludes.
