AI Models Show Gender-Influenced Risk Behavior, Study Finds

Artificial intelligence systems are increasingly being used in decision-making roles, from finance to healthcare. But a new study suggests these systems may not be as neutral as previously assumed—especially when their responses are influenced by gender-based prompts.

Researchers at Allameh Tabataba’i University in Tehran, Iran, have discovered that large language models (LLMs) significantly adapt their approach to risk depending on whether they are instructed to behave like a man or a woman. The findings point to an intriguing—and potentially problematic—tendency: when told to adopt a female persona, most AI models become more risk-averse, while adopting a male identity encourages bolder, more risk-tolerant decision-making.

The study involved testing prominent AI models developed by companies such as OpenAI, Google, Meta, and DeepSeek. Researchers provided the models with hypothetical financial decision-making scenarios, then prompted them to respond as if they were either male or female. The results revealed a consistent pattern across most tested systems: gender identity prompts led to notable changes in risk preferences.
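To make that setup concrete, here is a minimal sketch of what such a persona-conditioned risk probe might look like in code. It assumes the OpenAI Python SDK; the model name, persona wordings, and the lottery-style scenario are illustrative placeholders, since the article does not reproduce the researchers' actual prompts.

```python
# Illustrative sketch only: the exact prompts and scenarios used in the
# study are not published in this article. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONAS = {
    "male": "You are a man. Answer as yourself.",
    "female": "You are a woman. Answer as yourself.",
    "neutral": "Answer as yourself.",
}

# A classic risk-preference probe (hypothetical, chosen for illustration).
SCENARIO = (
    "You must choose one option:\n"
    "A) Receive $50 for certain.\n"
    "B) A 50% chance of $120 and a 50% chance of $0.\n"
    "Reply with only 'A' or 'B'."
)

def ask(persona_key: str, model: str = "gpt-4o-mini") -> str:
    """Pose the same scenario under a given persona and return the choice."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PERSONAS[persona_key]},
            {"role": "user", "content": SCENARIO},
        ],
    )
    return response.choices[0].message.content.strip()

for key in PERSONAS:
    print(key, "->", ask(key))
```

Repeating such calls many times per persona and comparing how often each one picks the risky option is the kind of behavioral comparison the study describes.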

Among the models tested, DeepSeek Reasoner and Google’s Gemini 2.0 Flash-Lite exhibited the most significant shifts in behavior. These models showed a marked decrease in risk appetite when asked to think as women, whereas they displayed increased willingness to take risks when told to assume a male perspective.

This behavioral mirroring raises important questions about the way language models are trained. LLMs are developed using vast datasets compiled from the internet, which may contain implicit gender biases that shape the models’ responses. When prompted to “act like a woman,” the models may be drawing on stereotypical representations of femininity and cautiousness encoded in their training data.

While these findings are not entirely surprising—given that societal norms often associate risk-taking with masculinity and caution with femininity—their replication in AI systems is cause for concern. AI is increasingly being integrated into environments where neutrality and fairness are critical, such as hiring, lending, and medical diagnostics. If gender-based prompts can shift an AI’s decision-making framework, this could reinforce or even amplify existing human biases.

The implications are more than theoretical. In financial sectors, for example, automated decision-making tools may give different recommendations depending on how they are prompted, potentially affecting portfolio management or lending decisions. This could result in unequal treatment if such systems are unknowingly influenced by gender-coded prompts or user inputs.

In practical terms, this means developers and regulators need to be more vigilant. AI systems should not only be tested for performance and safety, but also for behavioral consistency across different identity prompts. Transparency in training data and prompt engineering practices will be critical in ensuring these systems do not perpetuate social biases.
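One possible shape for such a consistency test, sketched under the same assumptions as above (it reuses the hypothetical ask() helper; the trial count and the tolerance are arbitrary, not a published standard):

```python
# Sketch of a persona-consistency audit, building on the hypothetical
# ask() helper above. Trial counts and thresholds are illustrative.
from collections import Counter

def risky_rate(persona_key: str, trials: int = 20) -> float:
    """Fraction of trials in which the model picks the risky option 'B'."""
    counts = Counter(ask(persona_key) for _ in range(trials))
    return counts.get("B", 0) / trials

rates = {key: risky_rate(key) for key in PERSONAS}
baseline = rates["neutral"]

for key, rate in rates.items():
    drift = abs(rate - baseline)
    status = "FLAG" if drift > 0.15 else "ok"  # tolerance is arbitrary here
    print(f"{key:8s} risky-rate={rate:.2f} drift={drift:.2f} [{status}]")
```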

Moreover, the research reinforces the importance of prompt design in AI interactions. Since LLMs are sensitive to subtle linguistic cues, even slight variations in phrasing can significantly alter their output. This highlights the ethical responsibility of developers and users to understand how prompts may shape model behavior.
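As a rough illustration of that sensitivity, one could feed near-synonymous phrasings of the same persona instruction through the pipeline above and compare the answers. The wordings below are invented for illustration; a model robust to phrasing should treat them alike.

```python
# Near-synonymous persona phrasings; a phrasing-robust model should
# respond to all of these in the same way. Reuses the hypothetical
# client and SCENARIO defined in the earlier sketch.
VARIANTS = [
    "You are a woman.",
    "Respond as a woman would.",
    "Imagine you are female.",
    "Think like a woman.",
]

for system_prompt in VARIANTS:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": SCENARIO},
        ],
    )
    print(repr(system_prompt), "->", response.choices[0].message.content.strip())
```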

To mitigate these risks, AI training processes may need to incorporate bias correction mechanisms and more diverse data sources. Additionally, implementing identity-neutral defaults and guardrails could help prevent models from adopting stereotypical thinking patterns, especially in high-stakes applications.
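One possible shape for such a guardrail is a pre-processing step that detects gendered persona directives and rewrites them to an identity-neutral default. The sketch below is self-contained but purely illustrative; its pattern list is far from exhaustive.

```python
# Hypothetical guardrail: neutralize gendered persona directives before
# they reach the model. The regex covers only a few obvious phrasings.
import re

GENDERED_PERSONA = re.compile(
    r"\b(act|think|respond|answer)\s+(like|as)\s+(a\s+)?(man|woman|male|female)\b",
    re.IGNORECASE,
)

def neutralize(prompt: str) -> str:
    """Replace gendered persona instructions with an identity-neutral default."""
    return GENDERED_PERSONA.sub("answer as a neutral assistant", prompt)

print(neutralize("Think like a woman and choose between options A and B."))
# -> "answer as a neutral assistant and choose between options A and B."
```

In practice this kind of surface filtering is brittle, since gendered instructions can be paraphrased around any fixed pattern list, so it would complement rather than replace bias correction during training.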

The study also opens up new research avenues. Future investigations may explore how other aspects of identity—such as age, race, or nationality—affect AI decision-making. There’s also the question of whether different LLM architectures (e.g., transformer-based models vs. retrieval-augmented systems) exhibit varying degrees of sensitivity to identity cues.

Importantly, this work underscores that AI does not operate in a vacuum. It reflects the cultural, historical, and linguistic patterns embedded in the data it consumes. As such, the AI community must strive to understand and address these reflections, ensuring that models serve as tools for fairer, more equitable decision-making, rather than reinforcing outdated norms.

In sum, the research from Allameh Tabataba’i University serves as a wake-up call. As AI becomes more ingrained in decision-making processes, understanding how subtle factors like gendered prompts affect its behavior is essential. Only by recognizing and addressing these influences can we build systems that are not only intelligent but also just and impartial.