Unfounded generalizations and false conclusions: RUDN scientists have identified AI “hallucinations” in the diagnosis of mental disorders

Research and Innovation

Newsletter

21 Apr 2025

Rubric: Research and Innovative Activity

Release/number: № 4

Researchers from the Faculty of Artificial Intelligence at RUDN University conducted a large-scale study that revealed systemic errors in large language models (LLMs) when diagnosing depression based on text. This work, carried out in collaboration with colleagues from AIRI, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow Institute of Physics and Technology, and MBZUAI, not only identifies the problem but also lays the foundation for the creation of more reliable and secure tools for detecting depression and anxiety.

“Our research is an important step towards trusted AI in medicine. We don't just point out the shortcomings of AI tools, we offer approaches to overcoming them. The key task today is not blind trust in algorithms, but their integration into the work of doctors as a proven and understandable decision-making support tool. Patient safety and understanding the limitations of technology are our absolute priorities,” said Anton Poddubsky, Dean of the Faculty of Artificial Intelligence.

The main value of the study lies in two things: a detailed comparison of existing large language models (LLMs) and of the methods for applying and fine-tuning them to detect depression and anxiety in text, and an analysis of AI errors and “hallucinations” in these tasks, carried out with experts in psychology. The work of RUDN scientists has been recognized and presented at Empirical Methods in Natural Language Processing (EMNLP), a highly rated international conference. We spoke with the authors of the article and learned how the idea for the work came about, what AI “hallucinations” were identified, and what the prospects are for further research.

How did the idea for research on this topic arise, and why is it relevant and important?

In recent years, there has been growing interest in diagnosing mental states from text with the help of AI, and in applying LLMs in medicine in general. However, most studies rely on English-language data and ML models; to date, there have been no comprehensive comparisons for the Russian language. This prompted us to investigate LLMs and other machine learning models for detecting depression and anxiety from text. We compared different models on both tasks and showed which ones work best in each case. We also ran further experiments in which expert psychologists evaluated the quality of the text generated by LLMs. It turned out that, at the current stage, LLMs provide low-quality answers. In one experiment, we used an LLM not only to determine whether the author of a text showed signs of depression, but also to generate an explanation of why the model came to that conclusion. It was in this experiment that we found that the explanations produced by modern models contain a large number of errors from an expert point of view.

What is the main danger of such errors?

The danger lies in the fact that LLMs can produce unfounded or false conclusions (“hallucinations”) that appear plausible to the end user. Such errors are difficult to detect without the help of an expert, but they can lead to misinterpretation of signs of depression.

What causes of AI errors have you identified? What is it about conversations about mental health that is so “confusing” even for the most advanced language models?

Clinical psychologists analyzed the LLM responses and marked the errors they found from an expert point of view. We identified six main types of errors: tautology, unfounded generalizations, false conclusions, confabulations, distortion of medical concepts of depression, and incomplete listing of its symptoms. From a machine learning perspective, all these errors can be described as “hallucinations,” but psychology-related tasks need a more precise categorization. Texts used to identify depression are difficult because they are hard to interpret: people often describe their condition indirectly, using metaphors, and the text does not always directly reflect the signs of a mental disorder. In addition, detecting depression from text is difficult for non-specialized models, since most of them have not been trained on psychological or medical data.
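For readers who work with such annotations, the six error types listed above could be recorded in a simple labeling scheme like the one sketched below. This is only an illustration and not the authors' actual annotation tooling: the names ErrorType, Annotation, and tally_errors, as well as the sample records, are hypothetical and invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    """The six error categories named in the interview."""
    TAUTOLOGY = "tautology"
    UNFOUNDED_GENERALIZATION = "unfounded generalization"
    FALSE_CONCLUSION = "false conclusion"
    CONFABULATION = "confabulation"
    CONCEPT_DISTORTION = "distortion of medical concepts of depression"
    INCOMPLETE_SYMPTOMS = "incomplete listing of symptoms"


@dataclass(frozen=True)
class Annotation:
    """One expert label attached to one model-generated explanation (hypothetical schema)."""
    explanation_id: str
    error: ErrorType


def tally_errors(annotations):
    """Count how often each error type was marked, including types never marked."""
    counts = Counter(a.error for a in annotations)
    return {error: counts.get(error, 0) for error in ErrorType}


# Made-up sample records, for illustration only; not data from the study.
example = [
    Annotation("exp-1", ErrorType.TAUTOLOGY),
    Annotation("exp-1", ErrorType.FALSE_CONCLUSION),
    Annotation("exp-2", ErrorType.TAUTOLOGY),
]
result = tally_errors(example)
```

Keeping the categories in a closed enumeration, rather than a single "hallucination" flag, reflects the point made above that psychology-related tasks need a finer-grained categorization.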

What are the prospects for further development of this research?

The next step could be to fine-tune LLMs specifically for detecting depression and anxiety, using large datasets. The current experiments used a relatively small amount of data, which may have limited the final quality of the models.
