
AI Cannot Identify Retracted Papers


Retracted papers are among the most drastic and visible warning signs used to protect research integrity in the scientific literature. However, a new study reported by Retraction Watch on November 19, 2025, shows that the rapidly proliferating AI chatbots have a particularly hard time recognizing these critical warning signs. The researchers warn that academics, particularly those relying on ChatGPT and similar tools, risk serious errors if they treat the models’ responses as an “automatic truth filter” (https://retractionwatch.com/2025/11/19/ai-unreliable-identifying-retracted-research-papers-study/).

Konradin Metze and his team at the State University of Campinas, who conducted the study, designed a relatively simple experiment. They presented a list of publications by Joachim Boldt, known for his major scientific fraud scandal in anesthesiology, to 21 different AIs. The list included the most cited retracted Boldt articles, the most cited Boldt publications that had not been retracted, as well as articles written by other authors with the last name Boldt. For each of the 132 references, the bots were asked a single question: Was this article retracted or not?

The results were striking. Most of the chatbots correctly identified fewer than half of the retracted articles. Not only did they miss retractions, they also incorrectly marked a significant portion of the unretracted articles as retracted. This points to a serious weakness in both sensitivity and specificity: the chatbots offer false assurance about retracted work while casting unnecessary doubt on articles that still stand.
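To make the sensitivity/specificity framing concrete, here is a minimal Python sketch of that evaluation logic. The counts are hypothetical, chosen only to illustrate the kind of error profile described above; they are not the study’s actual figures.

```python
# Minimal sketch of the sensitivity/specificity framing used above.
# All counts are hypothetical, not figures from the study.

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity: share of retracted papers a bot correctly flags.
    Specificity: share of non-retracted papers it correctly leaves unflagged."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical bot: flags 30 of 66 retracted papers (misses 36)
# and wrongly labels 15 of 66 intact papers as retracted.
sens, spec = sensitivity_specificity(tp=30, fn=36, tn=51, fp=15)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.45, specificity=0.77
```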

When the research team repeated part of the experiment three months later, they encountered an even more striking pattern. In the first round, the bots generally gave definitive answers, but in the second round they began using vague and evasive phrases such as “possibly retracted” or “requires further review.” The researchers interpret this shift as the models oscillating between “offering false certainty” and “hiding behind vague statements.”

The Retraction Watch report also cites another recent study, by Mike Thelwall of the University of Sheffield. Thelwall had ChatGPT evaluate 217 retracted or seriously questioned papers a total of 6,510 times. In none of those thousands of responses did ChatGPT note that a paper had been retracted, raise concerns about it, or point to scientific problems in it. On the contrary, it even praised some retracted papers as “high-quality work.” This demonstrates that AI not only misses retraction information but can also praise and reproduce erroneous or false scientific findings (https://sheffield.ac.uk/ijc/news/new-research-suggests-chatgpt-ignores-article-retractions-and-errors-when-used-inform-literature).

The problem isn’t just recognition. Another study, published in the Journal of Advanced Research, found that chatbots use retracted articles as sources in their responses. This means AI can recirculate claims that the scientific literature has formally withdrawn. As more and more people in academia use tools like ChatGPT to quickly summarize papers, develop research ideas, or get on top of the literature, the risk of recirculating retracted information grows accordingly.

Sociologist of science Serge Horbach calls these findings a “clear warning”: large language models are not suitable tools for weeding out retracted articles. The models are trained on data that lags behind the current state of the literature, and retraction information itself is published in a fragmented way. Notice of an article’s retraction may appear only on the journal’s page, only in PubMed, or only in the Retraction Watch database. Scanning this fragmented landscape reliably and accurately is far beyond the technical capabilities of today’s chatbots.
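For readers who want a concrete alternative to asking a chatbot, the sketch below shows one way to query a structured source directly: PubMed’s public E-utilities search endpoint combined with its “Retracted Publication” publication-type filter. The author query is illustrative (echoing the Boldt example above), and this covers only PubMed, not journal pages or the Retraction Watch database.

```python
"""Minimal sketch: check PubMed's own retraction flag instead of asking a chatbot.
Uses the public NCBI E-utilities esearch endpoint; the author query is illustrative."""
import requests

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def retracted_pmids(author: str, max_results: int = 100) -> list[str]:
    """Return PMIDs of papers by `author` that PubMed flags as retracted."""
    params = {
        "db": "pubmed",
        # Combine an author search with PubMed's retraction flag.
        "term": f'{author}[Author] AND "Retracted Publication"[Publication Type]',
        "retmax": max_results,
        "retmode": "json",
    }
    resp = requests.get(ESEARCH_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    for pmid in retracted_pmids("Boldt J"):
        print(f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/")
```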

For Academic Solidarity, these findings hold particular significance for academics in exile or working in precarious circumstances. Where access to research infrastructure is limited, tools like ChatGPT offer attractive speed and convenience. But that convenience carries the risk of unknowingly reproducing studies built on retracted or inaccurate findings. The risk is even more severe for researchers working on political, legal, or human rights topics: misinformation there is not only a scientific error but can also open the door to political manipulation.

This situation doesn’t necessarily mean AI should be completely excluded from research processes; however, it does highlight a critical limitation: ChatGPT and similar models are not reliable filters for detecting retracted literature.