Original Article
Corresponding author: Anna K. Wolber (anna.wolber@meduniwien.ac.at). Academic editor: Johann W. Bauer
© 2025 Anna K. Wolber, Harald Kittler.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-NC 4.0), which permits copying and distributing the article for non-commercial purposes, provided that the article is not altered or modified and the original author and source are credited.
Citation:
Wolber AK, Kittler H (2025) The footprint of AI-generated text in dermatology publications. SKINdeep 1: e153393. https://doi.org/10.1553/skindeep.2025.153393
Background: Large language models (LLMs) may help to diversify authorship in scientific journals by supporting non-English speaking researchers in writing, revising, and editing scientific papers.
Objectives: To quantify the frequency of AI-generated text in the dermatology literature and to relate these results to the geographic diversity of authorship.
Methods: We extracted abstracts of 4573 articles published in 21 dermatology journals in March 2024 from 2017 to 2024. We identified AI-generated content using an AI-detector and adjusted the raw rates to account for false positives. Additionally, we computed diversity indices to quantify temporal trends in the geographic distribution of the affiliations of first authors.
Results: The raw rate of AI-generated abstracts remained relatively stable from 2018 to 2023 but increased markedly in March 2024, reaching 33.8% (95% CI: 30.0%–37.8%), significantly higher than in any preceding year. After adjusting for false positives, the proportion of AI-generated abstracts stayed below 5% from 2018 to 2023 but jumped to 17.9% (95% CI: 14.9%–21.3%) in 2024. For articles published before 2024, the rate of AI-generated abstracts correlated positively with the journal’s impact factor, with a correlation coefficient of 0.42 (95% CI: -0.01 to +0.72, p = 0.06). Regarding the geographic distribution of first-author affiliations, both the Shannon and Simpson diversity indices decreased in 2024 compared with the baseline year of 2017.
Conclusions: Our data suggest that 2024 marks a turning point in the use of AI-generated text in the dermatology literature, with its occurrence increasing significantly compared to previous years. However, the increasing adoption of AI tools alone is not sufficient to enhance the diversity of scientific output in specialized fields such as dermatology.
Why was the study undertaken? This study was performed to quantify the frequency of AI-generated text in the dermatology literature and to relate these results to the geographic diversity of authorship.
What does this study add? This study shows that the prevalence of AI-generated text in the dermatology literature increased significantly in 2024, but that this increase was not accompanied by greater geographic diversity of scientific contributions in the field.
What are the implications of this study for the understanding of skin physiology and pathology and/or disease management? The influence of scientific literature on funding priorities and drug development is profound, and a lack of diversity impacts disease understanding and clinical care. Despite a historical peak in their usage, LLMs did not improve the visibility of countries traditionally underrepresented in the dermatology literature.
Artificial intelligence, dermatology, large language models
Over the last few years, the use of large language models (LLMs) such as GPT-4, LLaMA, and BERT has significantly influenced various aspects of academic medicine and scientific writing. These models understand and generate human-like text, assisting researchers in drafting research papers and summarizing complex data [
However, the use of these tools in academic work raises ethical concerns [
Understanding the dynamics of language barriers and their impact on scientific publishing in dermatology is essential for promoting inclusivity in the field. Although the application of LLMs in scientific writing is widely acknowledged, literature describing the prevalence of AI-generated text and its influence on authorship diversity is lacking. This study aims to quantify the prevalence of AI-generated text in the dermatology literature over time and to examine its relationship with authorship diversity.
We extracted a list of scientific journals in the field of dermatology, together with their 2022 impact factors, from the Institute for Scientific Information (ISI) journal database. This master list included 93 journals. We then selected only those journals with an impact factor greater than 2 (n=48) and formulated a PubMed/Medline query to retrieve articles with abstracts published in the month of March from 2017 to 2024. In a second filtering step, we excluded 27 journals that did not publish at least five articles with abstracts each March during the specified period. The refined list yielded 4964 abstracts from PubMed, including publication dates, authors’ names, and affiliations of the first authors. In a final step, we removed articles with incomplete information and duplicates, which resulted in 4573 abstracts.
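The exact retrieval pipeline is not specified in the paper; the following is a minimal R sketch of the journal filtering and March-only PubMed queries described above. The rentrez package, the query string, the file name, and the column names are assumptions for illustration only.

```r
# Sketch of the journal filtering and abstract retrieval described above.
# The rentrez package, the query string, and the input file are assumptions.
library(rentrez)

# Hypothetical master list of 93 ISI dermatology journals with 2022 impact factors
journals <- read.csv("isi_dermatology_journals_2022.csv")  # columns: journal, impact_factor

# Keep only journals with an impact factor > 2 (n = 48 in the study)
selected <- journals[journals$impact_factor > 2, ]

# Retrieve PubMed IDs of articles with abstracts published each March, 2017-2024
fetch_march_ids <- function(journal, year) {
  term <- sprintf('"%s"[Journal] AND %d/03[PDAT] AND hasabstract[text]', journal, year)
  entrez_search(db = "pubmed", term = term, retmax = 500)$ids
}

ids <- unlist(lapply(selected$journal, function(j)
  lapply(2017:2024, function(y) fetch_march_ids(j, y))))
```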
We used GPTZero to detect potentially AI-generated content. GPTZero classifies text into three distinct categories: AI-generated, human-generated, and mixed. For each year, we calculated the raw proportion of AI-generated abstracts and its 95% confidence interval. Recognizing that GPTZero can produce false positives, we established a baseline false-positive rate of 15.8% by analyzing 533 abstracts from March 2017, a period prior to the widespread use of transformer architectures for text generation. For each year after 2017, we adjusted the observed counts of AI-generated abstracts by subtracting the expected number of false positives. The expected number of false positives was determined by sampling the false-positive rate from a Beta distribution parameterized by the 2017 counts.
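A minimal sketch of this adjustment in R, assuming the Beta distribution is parameterized directly by the 2017 counts (84 flagged of 533 abstracts) and using 10,000 Monte Carlo draws; the exact parameterization and number of draws are assumptions, as they are not stated above.

```r
# Sketch of the false-positive adjustment; Beta(84, 449) from the 2017 counts
# and the number of draws are assumptions.
set.seed(1)

adjust_rate <- function(flagged, total, fp_hits = 84, fp_total = 533, n_sim = 10000) {
  # Draw plausible false-positive rates from the 2017 baseline
  fp_rate <- rbeta(n_sim, fp_hits, fp_total - fp_hits)
  # Expected number of false positives among this year's abstracts
  expected_fp <- fp_rate * total
  # Adjusted count of AI-generated abstracts (never below zero)
  adjusted <- pmax(flagged - expected_fp, 0)
  # Point estimate and 95% interval for the adjusted proportion
  quantile(adjusted / total, c(0.5, 0.025, 0.975))
}

# Example: March 2024 (198 flagged abstracts out of 586)
adjust_rate(flagged = 198, total = 586)
```

Applied to the March 2024 counts, this sketch yields an adjusted rate of roughly 18%, in line with the values reported in the Results.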
To quantify geographic diversity, we extracted the country of the first author’s affiliation. We applied two metrics: the Shannon diversity index (H) and Simpson’s index of diversity (D). The Shannon index evaluates both the abundance and evenness of the countries represented and is given by the formula

H = -\sum_{i=1}^{R} p_i \ln p_i,

where p_i is the proportion of total abstracts whose first author is affiliated with the i-th of the R countries represented in the dataset.
Simpson’s index of diversity places greater emphasis on the presence of a dominant country, estimating the likelihood that two randomly selected abstracts have first authors from different countries. It is given by the formula

D = 1 - \sum_{i=1}^{R} p_i^2,

where p_i is defined as above.
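As an illustration, both indices can be computed directly from the country proportions. The helper below is a sketch, with a toy input vector rather than the study data.

```r
# Sketch of the two diversity indices defined above, computed from a character
# vector of first-author countries (one entry per abstract); the example input
# is illustrative, not study data.
diversity_indices <- function(countries) {
  p <- as.numeric(table(countries)) / length(countries)  # proportions p_i
  shannon <- -sum(p * log(p))                             # H = -sum p_i * ln(p_i)
  simpson <- 1 - sum(p^2)                                 # D = 1 - sum p_i^2
  c(shannon = shannon, simpson = simpson)
}

# Example with a toy sample of first-author countries
diversity_indices(c("US", "US", "China", "Austria", "Brazil"))
```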
The Pearson correlation coefficient was used to assess correlations and the chi-square test to compare proportions. All statistical analyses were performed using R version 4.2.1, and graphs were created with ggplot2 version 3.5.0.
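A minimal sketch of these tests in R, using hypothetical objects (journal_rates, abstracts) in place of the study data:

```r
# Sketch of the statistical tests named above; `journal_rates` and `abstracts`
# are hypothetical data frames standing in for the study data.

# Pearson correlation between a journal's impact factor and its rate of
# AI-generated abstracts (e.g., restricted to articles published before 2024)
cor.test(journal_rates$impact_factor, journal_rates$ai_rate, method = "pearson")

# Chi-square test comparing the proportion of AI-flagged abstracts across
# groups (e.g., first-author country: US vs. China vs. other)
chisq.test(table(abstracts$country_group, abstracts$ai_flag))
```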
We found that the raw proportions of AI-generated abstracts remained relatively stable from 2018 to 2023 but exhibited a significant increase in March 2024, as shown in Table
Number of abstracts per year, with counts and frequencies of AI-generated abstracts.
Year | Abstracts (n) | AI-generated abstracts (n) | Raw frequency, % (95% CI) | Adjusted frequency, % (95% CI) |
---|---|---|---|---|
2017 | 533 | 84 | 15.76 (12.83–19.20) | Not applicable |
2018 | 519 | 95 | 18.30 (15.13–21.96) | 2.49 (1.38–4.33) |
2019 | 553 | 98 | 17.72 (14.68–21.22) | 1.94 (1.02–3.58) |
2020 | 533 | 103 | 19.32 (16.11–22.99) | 3.44 (2.12–5.46) |
2021 | 610 | 102 | 16.72 (13.89–19.98) | 1.15 (0.51–2.47) |
2022 | 637 | 119 | 18.68 (15.77–21.98) | 2.82 (1.73–4.51) |
2023 | 602 | 92 | 15.28 (12.55–18.46) | 0.36 (0.07–1.38) |
2024 | 586 | 198 | 33.79 (29.99–37.80) | 17.87 (14.90–21.27) |
For articles published before 2024, there was a positive correlation between the raw rate of AI-generated abstracts and the journal’s impact factor, with a correlation coefficient of 0.42 (95% CI: -0.01 to +0.72, p = 0.06). For articles published in 2024, however, there was no significant correlation between the frequency of potentially AI-generated abstracts and the journal’s impact factor, with a correlation coefficient of -0.03 (95% CI: -0.45 to +0.40, p = 0.90).
Next, we examined the geographic diversity of the publications by assessing the affiliations of the first authors. For this analysis, we excluded articles for which the affiliation of the first author could not be determined, resulting in a total of 4532 abstracts for evaluation. The affiliations of the first authors spanned 85 different countries, with the landscape dominated by the US (n=871, 19.2%) and China (n=533, 11.7%). The remaining 83 countries contributed 3128 (69.0%) of the articles. The adjusted rate of AI-generated abstracts was 18.4% for first authors from the US, 22.2% for first authors from China, and 19.3% for those from the other 83 countries (p = 0.11).
Next, we examined temporal trends in the distribution of countries and in diversity. In 2017, publications from the US accounted for 21.0% of all articles, while those from China represented just 6.3%. The remaining countries (n=41) contributed 72.7% of publications. By 2024, however, the share of publications from the US had decreased to 17.3%, whereas contributions from China had risen to 21.3%. Publications from the other countries (n=46) decreased to 61.4% (Fig.
Our data suggest that 2024 marks a turning point in the use of AI-generated text within the dermatology literature. This year, we observed a sharp increase in the proportion of AI-generated abstracts in our field; both the raw and adjusted counts rose significantly compared to previous years. After accounting for false positives, we estimated that the probability of encountering an AI-generated abstract in the dermatology literature is around 18%, a marked increase from the rates of well below 5% observed before 2024. Several developments in 2023 likely contributed to this increase in the use of LLMs in academic writing. Notably, 2023 saw a high density of new LLM releases and updates: in February 2023, Meta released LLaMA (Large Language Model Meta AI); in March, OpenAI launched GPT-4; and in May, Google released PaLM 2 (Pathways Language Model 2). Not only have the range and quality of LLMs improved, but the threshold for using these models among researchers may also have decreased. This shift can be attributed to the increasing familiarity with LLMs and the decision by most academic journals to provide clear guidelines for their use rather than imposing outright bans.
Despite the increased use of LLMs, only a few studies have quantified their presence in the medical scientific literature. Bisi et al. compared the rate of AI-generated text before and after the release of ChatGPT in November 2022 in a single orthopaedic journal. They found an increase in AI-generated text from 10.3% to 15.6%, but also noted that AI-generated text was primarily detected in abstracts. This finding led us to focus our analysis on abstracts only [
The potential advantages of LLMs in medical writing are evident. They may assist researchers by drafting sections of their papers and improving productivity, and they may enhance language quality, grammar, and style, especially for non-native English speakers. LLMs could therefore play a crucial role in diversifying authorship in scientific journals and may help to address global inequalities in scientific publishing. Because the scientific literature typically reflects the predominant research focus in the field, a lack of regional diversity can skew our understanding of skin diseases and affect clinical care. This issue is particularly pronounced in dermatology, where certain skin diseases that are more prevalent in underrepresented countries do not receive the attention they deserve. Consequently, research on these diseases often attracts lower levels of funding and interest, which can delay or even prevent the development of new treatments and drugs and thereby exacerbate health disparities. Our analysis suggests that the use of LLMs alone is not sufficient to enhance the diversity of scientific output in dermatology; it may even widen the gap between well-funded, technologically advanced countries and those with fewer resources. Rather than an increase in diversity, we found a shift in the landscape of publication dominance. Originally dominated by the US, the field now shows a bipartite dominance that includes China, which may have benefited from LLMs supporting non-English speaking authors. The share of publications from other countries, however, has remained stable, and overall diversity did not increase but rather declined. Interestingly, we observed an increase in the diversity of scientific publications during the COVID-19 pandemic, reflecting the global perspective and widespread impact of this disease. This trend highlights how a global health crisis can bring attention to diverse perspectives and regions that are typically underrepresented. However, our data suggest that the use of LLMs did not play a significant role during the COVID-19 pandemic.
It is important to note that LLMs may also introduce certain negative effects. Currently, AI systems primarily work with freely accessible databases, which excludes subscription journals. This limitation can lead to an overemphasis on publicly accessible abstracts and data, potentially skewing the representation of available research. Moreover, LLMs have limited ability to verify the accuracy of sources, which can result in the inclusion of incorrect or fabricated data. Researchers who use these tools must therefore ensure that LLMs do not misinterpret or misrepresent scientific data. Despite their potential to diversify scientific publishing and attract a broader range of researchers, LLMs might also negatively impact scientific writing: the concern is that it could become sterile and uniform, lacking the idiosyncratic features that make reading enjoyable.
Our study has several limitations. First, we used an AI detector to identify AI-generated text, which may produce false positives. To mitigate this effect, we used the year 2017, a period prior to the widespread availability of LLMs, to set a baseline for false positives and adjusted the rate of detected AI-generated text accordingly. We did not use human reviewers because researchers themselves have been shown to be often unable to distinguish between LLM-generated and human-written texts. Second, we analyzed abstracts from the dermatology literature only, rather than scanning entire papers; the proportion of AI-generated text may be much lower in full articles than in abstracts. Third, our analysis focused on abstracts from March of each year, which excluded some quarterly dermatology journals. Finally, it is important to emphasize that the detection of AI-generated text does not disqualify the scientific content of an article or the originality of its authors. LLMs may have been used solely for text polishing and improvement of grammar and style, which, in our opinion, is entirely acceptable.
HK received nonfinancial support from Derma Medical Systems, Fotofinder and Heine, and speaker fees from Fotofinder.
The authors declared that no clinical trials were used in the present study.
The authors declared that no experiments on humans or human tissues were performed for the present study.
The authors declared that no informed consent was obtained from the humans, donors or donors’ representatives participating in the study.
The authors declared that no experiments on animals were performed for the present study.
The authors declared that no commercially available immortalised human and animal cell lines were used in the present study.
No funding was reported.
Anna K. Wolber: Writing (draft preparation), Discussion, Revision; Harald Kittler: Data curation, Conceptualization, Methodology, Supervision.
Anna K. Wolber https://orcid.org/0009-0004-7011-7623
Harald Kittler https://orcid.org/0000-0002-0051-8016
The data that support the findings of this study are available from the corresponding author.