Original Article
The footprint of AI-generated text in dermatology publications
Anna K. Wolber, Harald Kittler
‡ Medical University of Vienna, Vienna, Austria
Open Access

Abstract

Background: Large language models (LLMs) may help to diversify authorship in scientific journals by supporting non-English speaking researchers in writing, revising, and editing scientific papers.

Objectives: To quantify the frequency of AI-generated text in the dermatology literature and to relate these results to the geographic diversity of authorship.

Methods: We extracted abstracts of 4573 articles published in 21 dermatology journals in March of each year from 2017 to 2024. We identified AI-generated content using an AI-detector and adjusted the raw rates to account for false positives. Additionally, we computed diversity indices to quantify temporal trends in the geographic distribution of the affiliations of first authors.

Results: We found that the raw rate of AI-generated abstracts remained relatively stable from 2018 to 2023 but exhibited a significant increase in March 2024. In March 2024, the raw rate of AI-generated abstracts was 33.8% (95% CI: 30.0% to 37.8%) and significantly higher than in any preceding year. After adjusting for false positives, the proportion of AI-generated abstracts remained relatively stable below 5% from 2018 to 2023 but jumped to 17.9% (95% CI: 14.9%–21.3%) in 2024. There was a positive correlation between the rate of AI-generated abstracts and the journal’s impact factor for articles published before 2024, with a correlation coefficient of 0.42 (95% CI: -0.01 to +0.72, P-value = 0.06). Regarding the geographic distribution of the affiliations of first authors, both the Shannon and Simpson diversity indices showed a decrease in 2024 compared to the baseline year of 2017.

Conclusions: Our data suggest that 2024 marks a turning point in the use of AI-generated text in the dermatology literature, with the occurrence of AI-generated text increasing significantly compared to previous years. However, the increasing adoption of AI tools alone is not sufficient to enhance the diversity of scientific output in specialized fields such as dermatology.

Why was the study undertaken? This study was performed to quantify the frequency of AI-generated text in the dermatology literature and to relate these results to the geographic diversity of authorship.

What does this study add? This study shows that the prevalence of AI-generated texts in dermatological literature has increased significantly in 2024 but that this increase has not been accompanied by a greater geographic diversity of scientific contributions in the field.

What are the implications of this study for the understanding of skin physiology and pathology and/or disease management? The influence of scientific literature on funding priorities and drug development is profound, and a lack of diversity impacts disease understanding and clinical care. Despite a historical peak in their usage, LLMs did not improve the visibility of countries traditionally underrepresented in the dermatology literature.

Key words:

Artificial intelligence, dermatology, large language models

Introduction

Over the last few years, the use of large language models (LLMs) such as GPT-4, LLaMA, and BERT has significantly influenced various aspects of academic medicine and scientific writing. These models understand and generate human-like text, assisting researchers in drafting research papers and summarizing complex data [1]. In the current scientific landscape, English dominates as the primary language of communication. Consequently, most scientific papers are written in English, which poses challenges for non-native speakers who may struggle with clarity and complexity in their writing. This situation contributes to global inequalities, particularly in countries with limited English education, leaving these researchers with fewer opportunities to get their papers accepted in top-tier journals [2]. LLMs can help diversify authorship in scientific journals by supporting researchers for whom English is not their first language, aiding in writing, revising, proofreading, and editing to improve readability [2–4].

However, the use of these tools in academic work raises ethical concerns [1], and risks such as errors, AI-inherent bias, and incorrect quotations are significant [5]. Additional limitations include a lack of transparency and proper disclosure [6]. Current policies of major dermatology journals such as the Journal of the American Academy of Dermatology (JAAD), JAMA Dermatology, the British Journal of Dermatology (BJD), the Journal of the European Academy of Dermatology and Venereology (JEADV) [7], and the Journal of Investigative Dermatology (JID) [8] concur that AI tools cannot be named as authors [9–11]. These tools do not meet the authorship criteria of the International Committee of Medical Journal Editors (ICMJE), as they cannot disclose conflicts of interest or handle copyright and licensing agreements. Human involvement remains crucial due to the unique responsibilities involved in authorship [12]. Nevertheless, the use of AI and AI-assisted technologies is permitted in the writing process, provided their use is disclosed.

Understanding the dynamics of language barriers and their impact on scientific publishing in dermatology is essential for promoting inclusivity in the field. Although the application of LLMs in scientific writing is widely acknowledged, literature describing the prevalence of AI-generated text and its influence on the diversity of authorship remains absent. This study aims to quantify the prevalence of AI-generated text in the dermatology literature over time and examine its relationship with authorship diversity.

Methods

Selection of journals and extraction of articles

We extracted a list of scientific journals in the field of dermatology, together with their 2022 impact factors, from the Institute for Scientific Information (ISI) journal database. This master list included 93 journals. We then selected only those journals with an impact factor greater than 2 (n=48) and formulated a PubMed/Medline query to retrieve articles with abstracts published in the month of March from 2017 to 2024. In a second filtering step, we excluded 27 journals that did not publish at least five articles with abstracts each March during the specified period. The refined list yielded 4964 abstracts from PubMed, including publication dates, authors’ names, and affiliations of the first authors. In a final step, we removed articles with incomplete information and duplicates, which resulted in 4573 abstracts.
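The exact query is not reported here; as a purely illustrative sketch, a retrieval of March abstracts for a single journal could look as follows in R, assuming the rentrez package (the journal name, date-field syntax, and retmax value are placeholders, not taken from the study):

```r
# Illustrative only: retrieve March abstracts for one journal via PubMed/Entrez.
# The journal name, query syntax, and retmax value are assumptions.
library(rentrez)

get_march_abstracts <- function(journal, year, retmax = 500) {
  query <- sprintf('"%s"[Journal] AND %d/03[dp] AND hasabstract', journal, year)
  res <- entrez_search(db = "pubmed", term = query, retmax = retmax)
  if (length(res$ids) == 0) return(character(0))
  # Fetch the abstract text of the matching records.
  entrez_fetch(db = "pubmed", id = res$ids, rettype = "abstract", retmode = "text")
}

# Example: abstracts from the British Journal of Dermatology, March 2024.
abstracts_2024 <- get_march_abstracts("British Journal of Dermatology", 2024)
```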

Identification of potentially AI-generated abstracts

We utilized GPTzero to detect potentially AI-generated content. GPTzero classifies the text into three distinct categories: AI-generated, human-generated, and mixed. For each year, we calculated the raw proportion of AI-generated abstracts and its 95% confidence interval. Recognizing that GPTzero can produce false positives, we established a baseline false positive rate of 15.8% by analyzing 533 abstracts from March 2017—a period prior to the widespread use of transformer architectures for generating AI content. For each year after 2017, we adjusted the observed counts of AI-generated abstracts by subtracting the expected number of false positives. The expected number of false positives was determined by sampling the false positive rate from a Beta distribution parameterized by the 2017 counts.
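A minimal sketch of this adjustment in R is shown below; the number of simulation draws and the exact Beta parameterization are illustrative assumptions, not settings reported in the paper:

```r
# Sketch of the false-positive adjustment. The Beta parameterization and the
# number of simulation draws are illustrative assumptions.
set.seed(1)

flagged_2017 <- 84   # abstracts flagged as AI-generated in March 2017 (pre-LLM baseline)
total_2017   <- 533

adjust_rate <- function(flagged, total, n_sims = 10000) {
  # Sample plausible false-positive rates from a Beta distribution
  # parameterized by the 2017 counts.
  fpr <- rbeta(n_sims, flagged_2017, total_2017 - flagged_2017)
  # Subtract the expected number of false positives from the observed count.
  adjusted <- pmax(flagged - fpr * total, 0) / total
  c(estimate = mean(adjusted),
    lower    = unname(quantile(adjusted, 0.025)),
    upper    = unname(quantile(adjusted, 0.975)))
}

# Example: March 2024 (198 flagged abstracts out of 586).
adjust_rate(flagged = 198, total = 586)  # roughly 18%, in line with Table 1
```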

Geographic diversity of affiliations of first authors

To quantify geographic diversity, we extracted the country of the affiliation of the first author. We applied two metrics: the Shannon diversity index (H) and Simpson’s index of diversity (D). The Shannon index evaluates both the abundance and evenness of countries represented and is given by the formula

$$H = -\sum_{i=1}^{R} p_i \log(p_i)$$

where $p_i$ is the proportion of total abstracts that have their first author from the $i$th of the $R$ countries represented in the dataset.

Simpson’s index of diversity places greater emphasis on the presence of a dominant country, estimating the likelihood that two randomly selected abstracts will have first authors from different countries, and is given by the formula

$$D = 1 - \sum_{i=1}^{R} p_i^2$$

where $p_i$ is the proportion of total abstracts that have their first author from the $i$th of the $R$ countries represented in the dataset.
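As a minimal sketch in R, assuming the first-author countries have already been extracted into a character vector (the example data below are hypothetical):

```r
# Shannon diversity index (H) and Simpson's index of diversity (D) computed
# from a vector of first-author countries. The example vector is made up.
shannon_index <- function(countries) {
  p <- table(countries) / length(countries)  # proportion p_i per country
  -sum(p * log(p))
}

simpson_diversity <- function(countries) {
  p <- table(countries) / length(countries)
  1 - sum(p^2)
}

# Hypothetical example:
countries <- c("US", "US", "China", "Austria", "Brazil", "China", "US")
shannon_index(countries)      # higher = more countries, more evenly represented
simpson_diversity(countries)  # probability that two random abstracts differ in country
```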

Statistical analysis

The Pearson correlation coefficient was used to assess correlations and the chi-square test to compare proportions. All statistical analyses were performed using R version 4.2.1, and graphs were created with ggplot2 version 3.5.0.
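A minimal sketch of these two tests in base R is given below; the data frame, counts, and column names are hypothetical placeholders, not the study data:

```r
# Hypothetical per-journal data: raw AI-generated rate and impact factor.
journals <- data.frame(
  raw_ai_rate   = c(0.12, 0.18, 0.21, 0.15, 0.25, 0.30),
  impact_factor = c(2.4, 3.1, 5.0, 2.9, 7.2, 11.5)
)
# Pearson correlation between per-journal AI rates and impact factors.
cor.test(journals$raw_ai_rate, journals$impact_factor, method = "pearson")

# Chi-square test comparing flagged vs. non-flagged abstracts across regions
# (all counts below are invented for illustration).
region_table <- matrix(c(160, 711,    # US
                         118, 415,    # China
                         604, 2524),  # other countries
                       ncol = 2, byrow = TRUE,
                       dimnames = list(c("US", "China", "Other"),
                                       c("flagged", "not_flagged")))
chisq.test(region_table)
```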

Results

Frequency of AI-generated abstracts

We found that the raw proportions of AI-generated abstracts remained relatively stable from 2018 to 2023 but exhibited a significant increase in March 2024, as shown in Table 1 and in Fig. 1. In 2024, the raw rate of AI-generated abstracts was 33.8% (95% CI: 30.0% to 37.8%, Fig. 1A) and 17.9% (95% CI: 14.9%–21.3%) after adjustment for false positives (Fig. 1B).

Table 1.

Number of abstracts per year and counts and frequencies of AI-generated abstracts.

Year | Abstracts (n) | AI-generated abstracts (n) | Raw frequency, % (95% CI) | Adjusted frequency, % (95% CI)
2017 | 533 | 84 | 15.76 (12.83–19.20) | Not applicable
2018 | 519 | 95 | 18.30 (15.13–21.96) | 2.49 (1.38–4.33)
2019 | 553 | 98 | 17.72 (14.68–21.22) | 1.94 (1.02–3.58)
2020 | 533 | 103 | 19.32 (16.11–22.99) | 3.44 (2.12–5.46)
2021 | 610 | 102 | 16.72 (13.89–19.98) | 1.15 (0.51–2.47)
2022 | 637 | 119 | 18.68 (15.77–21.98) | 2.82 (1.73–4.51)
2023 | 602 | 92 | 15.28 (12.55–18.46) | 0.36 (0.07–1.38)
2024 | 586 | 198 | 33.79 (29.99–37.80) | 17.87 (14.90–21.27)
Figure 1. 

Raw and adjusted rates of AI-generated abstracts per year. A) Raw counts of AI-generated abstracts per year; the year 2017 was used to adjust for false positives. B) Adjusted rates of AI-generated abstracts in percent (%) per year. Whiskers indicate 95% confidence intervals.

Correlation with impact factor

For articles published before 2024, there was a positive correlation between the raw rate of AI-generated abstracts and the journal’s impact factor, with a correlation coefficient of 0.42 (95% CI: -0.01 to +0.72, P-value = 0.06). There was, however, no significant correlation between the frequency of potentially AI-generated abstracts and the journal’s impact factor for articles published in 2024, with a correlation coefficient of -0.03 (95% CI: -0.45 to +0.40, P-value = 0.90).

Geographic distribution and diversity

Next, we examined the geographic diversity of the publications by assessing the affiliations of the first authors. For this analysis, we excluded articles where the affiliation of the first authors could not be determined, resulting in a total of 4532 abstracts for evaluation. The affiliations of the first authors spanned 85 different countries, with the landscape dominated by the US (n=871, 19.2%) and China (n=533, 11.7%). The remaining 83 countries contributed 3128 (69.0%) of the articles. The adjusted rate of AI-generated abstracts was 18.4% for first authors from the US, 22.2% for first authors from China, and 19.3% for those from the other 83 countries (p = 0.11).

Next, we examined the trend over time concerning the distribution of countries and diversity. In 2017, publications from the US accounted for 21.0% of all articles, while those from China represented just 6.3%. The remaining countries (n=41) contributed 72.7% of publications. By 2024, however, the share of publications from the US decreased to 17.3%, whereas contributions from China rose to 21.3%. Publications from the other countries (n=46) decreased to 61.4% (Fig. 2A). To quantify this diversity more accurately, we applied two metrics: the Shannon diversity index and Simpson’s index of diversity. As shown in Fig. 2B, the sharp increase in AI-generated abstracts within the dermatology literature in 2024 had no significant impact on geographic diversity. Both diversity indices decreased compared to the baseline year of 2017.

Figure 2. 

Diversity trends over time. A) Proportions of countries of first authors from 2017 to 2024 (US: United States of America). B) Evolution of diversity indices over time relative to baseline.

Discussion

Our data suggest that 2024 marks a turning point in the use of AI-generated text within the dermatology literature. In that year, we observed a sharp increase in the proportion of AI-generated abstracts in our field. Both the raw and adjusted counts showed significant rises compared to previous years. After accounting for false positives, we estimated that the probability of encountering AI-generated abstracts in the dermatology literature is around 18%, a significant increase from the rate of well below 5% observed before 2024. Several developments in 2023 likely contributed to this increase in the use of LLMs in academic writing. Notably, 2023 saw a high density of new LLM releases and updates: in February, Meta released LLaMA (Large Language Model Meta AI); in March, OpenAI launched GPT-4; and in May, Google released PaLM 2 (Pathways Language Model 2). Not only have the range and quality of LLMs improved, but the threshold for using these models among researchers may have decreased. This shift can be attributed to increasing familiarity with LLMs and the decision by most academic journals to provide clear guidelines for their use rather than imposing outright bans.

Despite the increased use of LLMs, only a limited number of studies have quantified their use in the medical scientific literature. Bisi et al. compared the rate of AI-generated text before and after the release of ChatGPT in November 2022 in a single orthopaedic journal. They found an increase in AI-generated text from 10.3% to 15.6%. However, they also noted that AI-generated text was primarily detected in abstracts. This finding led us to focus our analysis on abstracts only [13].

The potential advantages of LLMs in medical writing are evident. They may assist researchers by drafting sections of their papers and improving productivity, and they may enhance language quality, grammar, and style, especially for non-native English speakers. Therefore, LLMs can play a crucial role in diversifying authorship in scientific journals and may help to address global inequalities in scientific publishing. As the scientific literature typically reflects the predominant research focus in the field, a lack of diversity among regions can skew our understanding of skin diseases and affect clinical care. This issue is particularly pronounced in dermatology, where certain skin diseases that are more prevalent in underrepresented countries do not receive the attention they deserve. Consequently, research on these underrepresented skin diseases often faces lower levels of funding and interest. This lack of attention can delay or even prevent the development of new treatments and drugs, exacerbating health disparities. Our analysis suggests that the use of LLMs alone is not sufficient to enhance the diversity of scientific output in dermatology. Instead, it may even widen the gap between well-funded, technologically advanced countries and those with fewer resources. Rather than an increase in diversity, we found a shift in the landscape of publication dominance. Whereas the field was originally dominated by the US, there is now a bipartite dominance that includes China, which may have benefited from LLMs supporting non-English speaking authors. However, the share of publications from the remaining countries did not increase, and the overall diversity declined rather than increased. Interestingly, we observed an increase in diversity in scientific publications during the COVID-19 pandemic, reflecting the global perspective and widespread impact of this disease. This trend highlights how a global health crisis can bring attention to diverse perspectives and regions that are typically underrepresented. However, our data suggest that the use of LLMs did not play a significant role during the COVID-19 pandemic.

It is important to note that LLMs may also introduce certain negative effects. Currently, AI systems primarily work with freely accessible databases, which excludes subscription journals. This limitation can lead to an overemphasis on publicly accessible abstracts and data, potentially skewing the representation of available research. Moreover, LLMs have limited ability to verify the accuracy of sources, which can result in the inclusion of incorrect or fabricated data. Therefore, it is crucial to ensure that LLMs do not misinterpret or misrepresent scientific data when utilizing these tools. Despite their potential to diversify scientific publishing and attract a broader range of researchers, LLMs might also negatively impact scientific writing. The concern is that scientific writing could become sterile and uniform, lacking the idiosyncratic features that make reading enjoyable.

Our study has several limitations. First, we used an AI-detector to identify AI-generated text, which may produce false positives. To mitigate this effect, we used the year 2017, a period prior to the widespread availability of LLMs, to set a baseline for false positives and adjusted the rate of detected AI-generated text accordingly. We did not use human reviewers because it has been shown that researchers themselves are often unable to distinguish between LLM-generated and human-written texts. Second, we analyzed abstracts from dermatology literature only, rather than scanning entire papers. The proportion of AI-generated text may be much lower in full articles compared to abstracts. Third, our analysis focused on abstracts from March of each year, which excluded some quarterly dermatologic journals. Finally, it is important to emphasize that the detection of AI-generated text does not disqualify the scientific content of an article or the originality of its authors. LLMs may have been used solely for text polishing and improvement of grammar and style, which, in our opinion, is entirely acceptable.

Additional information

Conflict of interest

HK received nonfinancial support from Derma Medical Systems, Fotofinder and Heine, and speaker fees from Fotofinder.

Ethical statements

The authors declared that no clinical trials were used in the present study.

The authors declared that no experiments on humans or human tissues were performed for the present study.

The authors declared that no informed consent was obtained from the humans, donors or donors’ representatives participating in the study.

The authors declared that no experiments on animals were performed for the present study.

The authors declared that no commercially available immortalised human and animal cell lines were used in the present study.

Funding

No funding was reported.

Author contributions

Anna K. Wolber: Writing – draft preparation, Discussion, Revision; Harald Kittler: Data curation, Conceptualization, Methodology, Supervision.

Author ORCIDs

Anna K. Wolber https://orcid.org/0009-0004-7011-7623

Harald Kittler https://orcid.org/0000-0002-0051-8016

Data availability

The data that support the findings of this study are available from the corresponding author.

References

  • 2. Giglio AD, da Costa MUP. The use of artificial intelligence to improve the scientific writing of non-native English speakers. Rev Assoc Med Bras. 2023;69(9):e20230560. https://doi.org/10.1590/1806-9282.20230560
  • 3. Almarie B, Teixeira PEP, Pacheco-Barrios K, Rossetti CA, Fregni F. Editorial – The Use of Large Language Models in Science: Opportunities and Challenges. Princ Pract Clin Res. 2023;9(1):1–4. https://doi.org/10.21801/ppcrj.2023.91.1
  • 13. Bisi T, Risser A, Clavert P, Migaud H, Dartus J. What is the rate of text generated by artificial intelligence over a year of publication in Orthopedics & Traumatology: Surgery & Research? Analysis of 425 articles before versus after the launch of ChatGPT in November 2022. Orthop Traumatol Surg Res. 2023;109(8):103694. https://doi.org/10.1016/j.otsr.2023.103694