LLMs as all-in-one tools to easily generate publication-ready citation diversity reports

Women and minority scientific authors are cited less frequently than male and majority authors, a systemic phenomenon that harms the visibility and career success of female and minority scientists and biases the prominent research questions of a field (refs. 1,2). A citation diversity report (CDR), an optional section immediately preceding the reference section of a manuscript, addresses this by quantifying the demographic distribution of the authors a manuscript cites. Authors can then analyse, and potentially revise, the proportion of cited scholars from historically excluded groups as a means to advance diversity and inclusivity in science (refs. 1,2). Journals also benefit when authors include CDRs in their papers, because the reports allow a journal to track its overall citation diversity.

To provide an accurate basis for analysing citation diversity, academic databases such as ORCID have begun to ask authors to voluntarily self-report their gender, race and ethnicity; however, not all scholars choose to disclose this information. Because such data are not widely available at this time, name-based prediction of demographics such as gender, race and ethnicity has become common practice for CDR preparation (ref. 3). For example, current CDR analysis tools such as cleanBib (https://github.com/dalejn/cleanBib) automate name-based gender and race/ethnicity prediction. However, the databases that cleanBib queries, Gender API and Ethnicolr, have imperfect accuracies of 96.1% and 83%, respectively (ref. 4; https://ethnicolr.readthedocs.io/ethnicolr.html#evaluation).
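As a rough illustration of the quantification step behind a CDR, the sketch below tallies the share of cited papers in each predicted category. It is a minimal sketch and not the cleanBib implementation: the function name `citation_diversity_summary`, the (first author, last author) pairing and the example labels are all assumptions made for illustration, and the labels would in practice come from self-reported data or a name-based predictor with the accuracy limits noted above.

```python
from collections import Counter


def citation_diversity_summary(pairs):
    """Return the percentage of cited papers in each category.

    `pairs` is a list of (first_author_label, last_author_label) tuples,
    one per cited paper. The labels are hypothetical predicted categories.
    """
    if not pairs:
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    # Convert raw counts to percentages, rounded to one decimal place.
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Hypothetical labels for four cited papers (illustrative only, not real data).
example = [
    ("woman", "woman"),
    ("man", "man"),
    ("man", "woman"),
    ("man", "man"),
]
summary = citation_diversity_summary(example)
for category, percent in sorted(summary.items()):
    print(category, percent)
```

Because each label may itself be a prediction, percentages computed this way inherit the error rates of whichever predictor produced the labels.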
