AI accurately identifies questionable open-access journals by analysing websites and content, matching expert human assessment

Artificial intelligence (AI) could be a useful tool to find ‘questionable’ open-access journals, by analysing features such as website design and content, new research has found.

The researchers set out to evaluate the extent to which AI techniques could replicate the expertise of human reviewers in identifying questionable journals and determining key predictive factors. ‘Questionable’ journals were defined as journals violating the best practices outlined in the Directory of Open Access Journals (DOAJ) – an index of open-access journals managed by the DOAJ Foundation, based in Denmark – and showing indicators of low editorial standards. Legitimate journals were those that followed DOAJ best practice standards and were classed as ‘whitelisted’.

The AI model was designed to transform journal websites into machine-readable information, according to DOAJ criteria such as editorial board expertise and publication ethics. To train the questionable-journal classifier, the researchers compiled a list of around 12,800 whitelisted journals and around 2500 unwhitelisted ones, and then extracted three kinds of features to help distinguish them from each other: website content, website design and bibliometric indicators.
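The approach described above can be sketched in miniature: each journal website is reduced to a numeric feature vector spanning the three feature families the article names, and a scorer flags suspects. This is a hypothetical illustration, not the authors' actual pipeline; every feature name, weight and threshold below is an assumption made for the sake of the example.

```python
# A toy sketch of feature-based journal screening. The real model was
# trained on ~15,300 labelled journals; here the features and weights
# are invented purely to illustrate the idea.
from dataclasses import dataclass


@dataclass
class JournalFeatures:
    # Website-content features (assumed examples)
    has_ethics_policy: bool       # does the site state a publication-ethics policy?
    editorial_board_listed: bool  # is an editorial board with affiliations shown?
    # Website-design feature (assumed example)
    broken_links_ratio: float     # fraction of dead links on the site, 0.0-1.0
    # Bibliometric feature (assumed example)
    self_citation_ratio: float    # fraction of citations that are self-citations, 0.0-1.0


def suspicion_score(f: JournalFeatures) -> float:
    """Weighted sum of red flags; weights are made up for illustration."""
    score = 0.0
    score += 0.0 if f.has_ethics_policy else 0.35
    score += 0.0 if f.editorial_board_listed else 0.25
    score += 0.2 * f.broken_links_ratio
    score += 0.2 * f.self_citation_ratio
    return score


def is_questionable(f: JournalFeatures, threshold: float = 0.5) -> bool:
    """Flag a journal when its accumulated red-flag score crosses a threshold."""
    return suspicion_score(f) >= threshold
```

In practice the weights would be learned from the labelled whitelisted/unwhitelisted examples rather than hand-set, but the shape of the task – features in, binary flag out – is the same.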

The model was then used to predict questionable journals from a list of just over 15,000 open-access journals housed in the open database Unpaywall. Overall, it flagged 1437 suspect journals, of which about 1092 were expected to be genuinely questionable. The researchers said these journals had published hundreds of thousands of articles, accrued millions of citations, acknowledged funding from major agencies and attracted authors from developing countries.

There were around 345 false positives among those identified, which the researchers said shared a few patterns: for example, sites that were unreachable or had been formally discontinued, or that referred to a book series or conference with a title similar to that of a journal. They also said there were likely around 1780 problematic journals that remained undetected.
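Taken together, these figures imply rough precision and recall values for the screen. The arithmetic below is ours, derived from the numbers quoted in the article, not a calculation reported by the researchers.

```python
# Precision and recall implied by the article's figures (our arithmetic).
flagged = 1437           # journals the model flagged as suspect
true_positives = 1092    # flagged journals expected to be genuinely questionable
false_positives = 345    # flagged journals that turned out to be legitimate
false_negatives = 1780   # estimated problematic journals left undetected

precision = true_positives / flagged                          # share of flags that were correct
recall = true_positives / (true_positives + false_negatives)  # share of problem journals caught

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints "precision = 0.76, recall = 0.38"
```

In other words, roughly three in four flags were expected to be correct, while a substantial fraction of problematic journals still slipped through, which is consistent with the researchers' call for continuously updated models.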

Overall, they concluded that AI could accurately discern questionable journals, showing high agreement with expert human assessments, although they pointed out that existing AI models would need to be continuously updated to track evolving trends.

‘Future work should explore ways to incorporate real-time web crawling and community feedback into AI-driven screening tools to create a dynamic and adaptable system for monitoring research integrity,’ they said.
