Facial recognition technology has been deployed publicly on the strength of benchmark tests that reflect performance in laboratory settings, but some academics argue that real-world performance doesn’t measure up.
In a post to the Tech Policy Press website, University of Oxford academics Teo Canmetin, Juliette Zaccour, and Luc Rocher point to public failures of facial recognition as evidence that these systems don’t perform as well as evaluation statistics suggest.
The academics say the US National Institute of Standards and Technology’s (NIST) Face Recognition Technology Evaluation (FRTE), a performance benchmark, has been used to justify the deployment of AI systems, including technology used by the UK’s Metropolitan Police Service, the subject of ongoing controversy.
But they insist NIST’s benchmark has problems.
First, they claim the evaluation fails to reflect real-world conditions, where images may be blurred or obscured. Second, they say, the benchmark datasets are far smaller than the databases real deployments search against, so test results understate the chance of misidentification at scale. Third, they contend, benchmark datasets fail to reflect real-world demographics.
These problems translate into real-world failures, including the wrongful arrest of a Detroit man in 2020 based on a flawed facial recognition match, and the misidentification of a London-based knife crime-prevention activist by a live facial recognition system that, according to a University of Essex study, made accurate identifications in only eight of 42 cases.
“For the latest and best-performing models, standardized evaluations now report figures as high as 99.95 percent accuracy,” the authors wrote. “Out of context, these numbers suggest that facial recognition has progressed to be extremely accurate. But there’s a problem: these near-perfect numbers fail to reflect reality. Facial recognition appears to be significantly less accurate in real-world settings.”
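To see why a 99.95 percent benchmark score and a dismal field record can both be true, it helps to run the numbers. The sketch below is purely illustrative: the false positive rate, crowd size, and suspect count are assumptions for the sake of the arithmetic, not figures from NIST, the Met, or the Essex study.

```python
# A back-of-the-envelope sketch. All numbers are illustrative assumptions,
# not figures from NIST, the Met, or the University of Essex study.

per_search_fpr = 0.0005    # "99.95% accuracy" read as 0.05% false positives per search
true_positive_rate = 0.99  # assumed chance a genuine watchlist member is flagged
crowd_scanned = 100_000    # faces scanned over a deployment (assumption)
suspects_present = 10      # genuine watchlist members actually in the crowd (assumption)

false_alerts = (crowd_scanned - suspects_present) * per_search_fpr
true_alerts = suspects_present * true_positive_rate

# Precision: of all the alerts officers act on, what fraction point at the right person?
precision = true_alerts / (true_alerts + false_alerts)

print(f"False alerts: {false_alerts:.0f}")  # ~50
print(f"True alerts:  {true_alerts:.1f}")   # ~9.9
print(f"Precision:    {precision:.1%}")     # ~16.5% -- most alerts are wrong
```

The base rate is the trap: when genuine watchlist members are a vanishing fraction of the crowd, even a 0.05 percent false positive rate means most alerts point at the wrong person, broadly in line with the Essex finding of eight accurate identifications out of 42.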
A May 2025 research paper from criminologists and computer scientists at the University of Pennsylvania, “Accuracy and Fairness of Facial Recognition Technology in Low-Quality Police Images: An Experiment With Synthetic Faces,” supports the Oxford academics’ claims about how performance suffers when image quality isn’t pristine.
“Our experiment finds that facial recognition technology (FRT) performance degrades under poor image conditions, particularly with blur, pose variation, and reduced resolution, and that this degradation is not evenly distributed across demographic groups,” the UPenn researchers found. “False positive and false negative rates increase with image degradation, disproportionately affecting individuals from marginalized race and gender groups.”
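The spirit of that experiment is easy to reproduce: take clean images, degrade them in controlled ways, and re-run the matcher. The sketch below is not the UPenn authors’ code; it merely demonstrates two of the degradations they describe (blur and reduced resolution) using the Pillow library, with placeholder file names.

```python
# A minimal sketch of controlled image degradation (blur, reduced resolution).
# Not the UPenn authors' code; "face.jpg" is a placeholder input.
from PIL import Image, ImageFilter

def degrade(path: str, blur_radius: float = 3.0, downscale: int = 4) -> Image.Image:
    """Blur an image and crush its resolution, mimicking a poor CCTV frame."""
    img = Image.open(path).convert("RGB")
    blurred = img.filter(ImageFilter.GaussianBlur(radius=blur_radius))
    # Downsample then upsample back: detail is lost but dimensions are preserved,
    # so the result can be fed to the same matcher as the original.
    small = blurred.resize((img.width // downscale, img.height // downscale))
    return small.resize(img.size)

degraded = degrade("face.jpg")
degraded.save("face_degraded.jpg")
# An experiment would compare match scores for the clean and degraded versions
# against the same gallery, repeated across demographic groups.
```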
At the same time, the UPenn scientists note that the accuracy of facial recognition is “substantially higher than that of many traditional forensic methods,” specifically fingerprinting and forensic firearm comparison.
Algorithmic accuracy, however, is only one aspect of the concerns about FRT. A 2023 US Government Accountability Office report on the application of FRT in law enforcement found US agencies have been using the technology without adequate training and without civil rights policies – a concern that admittedly seems quaint in light of current US arrest practices.
The consequences of this lack of agency rules can be seen in the Algorithmic Justice League’s recent “Comply To Fly?” report, which found the US Transportation Security Administration (TSA) has been using FRT without travelers’ informed consent. The report says travelers have not been adequately informed that they can opt out of FRT scans, and that two-thirds of travelers who attempt to opt out report hostile treatment by TSA officers.
NIST did not immediately respond to a request for comment. The US government standards body on Monday published guidelines [PDF] on detecting face morphing, the process of digitally combining multiple faces into a fictitious composite, potentially as a way to deceive FRT-based authentication systems.
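In its crudest form, morphing is just a weighted blend of two aligned face photos; production morphs warp facial landmarks before blending. The toy sketch below, again with placeholder file names, shows the core idea that morph-detection guidance targets.

```python
# A toy illustration of pixel-space face morphing: a 50/50 blend of two
# aligned face photos. Real morphs warp facial landmarks before blending;
# the file names are placeholders.
from PIL import Image

face_a = Image.open("subject_a.jpg").convert("RGB")
face_b = Image.open("subject_b.jpg").convert("RGB").resize(face_a.size)

# Image.blend(a, b, alpha) returns a*(1-alpha) + b*alpha per pixel.
morph = Image.blend(face_a, face_b, alpha=0.5)
morph.save("morph.jpg")
# A morph enrolled in an ID document may match *both* contributors,
# which is exactly the failure mode detection guidance is meant to catch.
```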
A February 2024 report for the Innocence Project by Alyxaundria Sanford echoes the concerns raised by the Oxford boffins about the potential for bias. It noted, “There are at least seven confirmed cases of misidentification due to the use of facial recognition technology, six of which involve Black people who have been wrongfully accused: Nijeer Parks, Porcha Woodruff, Michael Oliver, Randall Reid, Alonzo Sawyer, and Robert Williams.”
The Electronic Frontier Foundation added two more names to that list earlier this year, Christopher Galtin and Jason Vernau, who were wrongly arrested in St. Louis and Miami, respectively, following flawed FRT identification. The advocacy group argues “face recognition, whether it is fully accurate or not, is too dangerous for police use, and such use ought to be banned.” ®