Accuracy

In nearly half of wrongful convictions where someone was exonerated with postconviction DNA testing, the misuse of forensic evidence was a factor. Yet labs and courts continue to use new and untested forensic methods with very little regulation.

In a first-of-its-kind report in 2009, the National Academy of Sciences, the nation’s premier scientific organization, concluded that a wide range of “forensic science disciplines are supported by little rigorous systematic research to validate the discipline’s basic premises and techniques.” From the use of wholly unvalidated bite mark evidence, to exaggerated lab analyst testimony in court, to the use of untested field drug kits, we need reforms to prevent unreliable forensics from being used.

Read more about examples of these problems below, and then return to the Toolkit to learn how you can help improve the accuracy of forensic evidence.

FEATURE

Accuracy: The Bite Mark Case

In 1982, a murder trial in Newport News, Virginia, dubbed “the bite mark case,” turned into a media sensation, as the community heard dentists describe how they compared bite marks on the victim’s legs to molds of the defendant’s teeth.

Examples of the Fallibility of Forensic Analysis or Forensic Error

Example 1: Crime Scene Drug Testing

In Houston, hundreds of wrongful convictions resulted from botched drug tests by police, and almost all of those innocent people pleaded guilty. In the 1960s, police began to commonly test drugs using inexpensive and simple kits in the field. They put a small amount of the substance in a baggie, with pre-packaged chemicals designed to react and change color, depending on the substance. These $2 tests report whether evidence is a controlled substance or not. However, these commercial kits can be untested and of unknown reliability. Studies have found these kits can have shockingly high error rates. The field tests are supposed to be followed up by a more rigorous lab test. In the meantime, a person may be arrested for drug possession, and face great pressure to plead guilty, particularly if they are poor, denied bail, and remain in jail waiting for a day in court.

In Harris County, Texas, an audit by the prosecutor’s Conviction Integrity Unit uncovered 456 cases in which field drug tests were erroneous. In 298 of the cases, there were no controlled substances, and in the other cases it was the wrong drug or wrong weight. The convictions in those cases were all reversed. The Texas Forensic Science Commission, in 2016, said that these field tests are too unreliable to use in criminal cases, and that a follow-up lab test should be required. In 2017, Houston police banned the use of those field drug tests.

Example 2: Fingerprints

For decades, forensic analysts of different types testified that they were 100 percent certain of their conclusions. As federal judge Harry T. Edwards put it, “The courts had been misled for a long time because we had been told, my colleagues and I, by some experts from the FBI that fingerprint comparisons involved essentially a zero error rate, without our ever understanding that’s completely inaccurate.”

Yet, no one had carefully tested the basic assumptions that fingerprint experts have relied upon for decades. First, are each person’s fingerprints unique? You have probably long assumed that fingerprints are unique and that no two are alike. About 95 percent of people believe fingerprints are unique. People think fingerprints are like snowflakes. Fingerprint examiners similarly assumed that all fingerprint patterns are completely different from each other, and not just that they are somewhat or mostly different from each other. Experts made the same strong assumption about bite marks, fibers, toolmarks, shoeprints, and a range of other types of forensics. We do not know if that strong assumption is true for fingerprints; it has never been tested.

Second, how often can one person’s fingerprint look like another person’s crime scene latent print? We do not know how often a smeared, partial latent fingerprint from a crime scene might look very much like someone else’s print. It may depend on what level of detail one has in a print. We now know that errors can happen.

Third, how good are experts at making fingerprint comparisons? We need to know the error rates; after all, we are trusting experts to make decisions that can send people to prison or even death row. The U.S. Department of Justice standards explain that a fingerprint identification is “a statement of an examiner’s belief.” The National Academy of Sciences report emphasized fingerprint examiners rely on “a subjective assessment” that lacks adequate “statistical models.” We do not know how common or rare it is to have particular features in a fingerprint.

The President’s Council of Advisors on Science and Technology (PCAST) report from fall 2016 emphasized that experts must tell jurors about the error rates. What is a valid error-rate study? For a more objective method, like a drug test, you can test each step in the process by seeing whether it produces accurate results. However, for subjective techniques like fingerprinting, there are not clearly defined and objective steps. The person is the process: an examiner whose mind is a “black box” that reaches judgments based on experience. To test a “black box” examiner, you can give examiners evidence where the correct answer is known in advance. Ideally, the participants should not know that they are being tested. The samples, whether fingerprint, bite mark, or firearm evidence, should be of realistic difficulty.
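The scoring step of a black box study can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response records, the labels, and the function name `score_black_box` are all hypothetical and are not taken from any of the studies discussed here. The sketch assumes each test item has a ground truth known in advance and that examiners answer with an identification, an exclusion, or an inconclusive.

```python
from collections import Counter

SAME, DIFFERENT = "same_source", "different_source"

# Hypothetical records, invented for illustration only:
# (ground truth known to the study designers, examiner's answer).
HYPOTHETICAL_RESPONSES = [
    (SAME, "identification"),
    (SAME, "identification"),
    (SAME, "exclusion"),            # false negative
    (SAME, "inconclusive"),
    (DIFFERENT, "exclusion"),
    (DIFFERENT, "exclusion"),
    (DIFFERENT, "identification"),  # false positive
    (DIFFERENT, "inconclusive"),
]

def score_black_box(responses):
    """Tally error rates from (ground truth, examiner answer) pairs."""
    counts = Counter(responses)
    same_total = sum(n for (truth, _), n in counts.items() if truth == SAME)
    diff_total = sum(n for (truth, _), n in counts.items() if truth == DIFFERENT)
    if same_total == 0 or diff_total == 0:
        raise ValueError("need both same-source and different-source items")
    inconclusive = sum(n for (_, call), n in counts.items() if call == "inconclusive")
    return {
        # Wrongly linking evidence from two different sources.
        "false_positive_rate": counts[(DIFFERENT, "identification")] / diff_total,
        # Wrongly excluding evidence that shares a source.
        "false_negative_rate": counts[(SAME, "exclusion")] / same_total,
        # Share of all items on which the examiner reached no conclusion.
        "inconclusive_rate": inconclusive / (same_total + diff_total),
    }

rates = score_black_box(HYPOTHETICAL_RESPONSES)
print(rates)
```

One design choice matters a great deal here: this sketch keeps inconclusive answers in the denominators. Dropping them, or counting them as errors, would change the reported rates, so any real study should state which convention it uses.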

The PCAST report described how researchers had conducted two properly designed studies of the accuracy of latent fingerprint analysis. That only two such studies existed is deeply disturbing in itself. It was generous for the report to say that just two studies were enough to permit a technique to be used in court. While neither study is perfect, both found nontrivial error rates. One of the two studies was a larger-scale study supported by the FBI. The second was a smaller study by the Miami-Dade Police Department. The false positive rates could be as high as 1 in 306 in the FBI study and 1 in 18 in the Miami-Dade study. To be sure, the people participating in the FBI study knew that they were being tested. They knew that it was an important study for the field. They were likely very cautious in their work. That FBI study also reported that a massive 85 percent of the 169 examiners made at least one false negative error. If false negatives are as great a problem in real labs as they are in studies, it could mean that untold thousands of guilty culprits are not identified in real cases.

Some of the errors that analysts made in these studies may have been clerical errors. Yet in the Miami-Dade study, for example, if one leaves out possible clerical errors, the error rate could still be as high as 1 in 73. Perhaps clerical errors should be included, though; they can have grave real-world consequences. We do not know whether the prints used in these studies were realistic or sufficiently challenging, either. We know that other fingerprint examiners may perform differently, based on their training and skill.
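To make the “1 in N” figures quoted in this section easier to compare, the short Python sketch below restates them as percentages and as expected false positives per 1,000 comparisons. The function names and the 1,000-comparison workload are assumptions chosen for illustration. The sketch also assumes the study rates would carry over to real casework, which, as noted above, is not known.

```python
def one_in_n_to_percent(n):
    """Convert a '1 in n' error rate to a percentage."""
    return 100.0 / n

def expected_errors(n, comparisons):
    """Expected number of errors if a '1 in n' rate held over a workload."""
    return comparisons / n

# Upper-end false positive rates quoted in the text above.
QUOTED_RATES = [
    ("FBI study", 306),
    ("Miami-Dade study", 18),
    ("Miami-Dade study, possible clerical errors left out", 73),
]

HYPOTHETICAL_WORKLOAD = 1000  # assumed number of comparisons, for illustration

for label, n in QUOTED_RATES:
    pct = one_in_n_to_percent(n)
    errs = expected_errors(n, HYPOTHETICAL_WORKLOAD)
    print(f"{label}: 1 in {n} = {pct:.2f}% "
          f"(about {errs:.1f} per {HYPOTHETICAL_WORKLOAD} comparisons)")
```

Expressed this way, a 1 in 18 rate is roughly 5.6 percent, which is far from the “zero error rate” that courts were long told to expect.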

These findings still provide a wake-up call. It would shock jurors to hear of either a 1 in 18 or a 1 in 306 error rate. When a public defender in Joplin, Missouri, asked prospective jurors in a 2018 case about fingerprint evidence, they said things like, “I believe fingerprints are 100 percent accurate,” and “fingerprints are everything when it comes to a crime scene,” and “I mean, it’s an identifier . . . We’ve been taught all our lives that fingerprint is what identifies us, and that it is unique.”

Example 3: Bite Marks

The National Academy of Sciences, in its important 2009 report, concluded that there needs to be more research “to confirm the fundamental basis for the science of bite mark comparison.” They said that it has “not been scientifically established” that human dentition is unique. The scientists who wrote the PCAST report concluded that since no valid studies of error rates have been done, the techniques were simply not valid.

What we do know about reliability is disturbing. The American Board of Forensic Odontology, the professional association of forensic dentists, conducted a study to test its members. In the late 1990s, they gave dentists bite mark evidence of medium to good quality. The dentists were asked to compare four bite marks to seven sets of teeth, four of which had made the marks. This is called a “closed set” study, since there was a correct answer for each of the four marks. In a real case, one does not know if a suspect’s teeth produced any of the evidence. Of the sixty dentists who asked to take the study, only twenty-six filled it out, and those dentists were wrong in nearly half of their responses. Other studies found high error rates as well. None of these troubling findings blunted the testimony dentists delivered in court, nor did dentists make a habit of describing these studies in their reports or testimony.

Example 4: Firearms Evidence

Of all the pattern-comparison techniques used, firearms comparisons are perhaps the most common, possibly even more so than fingerprint comparisons. Firearms violence is a major problem in the United States, with over 10,000 homicides involving firearms and almost 500,000 other crimes committed using firearms. Firearms comparisons are in great demand. Examiners seek to link crime scene evidence, such as spent shell casings or bullets, with a firearm. The assumption is that manufacturing processes used to cut, drill, and grind a gun leave markings on the barrel, breech face, and firing pin. When the firearm discharges, those components contact the ammunition and leave marks on it. Experts assume different firearms should leave different toolmarks on the ammunition. They believe toolmarks allow them to definitively link spent ammunition to a firearm.

For over a hundred years, firearms experts have testified in criminal trials. Firearms experts traditionally testified in court by making “uniqueness” claims much like those made about fingerprints. Experts said that “no two firearms should produce the same microscopic features on bullets and cartridge cases such that they could be falsely identified as having been fired from the same firearm.” By the late 1990s, experts premised testimony on a “theory of identification” set out by a professional association, the Association of Firearm and Tool Mark Examiners (AFTE). AFTE instructs practitioners to use the phrase “source identification” to explain what they mean when they identify “sufficient agreement” when examining firearms. At a general level, these firearms examiners examine markings that a firearm leaves on a discharged bullet or cartridge casing. The AFTE’s so-called theory is circular. An identification occurs when the expert finds sufficient evidence, and sufficient evidence is defined as enough evidence to find an identification.

In recent years, scientists have called into question the validity and reliability of such testimony. In a 2008 report on ballistic imaging, the National Academy of Sciences concluded that definitive associations like “source identification” were not supported. In its 2009 report, the NAS followed up and stated that categorical conclusions regarding firearms or toolmarks were not supported by research, and that, instead, more cautious claims should be made. The report stated that the “scientific knowledge base for tool mark and firearms analysis is fairly limited.” The AFTE theory of identification “is inadequate and does not explain how an expert can reach a given level of confidence in a conclusion.” Judges have also raised concerns that this theory represents “unconstrained subjectivity masquerading as objectivity,” that it is “inherently vague” and “subjective,” or that it is “either tautological or wholly subjective.”

By 2016, only a single black box study had been done, showing an error rate that could be as high as 1 in 46. This single study had not been published. The authors of the PCAST report concluded that firearms comparisons, though very commonly used in criminal cases, fall short and are not valid. The rate of inconclusive errors in that study was almost 35 percent. An “inconclusive” answer was an error because every item in that study had a correct “yes” or “no” answer. A follow-up study had even more inconclusive errors: over half of all responses. Further, large numbers of examiners dropped out of the study, making the entire still-unpublished effort highly problematic.

Yet, to this day, firearms examiners use terms like “source identification” in court, although some judges have begun to step in and require more cautious wording. The Department of Justice announced guidelines in 2019: experts should use the term “source identification,” which they define as “an examiner’s conclusion that two toolmarks originated from the same source.” The guidelines sound much like the AFTE theory: examiners may call it an identification when they decide that it is one. Until serious research is done to address concerns about a subjective process, a lack of documentation of the work, and evidence of very high error rates, this technique should not be used to definitively link evidence in court.

Some reforms your jurisdiction might consider:

Police should conduct evidence collection carefully and in accordance with scientific standards, so that potentially valuable evidence is not lost or contaminated.
People should have the right to full and complete information about the forensic evidence used in their cases.
People should have the right to defense experts, funded for indigent persons, to contest forensic evidence used against them.
Unreliable and unvalidated forensic methods should not be used in a community.
Only scientific and approved methods should be used by police or crime labs.