Legal Affairs

Falling on Deaf Ears

Scientists say that earwitnesses are unreliable. Why aren’t the courts listening?

By Lawrence M. Solan and Peter M. Tiersma
BRUNO RICHARD HAUPTMANN WAS EXECUTED in Trenton, N.J., in April 1936, for kidnapping and murdering the young son of the famous aviator Charles Lindbergh. The most dramatic moment in Hauptmann’s closely watched trial came when Lindbergh identified Hauptmann’s voice as that of his son’s kidnapper. “The minute Lindbergh pointed his finger at Hauptmann, the trial was over,” said Hauptmann’s lawyer after the conviction. “Jesus Christ himself said he was convinced this was the man who killed his son. Who was anybody to doubt him or deny him justice?”

Lindbergh had heard the voice of his son’s kidnapper three years earlier. Still hoping to get the child back alive, Lindbergh had accompanied Dr. John Condon to St. Raymond’s Cemetery in the Bronx to deliver ransom money. Condon handed off $50,000 in marked gold certificates, while Lindbergh waited nearly 100 yards away in a car. Out of the darkness came the words, “Hey, doctor! Over here, over here.”

Twenty-nine months after the encounter in the cemetery, in September 1934, Lindbergh told a Bronx grand jury that “it would be very difficult to sit here and say that I could pick a man by that voice.” Undeterred, the district attorney asked Lindbergh later that day: “Would you like to see the man who kidnapped your son?” The next morning, while Lindbergh sat in the back of the D.A.’s office among a group of detectives, Hauptmann was brought in and asked to repeat the words, “Hey, doctor. Here, doctor, over here.” Lindbergh told the prosecutor that he recognized the voice as that of the kidnapper, and he testified under oath at the trial that Hauptmann was the man he had heard in the cemetery.

The question of how well lay witnesses like Lindbergh can recognize voices arises regularly in legal cases. When a woman is sexually assaulted by a man wearing a ski mask, or when a government official receives a bomb threat, the case may hinge on how well the victim can identify the perpetrator’s voice. But at the time of Hauptmann’s trial, no experts were available to assess the accuracy of Lindbergh’s account. In fact, social science research on how well people can identify speakers by their voices was initiated by Hauptmann’s trial, though too late to help him. In the intervening decades, researchers have made great strides in understanding the human capacity to identify voices, but the legal system has yet to take the research into account. As a result, intuition based on personal experience, rather than science, tends to govern the admissibility and perceived reliability of voice identifications at trial. And scientists have shown that our intuitions are often wrong.

ONE YEAR AFTER HAUPTMANN’S EXECUTION, Frances McGehee, a psychology professor at the University of Illinois, had students listen to a person read a 56-word passage from behind a screen. The students were then tested at various intervals to see whether they could pick the reader out from a group of five voices. They did so with 83 percent accuracy the next day. Three weeks later, however, their success rate had declined dramatically to 51 percent. Five months later it had fallen to a dismal 13 percent, below the 20 percent that random guessing among five voices would produce.

More recent work confirms McGehee’s finding that accurate identification persists for a period of time and then deteriorates sharply, far more so than most people would expect. Not surprisingly, the amount of time we are exposed to a voice matters, but the number of times a listener is exposed to a voice may be more important than the length of the exposure. Hearing a voice once for 60 seconds is not nearly as helpful as hearing it three times for 20 seconds each.

We all know that less familiar voices are harder to recognize. But the degree of familiarity matters more than we might assume. The Canadian psychologist Daniel Yarmey and his colleagues have found that people can identify the voices of those close to them, such as family members, with 89 percent accuracy in a voice lineup, but that accuracy drops to 66 percent when the voice is that of an acquaintance, such as a neighbor or coworker with whom the subject has had only occasional contact.

Disguise is also a problem. Whispering is a simple and effective way to disguise a voice; distorting it is another. Even a device as low-tech as a pencil can mask a voice quite effectively. Brazilian kidnappers have reportedly placed a pencil between their teeth when making ransom demands. The trick creates complex acoustic changes by affecting the movement of the speaker’s tongue and jaw, making the voice that much more difficult to identify.

Some voices, especially those of family members, may be very similar to each other and easy to confuse. We have all had the experience of calling someone on the phone and misidentifying the person who answers, even when we know well both the person to whom we are speaking and the one to whom we think we are speaking. We mistake preadolescent boys for their mothers and confuse the voices of brothers with each other. It should not be surprising, then, that skilled imitators can intentionally cause confusion. In a study conducted in Sweden, people were asked whether they heard the voice of Carl Bildt, the former Swedish Prime Minister, among a group of voices played for them. The voice on the tape was actually that of a skilled political impersonator. Listeners almost always got it wrong unless Bildt’s real voice was also one of the alternatives in the lineup. The possibility of misidentification in a court setting is clear.

On the whole, research has shown that we are not as good at voice identification as we think we are, but scientists have also discovered that not all listeners are created equal. People differ dramatically in their ability to identify voices: some are great at it, and others are awful. Not much is known, however, about why some people are better at voice recognition than others. The skill appears to correlate to some extent with musical ability and perhaps with certain aspects of memory, but no surefire way to predict it has yet been developed. If reliable aptitude tests were available, proving that an identifying witness had limited ability to recognize voices might be equivalent to showing that an eyewitness had uncorrected bad vision. For now, judges seem to presume that everyone is relatively good at voice recognition, indeed better than the research suggests is possible.

THE LEGAL SYSTEM DOES NOT RECOGNIZE MOST OF THE FINDINGS, especially the counterintuitive ones, that scientists have demonstrated since the Hauptmann trial. An earwitness who testified, “Of course I recognize [the defendant’s] voice. I’ve lived next door to him for five years,” would likely devastate the defendant’s case. Yet the research suggests that such a witness, despite her familiarity with the voice, could well get it wrong: in Yarmey’s studies, people failed to identify the voices of acquaintances correctly about a third of the time.

The matter is more than academic. In 1992, Guy Paul Morin was convicted of raping and murdering a young girl from his neighborhood in Ontario. The conviction was based in part on an erroneous identification of Morin’s voice by the child’s mother. On the night of the crime, a number of people had heard a man’s voice cry out from outside the victim’s home, “Help me, help me. Oh God, help me,” as if the perpetrator had done something terrible and was consumed with remorse.

According to police records, no one identified the voice at the time. But after Morin was arrested, the victim’s mother identified it as his. She said that she knew it was his voice because she had spoken with him a few times over the backyard fence. The court allowed the identification because the witness had been familiar with the voice before the crime. Morin served 18 months of a 25-year sentence before he was exonerated by DNA evidence in 1995.

In the United States, the legal system has taken some steps to reduce the likelihood of false identifications by earwitnesses. Applying the Constitution’s guarantee of due process, the Supreme Court has established procedures that courts must follow before admitting identification evidence. In fact, the same 1972 case that set ground rules for eyewitness identification also involved an earwitness identification. Under these rules, police officers may no longer invite the victim of a crime to the police station, bring her within earshot of a defendant who is asked to say a few words, and then ask her whether the defendant was the perpetrator, as happened in the Lindbergh case.

Such suggestive procedures are considered too likely to result in false identifications. But even when an identification procedure was suggestive, the Supreme Court’s analysis allows courts to admit the identification as long as it is determined to be reliable. Reliability is an empirical question, but one that courts have repeatedly shown themselves unable to answer accurately. Though experiments have found no correlation between a witness’s confidence in what he heard and the accuracy of his identification, courts generally treat a witness’s confidence as a good indicator of the identification’s reliability.

If a judge decides that an identification is reliable enough to be admitted into evidence, it’s up to a jury to decide whether the identification is correct. Jurors, unfortunately, tend to have the same misconceptions about voice identification that judges have, and the typical jury instructions on how to evaluate the evidence are unlikely to enlighten them. While a sharp defense lawyer might be able to call an expert who could educate the jury, most lawyers do not have the money to do so or do not know that such experts exist.

THE LEGAL SYSTEM’S RELUCTANCE to look seriously at questions of speaker identification stems partly from its experience with “voiceprint” analysts, whom some courts recognized as expert witnesses in the 1960s and ’70s. Voiceprints, known as sound spectrograms in scientific circles, are graphic representations of how the amplitude and frequency of sound change over time. The technology was developed in the 1940s to create “visible speech” that deaf people might be taught to decipher. It was used by the military during World War II to try to identify speakers of intercepted radio messages. Neither effort was particularly successful.

In 1962, however, one of its developers, Lawrence Kersta, published an article in Nature that claimed that people’s voices, like their fingerprints, are unique and can be identified through visual inspection of their voiceprints. During the ’70s, published studies touted voiceprints as a highly reliable means of identifying voices, and many law enforcement agencies welcomed the technology. A typical case might involve a telephoned bomb threat that was recorded on audiotape. After the police arrested a suspect, they would have him recite the same words into a tape recorder. The examiner would then run both tapes through spectrographic analysis, compare the voiceprints, and reach a conclusion.

Prominent experts in phonetics had their doubts about the reliability of this methodology. Some courts began permitting voiceprint experts to testify; others rejected this “expertise.” In 1979, an influential report from the National Research Council slowed the acceptance of voiceprint specialists as experts. The report determined that voiceprint analysis, while accurate under ideal laboratory conditions, was not reliable enough for courts to depend on the technology when a recording was made under “real-world” conditions, where voice signals are degraded by problems like poor recording quality, background noise, and telephone transmission.

Occasional battles over voiceprints have continued to surface during the past 20 years, but most law enforcement agencies have stopped trying to get them into court. In the 1990s, the Supreme Court tightened the standards for admitting scientific evidence in federal court, further reducing the motivation to use the technology. The voiceprint’s demise as a forensic tool has led to a broader decline in interest in voice identification techniques generally. To many judges and lawyers involved in the criminal justice system, including leading experts on scientific evidence, voice identification is equated with voiceprints, which they regard as too unreliable.

Other so-called forensic identification sciences, including microscopic hair analysis, handwriting identification, bite-mark analysis, ballistics, and even fingerprints, have also come under attack in recent years. Once the Supreme Court, in its 1993 Daubert decision, established the “known rate of error” as one of the indicia of scientific reliability, it did not take long for lawyers and legal scholars to notice that many of these identification “sciences” had been used in courts for years with little proof of their error rates. Some, like hair analysis, have become notorious for contributing to convictions that were later overturned by DNA evidence, the Central Park Jogger case being perhaps the best-known example.

Unlike practitioners of most forensic identification sciences, who have been defending their turf against these challenges, experts in phonetics have been at the forefront of those questioning the reliability of traditional voiceprint analysis. But perhaps because phonetics has applications outside the courtroom (voice recognition software is used by word processors and corporate security agencies alike), these experts have not given up the pursuit of accurate voice recognition. Instead, they are working to develop more reliable methods for identifying voices. Steady advances in computer technology may have a great impact on forensic voice identification in the future. Huge databases of voices, sophisticated mathematical modeling techniques, and the ability of acoustic engineers to decompose the human voice into a host of different components have led to enormous improvements in voice recognition technology.

Even with all of these improvements, however, machines that can identify a voice as reliably as DNA or fingerprint analysis are still in the future. The most advanced technology does not yet deal well with disguise, and its performance declines when the voice recordings are of poor quality. But the steady improvements in the field suggest that these technologies may become accurate enough to be relied upon. We hope that the history of voiceprint analysis will not keep courts from taking better voice recognition technologies seriously as they become available for courtroom use.

Of course, no technology, however advanced, can fully compensate for the weaknesses of human memory. Though voice recognition software may one day be able to determine whether the voice on a tape matches that of a suspect, or whether the voice on a broadcast is really Saddam Hussein’s, it will never make more reliable the testimony of the woman who heard her daughter’s attacker and thought she recognized her neighbor. The courts have been more inclined to trust people than machines, but we may soon reach a time when the reverse should be true.
