Imagine going to the doctor, telling them exactly how you feel, and then having the transcription of your visit add information you never said and change your story. That may be the reality at medical centers that use Whisper, OpenAI's transcription tool. According to ABC News, more than a dozen developers, software engineers, and academic researchers say they have found evidence that Whisper creates hallucinations, including invented medications and racial and violent remarks. Even so, in the last month alone the latest version of Whisper was downloaded 4.2 million times from the open-source AI platform HuggingFace. The tool is also built into Oracle's and Microsoft's cloud computing platforms, along with some versions of ChatGPT.
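Part of why adoption is so high is how low the barrier to entry is: a transcription run takes only a few lines of code. Below is a minimal sketch using the open-source openai-whisper package; the model size and file name are placeholders, not a recommendation.

```python
# Minimal sketch: transcribing audio with the open-source openai-whisper
# package (pip install openai-whisper; ffmpeg must be installed).
# "visit_recording.mp3" is a hypothetical placeholder file.
import whisper

model = whisper.load_model("base")            # small general-purpose checkpoint
result = model.transcribe("visit_recording.mp3")
print(result["text"])                         # the transcript, with no built-in
                                              # signal for fabricated passages
```

Note that the output is plain text with no confidence markers: nothing in the result distinguishes words the model heard from words it made up.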
The evidence is extensive, with experts finding significant flaws in Whisper across the board. A University of Michigan researcher, for example, found fabricated text in eight out of ten audio transcriptions of public meetings. In another study, computer scientists found 187 hallucinations while analyzing more than 13,000 audio recordings. The trend continues: one machine learning engineer found hallucinations in about half of more than 100 hours of transcriptions, while another developer found them in nearly all of the 26,000 transcripts he had Whisper create.
Specific examples of these hallucinations make the potential danger clear. Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined clips from a research repository called TalkBank. They found that nearly 40 percent of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented. In one case, Whisper invented that three people under discussion were Black. In another, a speaker said, "He, the boy, was going to, I'm not sure exactly, take the umbrella." Whisper rendered it as: "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people."
Whisper's hallucinations also carry dangerous medical implications. Nabla, a company that uses Whisper in a medical transcription tool, serves more than 30,000 clinicians and 40 health systems and has transcribed an estimated seven million visits to date. The company says it is aware of the issue and is working to address it, but there is currently no way to check a transcript against the original recording: according to Nabla's chief technology officer Martin Raison, the tool erases all audio for "data safety reasons." The company also says providers must quickly edit and approve transcriptions (in all the spare time doctors have?), but that this system may change. Meanwhile, privacy laws prevent anyone else from verifying that the transcriptions are accurate.
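To make the verification gap concrete, here is a hypothetical sketch, not Nabla's system and no substitute for a human review, of the kind of basic check that erasing recordings forecloses: diffing a reference transcript of what was actually said against Whisper's output to flag spans the model added on its own. Without the audio, no such reference can ever be produced.

```python
# Hypothetical sketch: flag spans that appear in a Whisper transcript but
# not in a reference transcript of the same audio. Long one-sided spans
# are a crude signal of fabricated text. Standard library only.
import difflib

def flag_divergences(reference: str, transcript: str, min_words: int = 3):
    """Return word spans present in `transcript` but absent from `reference`."""
    a, b = reference.split(), transcript.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    flags = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op in ("insert", "replace") and (b1 - b0) >= min_words:
            flags.append(" ".join(b[b0:b1]))
    return flags

# Using the umbrella clip reported by the researchers as an example:
reference = "He, the boy, was going to, I'm not sure exactly, take the umbrella."
whispered = ("He took a big piece of a cross, a teeny, small piece. "
             "I'm sure he didn't have a terror knife so he killed a number of people.")
for span in flag_divergences(reference, whispered):
    print("possible fabrication:", span)
```

Even this crude check requires something to compare against, which is exactly what deleting the source audio removes.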
