Towards Fair and Inclusive Speech Recognition for Stuttering

Current speech recognition tools perform poorly for people who stutter. A primary cause is the lack of representative, diverse stuttered speech data for developing ASR models. To address this gap, we co-created the first stuttered speech dataset in Mandarin Chinese with a grassroots community of Chinese-speaking people who stutter. Read our paper here.

Around 1 in every 100 people stutter.

They often experience stigma and discrimination in social, romantic, educational, and professional settings.

70 Million
People stutter worldwide, which is around 1% of the global population.
3 Million
People in the U.S. stutter, which is around 0.9% of the U.S. population.
Automatic Speech Recognition (ASR) systems perform poorly for people who stutter (PWS).

ASR technologies are prevalent in our communication ecosystem. Speech interaction is particularly important for devices that have small or no screens.

Mobile Phones
Smart Home Assistants
Smartwatches
In-car Navigation
As stuttering severity increases, ASR error rates for consumer systems like conversational telephone speech agents also increase.
A primary cause of poor performance is the lack of accessible, representative, and authentic stuttered speech data when developing ASR models.

AImpower.org partnered with StammerTalk (口吃说), an online community of Chinese-speaking people who stutter, in a community-led effort to record and annotate stuttered speech. The result is the first and largest corpus of stuttered speech in Mandarin Chinese.

49
Hours of speech recorded
429K
Characters in verbatim transcriptions
72
Contributors (all PWS)
38K
Annotated stuttering events
The StammerTalk dataset captures a wide spectrum of stuttering frequencies and patterns across 72 PWS in two scenarios, voice command dictation and unscripted conversation, providing a much more authentic and comprehensive representation of stuttered speech for ASR models. Together, the two scenarios resemble real-world speech product use cases.

We audited two open-source ASR models, Whisper and wav2vec 2.0, with the StammerTalk dataset to benchmark their performance on Chinese stuttered speech.

We tested two types of reference transcriptions:
1. Semantic: word repetitions and interjections excluded
2. Literal: stuttered utterances kept verbatim
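
To make the setup concrete, here is a minimal sketch of running one of the audited models, assuming the open-source openai-whisper package; the checkpoint size and file name are illustrative placeholders, not the exact configuration from our paper. (wav2vec 2.0 can be run similarly via Hugging Face's transformers library.)

```python
# Minimal sketch: transcribing a Mandarin clip with open-source Whisper.
# Assumes `pip install openai-whisper`; the "small" checkpoint and the
# audio file name below are illustrative, not our paper's exact setup.
import whisper

model = whisper.load_model("small")       # load a pretrained checkpoint
result = model.transcribe(
    "stammertalk_clip.wav",               # hypothetical audio file
    language="zh",                        # decode as Mandarin Chinese
)
print(result["text"])                     # the model's hypothesis transcript
```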

For each transcription type, we measured the Character Error Rate (CER): the number of character substitutions (SUB), insertions (INS), and deletions (DEL) in the model's output, divided by the length of the reference transcription.
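
As a rough illustration of how CER is computed (this is not our paper's evaluation code), the sketch below counts character-level edits between a reference transcript and a model hypothesis, then divides by the reference length:

```python
# Illustrative CER: Levenshtein distance over characters, normalized by
# the reference length. CER = (SUB + INS + DEL) / reference characters.

def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character edits turning `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion (DEL)
                curr[j - 1] + 1,            # insertion (INS)
                prev[j - 1] + (r != h),     # substitution (SUB) or match
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    return edit_distance(ref, hyp) / max(len(ref), 1)

# Toy example: a repetition "smoothed" away by the model counts as deletions.
print(cer("我我我想打电话", "我想打电话"))  # 2 deletions / 7 chars ≈ 0.29
```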

For both models, there are more errors as stuttering severity increases.

The Whisper model tends to “smooth” transcriptions by deleting words with low semantic value.

The wav2vec 2.0 model performs 1.5-2x worse than Whisper, making more substitution errors.

Further analysis showed that CERs are higher for voice command dictation than for unscripted conversation.

Average CER by stuttering severity and scenario:

              Mild Stuttering  Moderate Stuttering  Severe Stuttering
Conversation       17.7%             20.7%              31.0%
Dictation          25.7%             32.8%              56.6%

For severe stuttering, the average dictation CER (56.6%) is more than twice the rate for mild stuttering (25.7%). Voice command dictation is common in speech interfaces and ASR-mediated interactions, so these higher error rates can create accessibility barriers and psychological harms for PWS.

Adequate and authentic representation of disability communities in AI data remains a challenge.

Despite the unprecedented scale of the StammerTalk dataset, stuttered speech remains immensely underrepresented in ASR. The models we tested performed worse for PWS, highlighting a major shortcoming in these systems. To close these performance gaps in ASR technologies, we need to create datasets that reflect diverse speech patterns such as stuttering.

Interested?

Partner with us to ensure that speech recognition technology is inclusive for all.
Researchers & Scholars

We’re eager to exchange ideas with and learn from people who are studying fair AI data practices. Learn more about our data [here].

Developers

Interested in building inclusive speech AI models for your applications? Request access to our dataset.

Speech-Language Pathologists

If you are an SLP interested in using this data for educational, research, or clinical purposes, we want to hear about your use case.

Contact Us!

Check out our other recent work on our Blog page. If you’re interested in joining us, please reach out at partnership@aimpower.org.