Towards Fair and Inclusive Speech Recognition for Stuttering

Current speech recognition tools perform poorly for people who stutter. A primary cause is the lack of representative, diverse stuttered speech data for developing ASR models. To address this gap, we co-created the first stuttered speech dataset in Mandarin Chinese with a grassroots community of Chinese-speaking people who stutter. Read our paper here.

Around 1 in every 100 people stutter.

They often experience stigma and discrimination in social, romantic, educational, and professional settings.

70 Million
People stutter worldwide, which is around 1% of the global population.
3 Million
People in the U.S. stutter, which is around 0.9% of the U.S. population.
Automatic Speech Recognition (ASR) systems perform poorly for people who stutter (PWS).

ASR technologies are prevalent in our communication ecosystem. Speech interaction is particularly important for devices that have small or no screens.

Mobile Phones
Smart Home Assistants
Smartwatches
In-car Navigation
As stuttering severity increases, ASR error rates for consumer systems like conversational telephone speech agents also increase.
A primary cause of poor performance is the lack of accessible, representative, and authentic stuttered speech data when developing ASR models.

AImpower.org partnered with StammerTalk (口吃说), an online community of Chinese-speaking people who stutter, in a community-led effort to record and annotate stuttered speech. The result is the first and largest corpus of stuttered speech in Mandarin Chinese.

49
Hours of speech recorded
429K
Characters in verbatim transcriptions
72
Contributors (all PWS)
38K
Annotated stuttering events
The StammerTalk dataset captures a wide spectrum of stuttering frequencies and patterns across 72 PWS in two scenarios, voice command dictation and unscripted conversation, providing a much more authentic and comprehensive representation of stuttered speech for ASR models. Together, the two scenarios resemble real-world speech product use cases.

We audited two open-source ASR models, Whisper and wav2vec 2.0, with the StammerTalk dataset to benchmark their performance on Chinese stuttered speech.

We tested two types of reference transcriptions:
1. Semantic: word repetitions and interjections excluded
2. Literal: stuttered utterances kept verbatim
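
To make the setup concrete, here is a minimal sketch of running one of the audited models, assuming the open-source openai-whisper package; the checkpoint size and file name are illustrative placeholders, not the exact configuration from our paper. (wav2vec 2.0 can be run similarly via Hugging Face's transformers library.)

```python
# Minimal sketch: transcribing a Mandarin clip with open-source Whisper.
# Assumes `pip install openai-whisper`; the "small" checkpoint and the
# audio file name below are illustrative, not our paper's exact setup.
import whisper

model = whisper.load_model("small")       # load a pretrained checkpoint
result = model.transcribe(
    "stammertalk_clip.wav",               # hypothetical audio file
    language="zh",                        # decode as Mandarin Chinese
)
print(result["text"])                     # the model's hypothesis transcript
```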

For each transcription type, we measured the Character Error Rate (CER): the number of character substitutions (SUB), insertions (INS), and deletions (DEL) in the model's output, divided by the length of the reference transcription.
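
As a rough illustration of how CER is computed (this is not our paper's evaluation code), the sketch below counts character-level edits between a reference transcript and a model hypothesis, then divides by the reference length:

```python
# Illustrative CER: Levenshtein distance over characters, normalized by
# the reference length. CER = (SUB + INS + DEL) / reference characters.

def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character edits turning `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion (DEL)
                curr[j - 1] + 1,            # insertion (INS)
                prev[j - 1] + (r != h),     # substitution (SUB) or match
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    return edit_distance(ref, hyp) / max(len(ref), 1)

# Toy example: a repetition "smoothed" away by the model counts as deletions.
print(cer("我我我想打电话", "我想打电话"))  # 2 deletions / 7 chars ≈ 0.29
```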

For both models, there are more errors as stuttering severity increases.

The Whisper model tends to “smooth” transcriptions by deleting words with low semantic value.

The wav2vec 2.0 model performs 1.5-2x worse than Whisper, making more substitution errors.

Further analysis showed that CERs are higher for voice command dictation than for unscripted conversation.

Average CER by stuttering severity and scenario:

              Mild Stuttering  Moderate Stuttering  Severe Stuttering
Conversation       17.7%             20.7%              31.0%
Dictation          25.7%             32.8%              56.6%

For severe stuttering, the average dictation CER (56.6%) is more than twice the rate for mild stuttering (25.7%). Voice command dictation is common in speech interfaces and ASR-mediated interactions, so these higher error rates can create accessibility barriers and psychological harms for PWS.

Adequate and authentic representation of disability communities in AI data remains a challenge.

Despite the unprecedented scale of the StammerTalk dataset, stuttered speech remains immensely underrepresented in ASR. The models we tested performed worse for PWS, highlighting a major shortcoming in these systems. To close these performance gaps in ASR technologies, we need to create datasets that reflect diverse speech patterns such as stuttering.

Interested?

Partner with us to ensure that speech recognition technology is inclusive for all.
Researchers & Scholars

We’re eager to exchange ideas with and learn from people who are studying fair AI data practices. Learn more about our data [here].

Developers

Interested in building inclusive speech AI models for your applications? Request access to our dataset.

Speech-Language Pathologists

If you are an SLP interested in using this data for educational, research, or clinical purposes, we want to hear about your use case.

Contact Us!

Check out our other recent work on our Blog page. If you’re interested in joining us, please reach out at partnership@aimpower.org.