Stuttering and Voice-activated AI: Panel Reflections

I recently attended the “Voice-Activated AI for Stuttered Speech Convergence Symposium” organized by Michigan State University, Friends, and West Michigan University. I was honored to speak at the Sociotechnical Challenges in Voice-Activated AI panel with a fantastic group of panelists and participants from academia, industry, and nonprofits.

It was an incredible experience to join the technical and research community in a lively and constructive conversation on how to make voice-activated AI more usable and useful for people who stutter or have other speech diversities. I was impressed by the shared commitment and determination across sectors to make this a reality, and I learned a lot from the other panelists and participants in the process. I am also glad that, as the only panelist who stutters, I got to represent the voices and agency of the stuttering community in this currently predominantly technical endeavor.

I will share a few highlights and key points that stuck with me below. I am not quoting people or referencing them by name due to confidentiality (the symposium was held over Zoom but not recorded for the same reason). But please let me know if I can/should reference you, and I will!

1. Data

We spent a significant amount of time discussing data: why we need it, its definition, availability, collection methods, context, and usage:

  • Definition of the data: audio speech recordings with metadata about the speaker and a text transcription (with timestamps).
  • What kind of data we need: data that represents diverse speech patterns, e.g. stuttered speech, deaf speech, and speech with heavy accents. We also want to represent the heterogeneity within a community and cover both the variability of speech (e.g. in stuttered speech) and the intersectionality of community members (e.g. stuttered speech in a non-native language).
  • Data Availability:
    • FluencyBank has been the go-to resource for stuttered speech data, both for teaching SLPs and for training/evaluating speech models. However, because it was designed to train SLPs rather than for ASR use cases, the transcriptions are not always complete, which can be an issue for training ASR. The FluencyBank project also has limited resources, so the scale of its data is much smaller than what is normally used to train ASR models.
    • There is an ongoing industry-funded effort to collect diverse speech data, and the data is intended to be shared more publicly. However, it was unclear what the timelines or terms for data sharing are, highlighting again the power imbalance between the tech industry and the community over the ownership and control of data about the community. Other participants also shared experiences of being approached by tech companies to record their speech with very little compensation or transparency about how the data would be used, which may further disincentivize the community from contributing to technical efforts like this. I talked about AImpower’s ongoing work with StammerTalk as an example of a respectful, equitable data collection and governance model, in which the community leads and drives the data collection process and determines how the data will be stored, shared, and used, by whom, and for what purposes. Other panelists drew a parallel between our approach and the data sovereignty efforts of indigenous communities.
  • Context is important: compared to other speech diversities, stuttering might be unique due to its variability across individuals who stutter and within each individual. Stuttering is much more likely to occur when speaking with others than when talking to oneself. My physiological state, my conversation partner, and the speaking environment can all affect how I stutter. Most existing stuttered speech data was recorded in lab settings and might sound very different from the speech input that voice AI receives. Speech in daily conversations can also differ greatly from speech in formal, high-stakes settings such as interviews and presentations. We need to prioritize intersectionality when collecting and using stuttered speech data, and let the stuttering community self-determine how their speech should be interpreted (e.g. whether filler words and disfluencies should be transcribed or omitted).
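The data definition discussed above (audio recording, speaker metadata, timestamped transcription, with disfluencies optionally annotated) can be made concrete as a record schema. This is a minimal illustrative sketch; the field names and annotation labels are my own assumptions, not from FluencyBank or any standard:

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptSegment:
    start_s: float   # segment start time, in seconds
    end_s: float     # segment end time, in seconds
    text: str        # verbatim transcription, disfluencies included
    # Optional disfluency annotations, e.g. ["repetition", "block"]
    events: list = field(default_factory=list)

@dataclass
class SpeechRecord:
    audio_path: str        # path to the raw recording
    speaker_metadata: dict # e.g. language, setting, self-described speech pattern
    segments: list         # timestamped TranscriptSegment objects

# Example record from a hypothetical casual-conversation session:
record = SpeechRecord(
    audio_path="session_001.wav",
    speaker_metadata={"language": "zh", "setting": "casual conversation"},
    segments=[
        TranscriptSegment(0.0, 2.4, "I I I want to say", events=["repetition"]),
    ],
)
```

A schema like this lets the community's transcription policy (keep or omit disfluencies, which events to label) live in the data itself rather than being decided downstream by model builders.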

2. The Use of Voice-AI

We also talked about how the deployment of voice AI impacts the lives of people with speech diversities and what we can do about it.

According to a recent paper by Colin Lee and colleagues at Apple on the experiences of people who stutter with ASR systems, it is clear that current ASR systems can be unusable for stuttered speech (>40% WER for users with severe stuttering). While these systems are actively commercialized and widely adopted by businesses and services, they can create not only structural barriers but also mental and cognitive harms for people who stutter.
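For readers unfamiliar with the metric: WER (word error rate) is the standard ASR accuracy measure, computed as the word-level edit distance between the system's hypothesis and the reference transcript, divided by the reference length. A minimal sketch (the example utterance is hypothetical, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# A disfluent utterance garbled by a hypothetical ASR system:
print(wer("please check my case status", "peas chick my status"))  # 0.6
```

A WER above 0.4 means roughly half the words in an utterance are wrong or missing, which helps explain why downstream systems (like automated phone lines) fail completely rather than gracefully.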

I have had very bad personal experiences with USCIS phone lines that have been switched to AI operators in the past few years: I couldn’t get the AI to understand the purpose of my call (“in natural language”), and there is no longer an option to reach a human operator – after a few frustrating attempts, the AI simply hung up on me. Although traditional human-operated customer service phone lines have never been very stuttering-friendly, AI, which is supposed to represent progress and technological advancement, has made them even worse.

Beyond the outright denial of services, ASR models that can’t handle stuttered speech also inflict daily psychological torment on users who stutter. Every time the speech assistant says “sorry, I missed that” to my command but responds to the same command from my 4-year-old daughter, I experience a micro-dose of embarrassment and frustration and am reminded again that stuttering is wrong and unacceptable – an idea I have been working so hard to shake off.

The panelists brought up great points on how we can regulate the deployment of premature AI systems like this, with tools such as legislation, collective action by the community, and/or self-regulation by the tech industry. Again, we at AImpower believe that the impacted community should be the one making the call; e.g. we should be able to request the type of interview setup that sets us up for success, rather than being forced through phone or AI interviews that can create disabling barriers for people who stutter.

3. Going forward

It seems clear that the lack of inclusivity in today’s voice-activated AI systems is not merely a technical problem, but a socio-structural issue with deep roots in ableism and capitalism. It thus demands a convergent effort from the community, government, academia, and industry. The panelists and participants proposed a few directions going forward:

  • Community: let those who are most impacted by the technology take the driver’s seat! Among all panelists on the technical panel, I was the only person who stutters, which perhaps reflects the lack of representation of the community in AI and technology – and that has to change. I also don’t think community members need a technical background to contribute: what I see lacking in the current development of voice-activated AI is not technical expertise but embodied knowledge.
  • Technology: collect more representative and diverse speech data in a collaborative and respectful way.
  • Product / Applications: when designing speech-related products and applications, go beyond simply “accepting” and “allowing” stuttered speech; embrace and celebrate the diversity of human speech as a design asset rather than as edge cases. For example, can we leverage pauses and filler words as a channel for emotional connection when non-verbal channels are limited?

Making communication technologies more inclusive and accessible for people with diverse speech is a major focus for us right now, and we are excited to see the synergy across sectors at the symposium. AImpower is always eager to listen to the community’s stories and open to joining efforts with partners who share the same vision. Please reach out if you want to contribute, collaborate, or discuss.
