Community-led Stuttered Speech Collection with StammerTalk

Fair and Authentic Representation of Marginalized Communities in AI Data

The rise of AI technologies, from recommender systems to LLMs, is fueled by massive amounts of data about people and our world. Most of the data used to train AI models was scraped from the web or collected by companies from their users. While current AI data practices have created a host of privacy and copyright issues, one tangible and serious challenge for marginalized communities – such as Black people and people with disabilities – is that they are often underrepresented and misrepresented in the data that shapes AI systems. As a result, they are not only unable to benefit from technological advancement but, worse, are subject to algorithmic biases and harms.

The lack of fair and authentic representation of people with disabilities has been called out by scholars and activists as a key challenge and priority for today’s AI fairness efforts. The solution? We believe it lies in the hands of the communities themselves.

Here we will introduce a grassroots, community-led effort by an online community of people who stutter to collect and curate one of the first and largest datasets of stuttered speech in Mandarin, with the goal of improving the inclusivity of the speech recognition models powering today's speech interfaces and automated phone menus.

Closely partnering with the StammerTalk community on this project since day one, we have been consistently impressed by the community's proactiveness, determination, and resourcefulness, and we have uncovered major advantages of community-led data practices compared to data scraping or commercial data brokerage.

By sharing our experience and findings from StammerTalk’s stuttered speech collection project below, we advocate for a new AI data paradigm of community data stewardship, especially for data from and about marginalized communities. Together, we aim to develop a practical guide and introduce new socio-technical infrastructure to support grassroots, community-led data collection and stewardship for communities that have historically faced underrepresentation in AI data.

Background 

StammerTalk

StammerTalk (口吃说) is an online community of Chinese-speaking people who stutter that convenes through WeChat groups and biweekly online support groups. It currently has approximately 500 members distributed across the world, primarily from mainland China. StammerTalk operates solely through its volunteers, especially a core team of 10 who self-organize virtual events and activities and support the community.

Stuttered Speech Data Collection Project

The idea of creating a stuttered speech corpus for better ASR came up in a conversation between StammerTalk and AImpower in late 2022. We set a goal of collecting 100 hours of stuttered speech from 100 individuals who stutter. The StammerTalk team would recruit people who stutter to participate in recording sessions, with each session generating about one hour of speech data – half free-form conversation and half command recitation.

The project was led by Rong Gong, one of the founders of StammerTalk, together with Lezhi Wang, a long-time and active member of the StammerTalk community. The data collection itself kicked off in January 2023, and over 60 hours of speech data had been collected as of September 2023.

Process

  1. Preparation 

Important preparation work for the data collection included:

  • Identify partners: To gather technical, operational, and legal resources and support for the project, Rong and the StammerTalk team prepared a project pitch and established partnerships with AImpower.org (technical advisory), AIShell (speech annotation service), and Michigan State University (stuttering research advisory).
  • Build data collection and annotation guidelines: Rong led the work to establish the protocols and guidelines for collecting and annotating stuttered speech, with support from Lezhi (StammerTalk), Jia (StammerTalk), Shaomei (AImpower.org), and Xin Li (AIShell).
  • Establish a legal framework: Collecting and curating personal, biometric data from community members across the globe requires tremendous legal expertise in data and privacy law, IP law, and international law. Through AImpower.org, StammerTalk acquired legal guidance and contractual support for this project from Cooley LLP, a renowned global law firm. Also, since the community (represented by StammerTalk and AImpower) was in the driver's seat, we were able to draft a participation process and agreement that offered greater transparency and rights to data contributors than what is provided in commercial data collection by companies and data vendors.
  • Recruit data contributors: Rong posted the first recruitment poster on StammerTalk's public WeChat account in January 2023, detailing the goals and structure of the data collection. The recruitment was met with enthusiasm from the community: over 40 people who stutter signed up within a few days. A second batch of recruitment was conducted in July 2023.
  2. Data collection

After reviewing and signing the participation agreement, interested data contributors were each scheduled for a roughly 60-minute video conferencing session with a community data collector – either Rong or Lezhi – over Zoom or Tencent Meeting. Each session followed this structure:

  • Preparation (5 mins): Participant orientation and consent collection.
  • [Recorded] Unscripted Spontaneous Conversation (30 mins): An unscripted conversation between the participant and the community data collector, centered on the participant's life and stuttering experiences.
  • [Recorded] Voice Command Recitation (30 mins): The participant reads aloud a list of frequently used commands for speech interfaces.

The second and third parts of the session make up the final stuttered speech dataset. The recorded stuttered speech was then transcribed by professional annotators from AIShell, with stuttering events annotated.
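For illustration, a single annotated segment might look something like the hypothetical sketch below. The schema, field names, and event labels here are our own assumptions for this post, not the actual annotation format used by AIShell.

```python
# Hypothetical example of one annotated segment of stuttered speech.
# The schema and event labels are illustrative assumptions only;
# they do NOT reflect the project's actual annotation format.
segment = {
    "session_id": "ST-0042",           # anonymized session identifier
    "speaker": "contributor",          # "contributor" or "data_collector"
    "start_ms": 125_300,               # segment offset within the recording
    "end_ms": 131_850,
    "transcript": "我想打电话给妈妈",    # "I want to call mom"
    "stutter_events": [                # time-aligned stuttering events
        {"type": "sound_repetition", "start_ms": 125_600, "end_ms": 126_400},
        {"type": "block",            "start_ms": 128_100, "end_ms": 129_000},
    ],
}
```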

Advantages of Community-led AI Data Collection

To understand the process, benefits, and challenges of community data stewardship, the AImpower.org team closely followed the progress of this project and collected ethnographic data through observations, interviews, and surveys with the data collectors (Rong and Lezhi) and contributors.

Our data and analysis showed unique advantages of community-led AI data collection over commercial and mainstream data practices. We saw that community members were intrinsically motivated to participate in the data collection process, and they found the process highly enjoyable despite the significant amount of time and effort involved. Besides a useful technical output (i.e., the dataset), the data collection process also created deeper interpersonal connections and a sense of empowerment within the community, strengthening the community's capacity and agency for self-advocacy.

Driven by love, not money

Contrary to what is seen in commercial or third-party-led data collection, most people who participated in StammerTalk's community-led data collection cared relatively little about the monetary incentives; instead, they were driven by intrinsic goals such as making meaningful contributions to the community and connecting with other people who stutter.

Lezhi, one of the two community data collectors, shared her motivations for spending countless nights and weekends working on this project:

“I want to publicize my stutter. I want to empower myself through stuttering. I want to differentiate myself from others, from people who do not stutter. My longstanding involvement with the stuttering community gives me insights into the unique challenges faced by stutterers. These (insights) equip me well with ideas on leveraging technology to improve experiences of people who stutter, especially since current technologies often overlook their needs”

– Lezhi

Similarly, most data contributors participated because they recognized the value of this dataset to the stuttering community and wanted to contribute to and connect with the community. As shown in the figure below, when we surveyed 55 data contributors about their reasons for participating in the data collection, the top reasons were 1) “meaningfulness of this initiative”; 2) “contributions to the stuttering community”; 3) “support StammerTalk’s projects”; and 4) “opportunity to talk to the StammerTalk team 1:1”. “Monetary compensation” was rated as the least important reason by more than half (29/55) of the data contributors.

Fig. 1. The most and least important reasons for data contributors to participate in data collection project.

Enjoyment, Knowledge, and Self Empowerment

While commercial data collection processes are often boring, repetitive, and tedious, participants in the StammerTalk data collection actually enjoyed the experience and found themselves gaining more knowledge about stuttering, deeper empathy from others, and a sense of empowerment in their identity as people who stutter (PWS).

According to our survey, 95% of the data contributors rated their experience in the data collection as “satisfying” or “very satisfying”. This positive experience was created by factors including: the opportunity to make a valued contribution to the community; the relaxed and comfortable setup and social dynamics during data collection; the opportunity to speak to another person who stutters about one’s stuttering experience; and gaining new and deeper knowledge about stuttering.

The community data collectors played an important role in making data contributors feel comfortable and heard. While shared experiences with stuttering instantly brought the data collectors and the contributors psychologically closer, data contributors also acknowledged specific behaviors of Rong and Lezhi that made their experience satisfying and enjoyable, as shown in Fig. 2 below.

Fig 2. Data contributors’ feedback on data collectors’ competencies during the data collection project.

In fact, the data collection sessions were so fun and comfortable that many of the data contributors stuttered less frequently than they normally do. As a result, the data collectors needed to remind the data contributors to voluntarily stutter, or to simulate a stressful situation (e.g. a job interview), to increase the variety and frequency of stuttering in the dataset. This process prompted many of the data contributors to revisit their default relationship with stuttering – that stuttering is something to be avoided and concealed in one’s speech – turning stuttering into something meaningful, desirable, and a unique quality of themselves. For many, this shift in perspective was profoundly empowering. One data contributor shared the most important thing he gained from participating in the data collection:

勇于面对真实的自己,接纳自己的口吃行为

The courage to face my true self, and accept my stuttering behaviors

– Data contributor

Current Challenges with Community-led AI Data Collection 

Despite its huge tangible and intangible impact for the stuttering community, this project was not without challenges.

First, collecting and annotating speech data at this scale is labor intensive, requiring serious commitment and donation of personal time and resources from community data collectors and contributors. 

Before kicking off the data collection, Rong spent significant time and energy lining up partners and resources, and training professional speech annotators – who do not stutter – to transcribe and annotate stuttered speech. Rong and Lezhi have also spent a few hours each week on data collection for the past 10 months. Both of them have demanding full-time tech jobs, and had to carve out personal and family time during evenings and weekends to work on this project. Some necessary monetary costs were also incurred during the data collection – e.g. 100 RMB per data contributor for their time – and Rong often covered those costs out of his own pocket.

Second, the lack of adequate socio-technical infrastructure for community data stewardship creates liability risks and uncertainties for the community: 

  1. Open-sourcing datasets: There are ongoing debates on how to manage and share AI datasets, especially datasets with personal information. While the community was incentivized to open source this dataset to maximize its impact on speech AI, the speech data in this dataset carries unique characteristics of stuttered speech and would be hard to fully anonymize. Thus, it remains an open question how to responsibly manage and govern the use of this dataset, balancing its scientific value against its privacy implications.
  2. Rigid legal frameworks: Existing data protection models are structured around traditional, distinct roles: data subjects (e.g. users and consumers), data controllers (typically companies and organizations), and data processors (usually data vendors or analytics providers). Those roles and assumptions break down in community data stewardship, as the community collectively takes up all the roles. This becomes particularly challenging with grassroots, marginalized communities, whose membership and legal status are often fluid and informal.
  3. Geopolitical complexities in cross-sector collaboration on AI data: The collaboration between StammerTalk and its diverse set of academic, industry, and nonprofit partners across China and the US added an extra layer of intricacy to the project, especially when working with personal data in a politically charged climate. The current tension between the US and China over technological innovation, especially AI, has added liability costs and additional clearance steps for partner organizations to engage with and support this project, despite its clear value for the stuttering community.
  4. Navigating cross-border, international data laws: As a distributed, online community, the data collection process inevitably involved cross-border data transfers and international data protection laws. Navigating these regulations and compliance requirements could become a serious barrier for marginalized communities undertaking similar efforts.

Ingredients for Success

Despite these structural challenges, we have also identified some unique characteristics of the StammerTalk community and its data collection process that contributed to its success, namely:

  1. Technical Expertise: Having in-house expertise or committed partners with technical know-how to ensure that the project has a solid technical vision and feasibility.
  2. Resourcefulness: Sourcing pivotal and necessary resources from community members and partners.
  3. Reputation and Trust: Cultivating trust and relationships within the community to set the foundation for participation, collaboration, and a good experience for everyone.

What’s Next

Technical Exploration

This project is still ongoing, and we expect to wrap up the data collection process by the end of 2023. Our initial analysis of the collected data showed evidence of the diversity and representativeness of the stuttered speech patterns within this dataset, and we saw promise in using this dataset to effectively tune existing ASR models for stuttered speech. Partnering with the StammerTalk team, we will conduct more technical research and analysis of the collected data, and benchmark popular ASR services to understand their performance disparities between stuttered and non-stuttered speech in Chinese.

Partnership

If you are part of a marginalized group and are interested in starting similar AI data initiatives, we are happy to share our experiences and resources.

If you are a researcher or a scholar studying fair AI data practice, we are eager to exchange ideas and learn from you.

If you are a researcher or a developer of speech AI technologies and are interested in learning more about this dataset, we are happy to start a conversation.

If you are an SLP (speech-language pathologist) and are interested in using this data for educational, research, or clinical purposes, we want to hear about your use case.

In sum, we are actively exploring this new data paradigm and would be eager to partner with anyone who is interested. Please reach out to partnership@aimpower.org.

Join us in building a future where fair and empowering AI data practice isn’t just the exception but the norm!

Inclusive virtual conferencing: Creating personal and empathetic experiences 

In today’s fast-paced world, technology has become an integral part of our daily lives. From smartphones to social media platforms and smart home devices, technology surrounds us, shaping our experiences and interactions. While the primary purpose of technology is often functional or utilitarian, it is equally important for AImpower to consider how tech products can promote empathy, authenticity, a sense of belonging, and emotional connections. This note provides insights into our journey of developing products that enhance virtual conferencing experiences for individuals with diverse speaking abilities.

Virtual conferencing can be stressful, especially for those who stutter.

“I’ve had people ask if I I was having a bad internet connection because they thought, Oh, my computer was freezing or I’m having a glitch. But it was due to my stutter… So that’s always kinda awkward, though it’s well meaning…It cut me off and made me more nervous. ” – Participant 1.

We recognize that people with diverse speaking abilities face unique challenges in virtual conferences, including limited non-verbal signals and misinterpretation of their speech interruptions. Those challenges can hinder efficient communication and lead to increased stress, which can ultimately result in reduced inclusion within teams and organizations.

Through in-depth interviews with 7 participants who experienced different levels of speaking diversity in China and the US, we identified the major challenges as follows:

  1. People with diverse speaking abilities often invest more cognitive and physical effort in virtual conferencing, weighing the benefits and costs of speaking up during meetings.
  2. They may experience stress and mental health issues due to self-judgment or social stigma, which persist beyond the meetings.
  3. They consistently deal with inappropriate interactions with their audience and the additional effort these require, such as educating others about their stuttering.

Communities are experts of their own experience.

We believe that communities are the experts in their own experiences, and we aim to collaborate with them to design warm and empathetic products tailored to their needs. So we invited the target community of people who stutter to co-design products, ensuring that their voices are heard, and we believe in fostering a co-design approach to create a supportive virtual conferencing environment. Here are some themes that our target community looks forward to in their virtual conferencing experiences.

  • Disclosure: Many participants indicated that disclosing their stuttering can enhance their experience by allowing them to focus on their viewpoints rather than on hiding their diversity. This disclosure could also trigger more relevant features for all meeting attendees.
  • Better interactions: Attendees should be informed about how to support individuals with diverse speaking abilities effectively. Timely information to facilitate communication between speakers and listeners, such as a feature indicating when a speaker hasn’t finished, is essential.
  • Environmental cues: We aim to add more cues to the virtual conference environment, amplify body language cues, improve transcription accuracy, and provide better coaching that attends to mental health.

Next steps

Based on these insights and user requests, AImpower has developed the first batch of prototypes for the next co-design session. Stay tuned for updates as we embark on the second stage of co-design with individuals who have diverse speaking abilities. Together, we will prioritize these features and create refined, high-fidelity prototypes for product development.

Know more about stuttering

In case you are not very familiar with people who stutter, here are some facts that could be informative. (And here are more facts!)

  • Roughly 3 million Americans stutter, and stuttering affects about 1% of the world’s population.
  • Stuttering is a biological condition. An individual who stutters knows exactly what he or she would like to say but has trouble producing a normal flow of speech.
  • When people stutter, they feel like they have lost control of their speech mechanism. For people who stutter, the observable disfluencies are not the most important part of the condition. Instead, it is the stigma of stuttering – which is associated with various negative impacts on their lives – that poses the biggest challenge.

Lost in Translation

Dyslexia is a neurodivergence that impacts one’s ability to process and produce textual information. Previous research has identified unique patterns in the writing of people with dyslexia – such as letter swapping and homophone confusion – that deviate from the text typically used in the training and evaluation of common Natural Language Processing (NLP) models such as machine translation (MT). However, it is unclear how commercial NLP services perform for users with dyslexia. In this post, we review some preliminary findings from our work on machine translation and dyslexia.

Methodology

To test these commercial services, a large amount of dyslexic text is required. Unfortunately, we do not have the resources to collect and label that amount of data. Therefore, we utilized the following synthetic dyslexic injection methods (the actual dictionaries used for injection can be found here):

  1. Letter Confusion: A list of letters that can be confused for sounding and/or looking like each other.
  2. Homophone Swapping: An extensive list of English homophones that are substituted for the original words.
  3. Word Confusion: A large corpus of confusable words, common typos made by people with dyslexia, homophones, and other patterns directly related to dyslexia. We thank Maria Rauschenberger for providing the list from Jennifer Pedler and Roger Mitton’s work. This last type of injection is the most comprehensive.

We begin by taking a large corpus of text for which we have both the source and target languages, accurately translated by a human. In our case, we used the WMT14 English-French news dataset. A Python script controls the probability that an injection is made, which in turn lets us control the percentage of words modified by each injection type. Here are some example sentences:

  • Letter Confusion:
    • Original: In Nevada, where about 50 volunteers’ cars were equipped with the devices not long ago, drivers were uneasy about the government being able to monitor their every move.
    • Injected: In Nevada, where abouf 50 wolunteers’ cars were equipped with thi devoces not iong ago, driverc were nneasy about the government being able to mohitor thein every movo.
  • Homophone Swapping:
    • Original: New York City is looking into one.
    • Injected: New York City is looking into won.
  • Word Confusion:
    • Original: “The gas tax is just not sustainable,” said Lee Munnich, a transportation policy expert at the University of Minnesota.
    • Injected: “The gas tax is just knot sustainable,” said Lee Munnich, eye transportation policy export at the University of Minnesota.
Injection examples

In these examples, the percentage of words modified is around 20% for each injection type. After running the script over our entire dataset, we submitted the text for translation to the services available on major cloud platforms such as Azure, AWS, and Google, and we also tested GPT-3.5’s performance.
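For concreteness, here is a minimal sketch of how such an injection script might work. The substitution dictionary below is a tiny hypothetical subset for illustration (the actual dictionaries are linked above), and the script is a simplified stand-in for the one we used.

```python
import random

# A tiny, hypothetical subset of the homophone dictionary, for illustration only.
HOMOPHONES = {"one": "won", "not": "knot", "a": "eye"}

def inject(sentence: str, substitutions: dict, p: float, rng: random.Random) -> str:
    """Replace each word that has a known substitution with probability p."""
    out = []
    for word in sentence.split():
        if word.lower() in substitutions and rng.random() < p:
            out.append(substitutions[word.lower()])
        else:
            out.append(word)
    return " ".join(out)

def percent_modified(original: str, modified: str) -> float:
    """Share of words changed by the injection (how we report corruption levels)."""
    pairs = list(zip(original.split(), modified.split()))
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)

rng = random.Random(0)
src = "New York City is looking into one"
injected = inject(src, HOMOPHONES, p=1.0, rng=rng)  # p=1.0 for demonstration; we sweep p in the experiments
print(injected)                                     # -> New York City is looking into won
print(f"{percent_modified(src, injected):.0f}% of words modified")  # -> 14%
```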

Sample Outputs:

Using the same sentences from above we can see the output of the selected services:

  • Sentence 1 (letter confusion):
    • AWS: Dans le Nevada, où environ 50 voitures de volontaires étaient équipées de ces dispositifs il n’y a pas si longtemps, les conducteurs craignaient que le gouvernement ne soit en mesure de contrôler chaque mouvement.
    • Azure: Dans le Nevada, où environ 50 voitures de wolunteers étaient équipées de ces dévoces il n’y a pas longtemps, les conducteurs n’étaient pas à l’aise à l’idée que le gouvernement puisse mohitor thein every movo.
    • Google: Au Nevada, où une cinquantaine de voitures de bénévoles étaient équipées de ces dispositifs il n’y a pas si longtemps, les conducteurs craignaient que le gouvernement ne puisse les contrôler à chaque mouvement.
    • GPT: Au Nevada, où environ 50 voitures de bénévoles ont récemment été équipées de ces dispositifs, les conducteurs étaient inquiets que le gouvernement puisse surveiller chacun de leurs mouvements.
  • Sentence 2 (homophone swapping):
    • AWS: La ville de New York cherche à gagner.
    • Azure: La ville de New York cherche à gagner.
    • Google: La ville de New York envisage de gagner.
    • GPT: New York City envisage de gagner.
  • Sentence 3 (word confusion):
    • AWS: « La taxe sur l’essence est tout simplement durable », explique Lee Munnich, spécialiste de l’exportation des politiques de transport oculaire à l’université du Minnesota.
    • Azure: « La taxe sur l’essence est tout simplement durable », a déclaré Lee Munnich, responsable de l’exportation de la politique de transport à l’Université du Minnesota.
    • Google: “La taxe sur l’essence est tout simplement durable”, a déclaré Lee Munnich, spécialiste des politiques de transport à l’Université du Minnesota.
    • GPT: “La taxe sur l’essence n’est tout simplement pas durable“, a déclaré Lee Munnich, expert en politique de transport à l’Université du Minnesota.

The highlighted parts of the sentences are where we find our meaningful results. For non-French speakers, it may be difficult to distinguish, but I will try my best to explain. In the first sentence, every translation makes sense other than Azure’s, and most of the services are able to work past the letter confusion, except when it comes to the word “mohitor”. This seems to trick all the services except GPT, which preserves the sense of the sentence. For the second sentence, all the services were confused by the homophone “won” swapped in for “one”. And for our last sentence, all services except GPT are missing a negation. This missing negation completely changes the sense of the sentence.

Some Results

With this in mind, we can now look at some initial results from large amounts of text. We use the word error rate (WER) and the BiLingual Evaluation Understudy (BLEU) score to measure the performance of the models. In the remainder of the post, we will use WER, where a higher WER means more mistakes and poorer translation quality.
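To make the metric concrete, here is a minimal word-level WER implementation. This is a sketch for exposition, not the exact script we used; in practice one would likely reach for an existing library such as jiwer.

```python
# Word error rate: word-level Levenshtein distance (substitutions,
# insertions, deletions) divided by the number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# A dropped negation counts as one deletion out of seven reference words:
print(wer("the gas tax is just not sustainable",
          "the gas tax is just sustainable"))  # -> 0.142857...
```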

For letter confusion, the percentage of words modified increases drastically, as a single change in a word results in the whole word being classified as modified (e.g. an “a” changing to an “e”). Translation quality degrades quickly with this injection type. Further analysis of the likely reasons will be available in the full paper.

For homophone swapping, we notice a somewhat linear trend for the top-performing model, GPT, similar to the letter confusion case. Other models’ performance drops at a quicker rate.

Finally, our word confusion injection yields a similar trend. However, we should note that there is no clear “winner” here.

Conclusion

So what does this all mean? Well, we do notice a significant performance drop as more words are modified, even for large language models like GPT. Unfortunately, this has real-world consequences for people with dyslexia. And this is only one NLP task out of many! It is one of the most basic tasks, but it highlights a root issue: when training these models, this population is not taken into consideration. It is possible for these models to work better for people with disabilities, but that requires change. In future work, we hope to demonstrate the effectiveness of injecting dyslexic text into training/finetuning so models can serve people with different needs! If you are interested in more, you can check out our GitHub and/or look forward to the full research paper being released later this year (with more technical jargon and interesting findings)!

Stuttering and Video Conferencing: Strategies and Best Practices

We had such a blast attending the National Stuttering Association’s annual conference in Florida earlier in July! It is always refreshing and inspiring to be surrounded by people who stutter for days, in which knowledge, stories, laughter, and tears are all shared dysfluently. 😀

Group dinner photo with six people in the view, including Ben and Shaomei. Everyone smiling and looking at the camera.

Hosting our workshop, “Claim Your Virtual Presence”, was an extremely rewarding experience as well. Although it was my first time running a workshop at the NSA conference, and also the first workshop (as far as I know) co-hosted with a remote presenter, I am glad that everything worked out so well. According to our post-workshop survey:

  • More than 80% of the respondents found the information shared at the workshop “very” or “extremely” informative;
  •  of the respondents felt it was “extremely easy” to participate in the workshop
  • 100% of the respondents found themselves “very” or “extremely” likely to apply the strategies and recommendations we discussed in the workshop in their future video calls.
Photo of the workshop, showing 12 people working in two breakout groups.

Our workshop was designed to be interactive and participatory. Our goals were to 

  1. share our research findings on what made video conferencing hard or easy for people who stutter
  2. collectively come up with videoconferencing strategies that could be adopted by people who stutter as well as their allies. 

To achieve the second goal, we designed two breakout sessions, during which we divided the participants into three groups of five to six people each; each group shared and learned from one another and collectively came up with their top three ideas to share with the larger group.


Here we summarize the ideas collected from the stuttering community, both from this workshop and from other PWS who participated in our research elsewhere. A visual handout of these ideas and recommendations can be downloaded here.

Strategies for People who Stutter

  • PRE-MEETING: Prepare yourself and have a good setup
    1. Prepare for the meeting: draft or check out the meeting agenda if you have time; familiarize yourself with the names of other meeting attendees if you don’t already know them; prepare yourself well with what you want to say; meditate before high-stress meetings
    2. Customize your technical setup: adjust the height and position of the desk/computer so that you have good posture and your voice projects well; adjust the height and position of the camera so that your face is centered and you can make effective eye contact; try using earphones to see whether that helps you speak more effectively; try turning off self-view, as it can make you more self-conscious and distract you from your audience.
    3. Request reasonable accommodations: e.g. extra speaking time, a more comfortable speaking order (e.g. going first during self-introductions), or not being spotlit when speaking.
  • DURING-MEETING: Embrace your identity and tap into your non-verbal communication skills
    1. Intentional self-disclosure: be ready to disclose your stutter early on, in an unapologetic, informative way. Research has shown that informative and personalized self-disclosure benefits both the listener (Werle & Byrd 2022) and the speaker (Young et al 2022) in their communication experience.
    2. Non-verbal communications: leave the camera on; be more animated during the meeting, using your body language, facial expression, and eye contact; have a go-to strategy to indicate to others when you want to talk and when you are finished.  
  • POST-MEETING: Recap and reflect
    1. Leverage asynchronous channels: follow up in emails or chat threads if you did not say everything you wanted to say during the meeting.
    2. Reflect on your meeting experience with self compassion: recognize your achievements, treat yourself as you’d treat a good friend.

Here is a handout that summarizes these strategies (PDF):

Recommendations for Meeting Organizers and Attendees

Successful video conferencing is a collaboration among all meeting attendees. That means, while PWS can take up the above-mentioned strategies and adapt their behaviors for video conferences, it is as important for other people to accommodate and support each other in video calls.

During the workshop, we also worked out the following recommendations for video conference organizers and attendees. 

  • For meeting organizers/facilitators: set up and run the meetings with inclusivity in mind.
    1. Use video conferencing platforms that give participants more control over how they show up, e.g. allowing users to turn off self-view or customize the layout of other participants’ videos.
    2. Proactively reach out to meeting participants about their needs and try to accommodate them, e.g. allocating more time for a person, setting up a specific speaker order, or encouraging people to use the chat function.
    3. Set clear expectations ahead of time for the meeting format and attendee responsibilities so that attendees are prepared for what and how they should contribute.
    4. Use a facilitator whenever possible.
    5. Start the meeting with ground rules for how to engage, e.g. the mechanism for turn taking and whether the chat will be moderated.
    6. Instead of popcorning (selecting people at random to speak when giving introductions or input), inform everyone what the order of speaking will be, so PWS have time to mentally prepare and aren’t caught off guard.
  • For all the attendees: be patient and supportive when others are speaking.
    1. Mindful turn taking: speak one at a time and do NOT talk over each other; leave some time between speakers. In particular, do NOT interrupt PWS when they are speaking – they will let you know when they are done. If it is unclear, confirm with them that they are finished.
    2. Leverage the non-verbal channels: ask simple questions in chat; use emojis to communicate emotions when possible.
    3. Actively listen and actively respond: listen attentively to the speaker, focus on the content and ideas, and engage and respond (verbally or nonverbally) to show understanding and interest.

Here is a handout that summarizes these guidelines (PDF):


What’s next

Building on our existing research and the collective wisdom from the workshop, AImpower will continue our quest for an inclusive video communication environment for all. To do so, we will:

  1. Share the knowledge and raise awareness: we created a handout that summarizes the challenges and strategies for PWS in video meetings, as well as recommendations for meeting organizers and attendees. Please help us share it widely with and beyond the stuttering community.
  2. Re-envision video conferencing technologies: we are actively co-designing video conferencing technologies with the stuttering community, to make them more stuttering-friendly and more inclusive for everyone.
  3. Advocate for culture change: we will join other stuttering organizations and marginalized communities in the campaign for a more inclusive meeting culture and reasonable accommodations for speech diversity.

If you want to be part of the change, please:

  1. Share this knowledge with your friends, family, colleagues, and employers to raise awareness: email them the handout, make a social media post that links to this post, have an open conversation with people!
  2. Participate in our co-design workshops: we are running a series of co-design workshops for video conferencing technologies over the summer; if you stutter, please sign up here!
  3. Apply what you learn from this post: if you are a PWS, apply the strategies for PWS we listed above and explore more strategies that might work best for you. Sign up for an interview with us in a few months to reflect on how those strategies work out for you. For everyone, please follow the recommendations for meeting organizers and attendees – together, we can create the change we want to see!

Thank YOU

I am thankful to the NSA for providing a space that met all the requirements we asked for, and I appreciate our participants for engaging in all the activities.

A big “thank you” goes to all the workshop participants: thank you for showing up, thank you for being fully present, thank you for your openness, honesty, and creativity. You are the greatest source of knowledge and the inspirations for all of our work!

I am also extremely thankful for all our donors – your generous gifts enabled us to attend the conference and do everything we described here. Your support has really made a difference for the stuttering community, and beyond! Please continue supporting us as we kick off design and technical work to re-envision videoconferencing with the stuttering community: $200 can fund a co-design workshop that enables us to involve and center the perspectives of people who stutter in the design process!

Last but not least, the biggest kudos go to my co-presenter, Gary Goldsmith, for spending hours and hours planning out the structure and every single detail of the workshop with me, and for being the most engaging Zoom presenter I have ever known.

Calling All Stutterers Who Participate in Video Calls

Do you find videoconferencing to be particularly draining? Do you struggle to make your voice heard in virtual meetings?

Join us for our workshop titled “Claim Your Virtual Presence” at this year’s National Stuttering Association (NSA) annual conference in Fort Lauderdale, Florida. The workshop will take place on July 6th, 3:30-4:30 PM ET.

Building on AImpower’s research on additional challenges faced by individuals with stuttering during video conferencing, this workshop aims to empower the stuttering community by:

  • Providing practical strategies for individuals who stutter to actively participate in video conferences with greater comfort and ease.
  • Developing a set of recommendations for meeting organizers and employers to better accommodate individuals who stutter in virtual and hybrid meetings.

The workshop will be open and collaborative, featuring various interactive activities that guide all the participants to share our experiences with virtual meetings and learn from one another.

“Claim Your Virtual Presence” will be facilitated by Shaomei Wu, the founder of AImpower.org, and Gary Goldsmith, the founder of Improv @ Work. Both facilitators have personal experience with stuttering and are deeply committed to achieving digital equity by creating engaging virtual communication environments for everyone.

Mark your calendars for July 6th at 3:30 PM and join us at the NSA Conference. Together, we can bring about meaningful systemic changes in the lives of individuals who stutter.

CHI – W4A 2023 Trip Report

It has been a month since we returned from our CHI & W4A trip! It was very exciting for AImpower to participate in and connect with the research community. Despite the distributed nature of our organization and work, we really enjoyed meeting, talking, eating, and connecting with others in a shared physical space, and we look forward to more opportunities like this (if our funds permit 😛). Our core team also had a small in-person reunion in Hamburg for the first time since the formation of AImpower!

Three people standing in front of some buildings, smiling.

We will share a few highlights and key learnings from our experience in this post.


1. Share our organization and our work

We presented our paper “‘The World is Designed for Fluent People’: Benefits and Challenges with Videoconferencing Technologies for People Who Stutter” at CHI. Check out the 10-min presentation video and our slides.

Although I have done several conference presentations before, this CHI was the first time that I disclosed my stutter during a presentation (as you can see in the presentation video). While the audience was very supportive and gave me their full attention, they also seemed a bit taken aback by this piece of information and did not know what to do with it. I really appreciate the few brave ones who asked questions at the end 🙂

I want to assure everyone: “yes, you can talk to me in public settings”. I love answering questions and having meaningful conversations, even if I stutter!

I also tried requesting conference accommodations for my stutter for the first time at CHI this year. Everyone I interacted with during this process was trying their best to be helpful, but it was definitely a new thing for them as well. And my end experience was… okay… I will write a separate post about conference accessibility next week.


2. Connect with the broader research community

With over 4000 participants, CHI was a great place to catch up with many old friends in person after a three-year break.

We joined the Cornell InfoSci Reunion dinner, energized by all the new faces since Niranjan and I left Ithaca. 

We also connected with the authors of the other two full papers on stuttering, and had lots of good discussions! It is exciting to see three full papers on stuttering presented at this year’s CHI (compared to only one in the past 20 years!). We hope to collectively raise broader awareness of and interest in this topic within the HCI/CSCW community and inspire more research at the intersection of stuttering and technology.

As the Program co-chair for Web4All, I was glad to see many new faces at W4A this year! The students were particularly inspiring – I was so glad to see the accessibility challenge awards and the best technical paper award all go to papers led by students!

Lots of fun and memories…

Photos collage with photos of Shaomei with other conference participants, AImpower members, and  landscape of Hamburg and Austin.

3. Learn and get inspired

Beyond all the socializing and networking, I love conferences because I can learn about exciting new research in a short period of time! While I couldn’t make it to all the sessions that sounded interesting, I did come across many papers that were inspiring, informative, and relevant to our work at AImpower. I will list a few of them below – please share your favorites in the comments!

1) Speech & voice technology

As I mentioned, there were several other papers studying how people who stutter interact with technologies at this year’s CHI. 

I really liked “From User Perceptions to Technical Improvement: Enabling People Who Stutter to Better Use Speech Recognition” by Colin Lea, Zifang Huang, Jaya Narain, Lauren Tooley, Dianna Yee, Tien Dung Tran, Panayiotis Georgiou, Jeffrey Bigham, and Leah Findlater from Apple Research. They found that, like everyone else, people who stutter see the utility of speech recognition systems and want to use them more. However, they face significant technological barriers: people who stutter are frequently cut off while speaking, and are more likely to be misunderstood by current systems. This paper also benchmarked current automatic speech recognition (ASR) model performance for stuttered speech, and showed that the average word error rate (WER) for stuttered speech is 19.80%, about 4 times higher than what is reported (~5%) for the general public! And for people with a severe stutter, the average WER is 49.2%, meaning that the system would misinterpret every other word – basically unusable. The following graph, taken from the paper, benchmarks the WER of Apple’s ASR systems for stuttered speech over the past 5 years, showing a range between 19.7% and 29.5%. While we do see a trend of improving performance, it is still far outside the acceptable range for commercial systems, raising fairness concerns about the consequences of such systems being deployed before the performance disparity is closed.

Indeed, as illustrated by Kimi Wenzel, Nitya Devireddy, Cam Davidson, and Geoff Kaufman from CMU, an ASR system with a higher-than-average WER is not only an annoyance, but can lead to real psychological harm for marginalized groups. In their paper, “Can Voice Assistants Be Microaggressors? Cross-Race Psychological Responses to Failures of Automatic Speech Recognition”, they tested a Wizard-of-Oz (WoZ) ASR system with low and high WER settings on Black and White participants, and found that Black participants in the high WER setting reported significantly higher levels of self-consciousness and lower levels of self-esteem after using the technology, while the same effect did not occur for White participants.

They argued that an ASR system with a high WER is a type of microaggressor: the act of frequently misinterpreting and misunderstanding someone’s speech is a form of microaggression that can do real damage to one’s self-esteem and mood, especially for social groups that receive microaggressions and discrimination more often than others.

A bad system is not just a bad system: it can cause real personal and social harm that is often invisible to “mainstream” users and developers!

These two papers made me even more convinced that we should NOT deploy current ASR systems broadly before closing the performance disparity or understanding their impact on different groups.

There were also several technical works that develop different forms of “digitized speech therapy” for people who stutter. For example, Li Feng, Zeyu Xiong, Xinyi Li, and Mingming Fan from the Hong Kong University of Science and Technology (Guangzhou) presented a system called “CoPracTter” for Chinese people who stutter to practice their speech with social support from other community members. While I appreciate the spotlight on the stuttering community in China and see the technical merits of this work, I have some reservations about the focus on fluency and hope systems like this do not reinforce the current prejudice against stuttered speech. Another example is “Speak in Public”, a tool developed by Francesco Vona, Francesca Pentimalli, Fabio Catania, Alberto Patti, and Franca Garzotto at Politecnico di Milano that uses VR, biosensors, and emotion recognition to simulate different real-life speaking situations for people who stutter as part of exposure therapy. I really like how this tool focuses on uncovering the emotional challenges associated with stuttering rather than the observable disfluencies, but I also wish it were positioned less as a “treatment” option and more as an “empathy” builder, as I believe that stuttering should be accepted as a valid form of human speech, not “treated” as if it were a disease.

Two papers from Japan showed promise in ASR without voice. Zixiong Su, Shitao Fang, and Jun Rekimoto from the University of Tokyo built “LipLearner”, a mobile application that lipreads short commands and phrases with one-shot learning (i.e. the user only needs to record one example for each new command). And Jun Rekimoto also presented a system called “WESPER” that recognizes whispered speech and regenerates it in a synthetic voice in real time, without prior training.

2) Autoethnography as a research method

I have been increasingly interested in autoethnography as a research method, especially for researchers with marginalized identities and/or personal experience of the research topic. I was glad to see a few papers demonstrating this method at this year’s CHI and will definitely draw inspiration from them in AImpower’s own research.

For example, in “Self-Tracking to Do Less: An Autoethnography of Long COVID That Informs the Design of Pacing Technologies”, Sarah Homewood from the University of Copenhagen used a Fitbit to track her activities over 18 months whilst having long COVID. Her autoethnographic account explored and uncovered the design space for fitness tracking technologies to support users in “doing less” as a way to manage chronic illness and maintain long-term wellness.

Another interesting autoethnography work is “Maintainers of Stability: The Labor of China’s Data-Driven Governance and Dynamic Zero-COVID” by Yuchen Chen (Univ. of Michigan), Yuling Sun (East China Normal University), and Silvia Lindtner (Univ. of Michigan). The authors described the “big data + iron feet” approach taken by the Chinese government during the Shanghai 2022 COVID lockdown, and the roles technologies and volunteers played in it. A large amount of the data used in this paper was collected by Yuling Sun (the corresponding author) through autoethnographic writing about her personal experiences as a volunteer frontline worker during the four-month lockdown of her university campus from March to June 2022.

Autoethnography provides unique value for these papers, as it gives the researchers access to fine-grained, private data that would otherwise be inaccessible to third parties. It also enables the researchers to turn a challenging experience or aspect of their own lives into novel scientific insights that contribute to the collective discourse of HCI research.

3) Accessibility

I might be biased, but there seemed to be a greater presence of accessibility-related work at CHI this year, which I found very encouraging. I did not get to see all of them, but here are a few papers that I saw and liked. 

  • “Towards Inclusive Avatars: Disability Representation in Avatar Platforms” by Kelly Mack, Rai Ching Ling Hsu, Andrés Monroy-Hernández, Brian A. Smith, and Fannie Liu. The authors conducted pretty thorough research on how people with disabilities want to represent themselves and their disabilities on virtual platforms, and provided some good ideas about how to support users with disabilities in better representing themselves, like showing assistive technologies, showing the full body instead of only the head, and offering ways to show invisible disabilities.
  • “Accessibility Barriers, Conflicts and Repairs: Understanding the Experience of Professionals with Disabilities in Hybrid Meetings” by Rahaf Alharbi, John Tang, and Karl Henderson. Another paper on the topic of disabilities and virtual meetings – it is hard! The authors interviewed people with a wide range of disabilities to uncover their challenges with hybrid meetings. I like how the authors provided recommendations for both the technology (e.g. allow users more control over the viewing experience) and meeting organizers (e.g. practice access check-ins). It is never just a technical problem!
  • “Counterventions: a reparative reflection on interventionist HCI” by Rua Mae Williams, Louanne Boyd, and Juan E. Gilbert. This is technically not an accessibility paper, but it tackles a broader issue in the tech community: the overfocus on “intervention”. The paper categorized the common narratives for accessibility work in HCI (i.e. charity, techno-benevolence, mutual beneficence, edge case for innovation, clinical intervention for the human body) and suggested reparative readings of past research as a way to build counternarratives in accessibility research that are truly respectful of and equitable to people with disabilities. Solid critical work – I recommend everyone have a read.

While AImpower’s work has been geared towards the “experience” aspects of accessibility, i.e. creating new and empowering technological experiences for people with disabilities beyond existing guidelines and legal requirements, attending W4A reminded me that a lot of accessibility challenges are much more fundamental: most websites still fail to meet, or struggle to meet, accessibility requirements, and many types of digital content remain inaccessible to screen reader users.

The Web Content Accessibility Guidelines (WCAG) appeared frequently in W4A papers and follow-up discussions. They are widely regarded as the basic standard and compliance framework for web accessibility. However, despite lots of effort developing these guidelines, and a whole “accessibility compliance” industry that implements and evaluates them for companies and governments, the current state of web accessibility is still far below satisfactory for both users and developers. The paper “A Platform to Check Website Compliance with Web Accessibility Standards” by Reinaldo Ferraz, Ana Duarte, João Bárbara, Adriano Pereira, and Wagner Meira found that less than 0.5% of Brazilian websites fully (or almost fully) meet the requirements defined in the guidelines. And even when developers want to make their sites accessible, it is challenging to understand and fully implement the WCAG guidelines. In “Accessibility Metatesting: Comparing Nine Testing Tools”, the author, Jonathan Pool, tried nine different tools for catching accessibility issues on 121 web pages, only to find that each tool discovered numerous issues missed by all eight of the other tools.

I also have a deep appreciation for work to make different types of content accessible to screen reader users, including but not limited to chemical structural formulas, mathematical diagrams, 3D games, and data visualizations.

4) Ethics & equity

I also really liked several papers that studied the ethics and power dynamics in technology, especially from a marginalized community’s perspectives. 

One of my favorite papers at this year’s CHI is “Old Logics, New Technologies: Producing a Managed Workforce on On-Demand Service Platforms” by Anubha Singh, Patricia Garcia, and Silvia Lindtner from the University of Michigan. The authors studied how on-demand delivery service platforms (e.g. Swiggy) control and manage their workforce, forcing workers to endure conditions of extreme overwork and exhaustion despite their dissatisfaction and attempts to protest and resist. They showed how algorithms reshaped the labor process while reinforcing the mechanisms of the capitalist system: for example, providing financial incentives for earning 5-star ratings, disciplining workers through granular information capture, and reframing workers as “delivery executives” or “delivery partners” to create a sense of belonging to the high-tech industry. As a result, the “exploitation appears self-driven and voluntary, more intense, harder to notice, and therefore difficult to resist”. Systems like this also exploit the lower socio-economic status of the workforce – they are able to reserve a large army of backup workers and suspend or remove workers who try to organize and resist. As AImpower recently started engaging with the disability community on the topic of technology and economic empowerment, I personally resonate with many of the observations made in this paper. While technologies like crowdsourcing platforms and AI have introduced employment opportunities for those who might have been rejected or undervalued by the traditional labor market, such technologies still run on top of the capitalist values of maximizing profit over cost, and are getting more efficient than ever at controlling and manipulating human workers.

Another interesting work in this category is “Playing with Power Tools: Design Toolkits and the Framing of Equity” by Adrian Petterson, Keith Cheng, and Priyank Chandra from the University of Toronto. They leveraged Nancy Fraser’s dimensions of justice to review 17 design toolkits that all aim to help designers create more equitable technology, and found both common frameworks and various limitations among these toolkits. They also offered some good suggestions for approaching equity through design that seem valuable for AImpower to reflect on and reference in our own technology design process.



This post is a lot longer than I expected! 😛 

TL;DR: we learned a lot at CHI and W4A and look forward to incorporating these learnings into our upcoming research and technical work.

The next conference we will be attending is the National Stuttering Association’s annual conference in Fort Lauderdale, Florida in early July. We will host a workshop titled “Claim Your Virtual Presence”! I will share more details about the workshop on our blog soon and hope to see you there!

Stuttering and Voice-activated AI: Panel Reflections

I recently attended the “Voice-Activated AI for Stuttered Speech Convergence Symposium” organized by Michigan State University, Friends, and Western Michigan University. I was honored to speak on the “Sociotechnical Challenges in Voice-Activated AI” panel with a fantastic group of panelists and participants from academia, industry, and nonprofits.

It was an incredible experience to join the technical and research community in a lively and constructive conversation on how to make voice-activated AI more usable and useful for people who stutter, or with any other speech diversities. I was impressed by the shared commitment and determination across sectors to make this a reality, and learned a lot from other panelists and participants in the process. I am also glad that, as the only panelist who stutters, I got to represent the voices and agency of the stuttering community in this currently predominantly technical endeavor.

I will share a few highlights and key points that stuck with me below. I am not quoting people or referencing them by name due to confidentiality (the symposium was over Zoom but not recorded for the same reason). But please let me know if I can/should reference you, and I will!

1. Data

We spent a significant amount of time discussing the data: why we need it, how it is defined, its availability, collection methods, context, and usage:

  • Definition of the data: audio speech recordings with metadata about the speaker and text transcriptions (with timestamps); see the sketch after this list.
  • What kind of data we need: we need data that represents diverse speech patterns, e.g. stuttered speech, deaf speech, and speech with heavy accents. We also want to represent the heterogeneity within a community and cover both the variability of speech (e.g. in stuttered speech) and the intersectionality of community members (e.g. stuttered speech in a non-native language).
  • Data Availability:
    • FluencyBank has been the go-to resource for stuttered speech data, both for teaching SLPs and for training/evaluating speech models. However, since it was designed to train SLPs rather than for ASR use cases, the transcriptions are not always complete, which can be an issue for training ASR. The FluencyBank project also has limited resources, so the scale of the data is much smaller than what is normally used to train ASR models.
    • There is an ongoing industry-funded effort to collect diverse speech data, and the data is intended to be shared more publicly. However, it was unclear what the timelines or terms are regarding data sharing, highlighting again the power imbalance between the tech industry and the community in the ownership and control of data about the community. Other participants also shared experiences of being approached by tech companies to record their speech with very little compensation or transparency about how the data would be used, which potentially further disincentivizes the community from contributing to technical efforts like this. I talked about AImpower's ongoing work with StammerTalk as an example of a respectful, equitable data collection and governance model, in which the community leads and drives the data collection process and determines how the data will be stored, shared, and used, by whom, and for what purposes. Other panelist(s) drew parallels between our approach and the data sovereignty efforts by indigenous communities.
  • Context is important: compared to other speech diversities, stuttering might be unique due to its variability across individuals who stutter and within a person who stutters. Stuttering is much more likely to occur when speaking with others than when talking to oneself. My physiological state, my conversation partner, and the speaking environment can all affect how I stutter. Most existing stuttered speech data were recorded in lab settings, and might sound very different from the speech input for voice AI. Speech in daily conversations can also be very different from speech in formal, high-stakes settings such as interviews and presentations. We need to prioritize intersectionality when collecting and using stuttered speech, and also have the stuttering community self-determine how their speech should be interpreted (e.g. whether filler words and disfluencies should be transcribed or omitted).
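To make the data definition above concrete, here is a minimal sketch of what a single record in such a dataset might look like. This is an illustration only: the field names and structure are hypothetical, not a schema from FluencyBank, StammerTalk, or any other project mentioned here.

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptSegment:
    start_sec: float  # segment start time within the audio
    end_sec: float    # segment end time
    text: str         # verbatim transcription; whether disfluencies are kept
                      # or omitted is a decision for the community to make

@dataclass
class SpeechRecord:
    audio_path: str   # path to the raw audio recording
    speaker_id: str   # pseudonymous ID, never the speaker's real name
    speaker_metadata: dict = field(default_factory=dict)
    # e.g. {"native_language": "Mandarin", "setting": "free-form conversation"}
    segments: list[TranscriptSegment] = field(default_factory=list)
```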

2. The Use of Voice-AI

We also talked about how the deployment of voice AI impacts the lives of people with speech diversity and what we can do about it.

According to a recent paper on the experiences of people who stutter with ASR systems, published by Colin Lea and colleagues at Apple, it is clear that current ASR systems can be unusable when it comes to stuttered speech (>40% WER for users with severe stuttering). While these systems are actively commercialized and widely adopted by different businesses and services, they can create not only structural barriers but also mental and cognitive harms for people who stutter.
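To put the >40% figure in context: word error rate (WER) is the standard ASR metric, counting the word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the length of the reference. Below is a minimal sketch of the standard computation (my own illustration, not code from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions)
    divided by the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# A WER above 0.4 means more than 4 out of every 10 words come out wrong:
print(wer("please connect me to an agent", "lease connect he to a agent"))  # 0.5
```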

I have had very bad personal experiences with USCIS phone lines that have been switched to AI operators in the past few years: I couldn't get the AI to understand the purpose of my call (“in natural language”) and there is no option to reach a human operator anymore. After a few frustrating attempts, the AI simply hung up on me. Although traditional human-operated customer service phone lines have not been very stuttering-friendly, AI, which is supposed to represent progress and advancement in technology, has made them even worse.

Besides the outright denial of services, ASR models that can't handle stuttered speech also create daily psychological torment for users who stutter. Every time the speech assistant says “sorry, I missed that” to my command but responds to the same command from my 4-year-old daughter, I experience a micro-dose of embarrassment and frustration, and am reminded again that stuttering is wrong and not acceptable – an idea that I have been working so hard to shake off from my mind.

Great points were brought up by the panelists on how we can regulate the deployment of premature AI systems like this, with tools such as legislation, collective action by the community, and/or self-regulation by the tech industry. Again, we at AImpower believe that the community being impacted should be the one that makes the call, e.g. we should be able to request the type of interview setup that sets us up for success, rather than being forced to go through phone or AI interviews that potentially create disabling barriers for people who stutter.

3. Going forward

It seems clear that the lack of inclusivity in today's voice-activated AI systems is not merely a technical problem, but a socio-structural issue with deep roots in ableism and capitalism. It thus requires a convergent effort from the community, government, academia, and industry. The panelists and participants proposed a few directions going forward:

  • Community: let those who are most impacted by the technology take the driver's seat! Among all panelists on the technical panel, I was the only person who stutters, perhaps reflecting the lack of representation of the community in AI and technology, and that has to change. I also don't think community members need a technical background to contribute: what I see lacking in the current development of voice-activated AI is not technical expertise but embodied knowledge.
  • Technology: collect more representative and diverse speech data in a collaborative and respectful way.
  • Product / Applications: when designing speech-related products and applications, go beyond simply “accepting” and “allowing” stuttered speech; embrace and celebrate the diversity in human speech as a design asset rather than an edge case. E.g. can we leverage pauses and filler words as a channel for emotional connection when non-verbal channels are limited?

Making communication technologies more inclusive and accessible for people with diverse speech is a major focus for AImpower.org right now, and we are so excited to see the synergy across sectors at the symposium. AImpower is always eager to listen to the community's stories and open to joining efforts with partners who share the same vision. Please reach out if you want to contribute, collaborate, or discuss.

AImpower’s First Year in Review

By Shaomei Wu

(This post contains 2100 words and takes 5 to 7 minutes to read.)

AImpower just turned one year old! 🎂🥳

It has been a busy and exciting first year for AImpower! When I left Meta at the end of 2021, I could not picture how and where I could build my vision of equitable and empowering technologies. Yet here we are, one year later, surrounded by friends and communities who believe in and support us, running a brand-new nonprofit organization dedicated to researching and co-creating technologies for and with those who have been marginalized and silenced by technology.

Here is a look at some of the highlights of AImpower.org since its inception in February 2022.

1. Establish the organization

One of the biggest milestones this year was having AImpower formally recognized as a 501(c)(3) non-profit organization by the IRS. The process was tedious, but actually less complicated than I had expected, thanks to the guidance of our board of directors and our pro-bono legal counsels.

The hardest (and most exciting) part turned out to be finding a group of people who share the vision and are willing to stand behind it as AImpower's initial board of directors. And I am so fortunate to have found Karin, Niranjan, Lindsay, and Ben, who possess such a diverse set of expertise and backgrounds that they constantly inspire and enlighten me. I cannot thank our board enough for their wisdom, commitment, and companionship – they are the true cornerstones of AImpower!

Big kudos also go to the Justice & Diversity Center of the Bar Association of San Francisco, who connected us to our amazing pro bono legal team from Cooley. Katie and Samantha, thank you for your patience, knowledge, and responsiveness!

I also want to thank everyone I have talked to and learned from in the process of starting AImpower, especially, our incredible board of advisors. Thank you for sharing your knowledge and time with me. Your advice and support is the biggest asset for AImpower!

2. Build our vision

While I always knew that I wanted to explore a new paradigm for researching and building technologies, one that is respectful, equitable, and empowering, it turned out to be much more challenging than I expected to articulate and describe it.

Together with the board and our advisors, we worked hard to define our vision and approaches. We knew what we didn’t want to be, but it took us a lot of discussions (and healthy debates!) to define what we want to be in the tech/nonprofit ecosystem. 

We are happy to share our mission below, together with the key principles we follow to approach our mission. You can also find them on the front page of our website.

We also did a lot of self-reflection and identity exploration to define who we are and how we can effectively contribute to existing efforts on technology ethics, social justice, and the empowerment of marginalized communities. At the end of the day, we see ourselves in a unique position at the intersection of community, research, and intervention, with the expertise, willingness, and flexibility to directly partner with marginalized communities and produce high-quality research as well as promising technical interventions.

3. Do our work

While starting up our organization, we have also been busy kicking off our first case study with the stuttering community! 

Stuttering impacts 1% to 4% of the world population, and people who stutter have been marginalized socially and structurally. A lot of that marginalization has been implemented through technologies: for example, compared to in-person interviews, job interviews over the phone tend to make it much harder for people who stutter to communicate and perform well. New communication technologies such as videoconferencing and automated phone menus are often adopted without considering the full spectrum of human speech diversity, and, as a result, pose new challenges for people who stutter (or with other types of atypical speech patterns). After all, current speech recognition models tend to be built under ableist assumptions about fluent speech, introduced by the datasets used to train those models and by design decisions such as the amount of time the model waits before assuming the speech is over.
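To make that last design decision concrete: many voice interfaces decide an utterance is over after a fixed stretch of silence. The sketch below is a deliberately simplified, hypothetical endpointer, not any vendor's actual implementation; the point is that a timeout tuned for fluent pauses will treat a long stuttering block as the end of the sentence and cut the speaker off.

```python
# Hypothetical, simplified silence-based endpointing logic. Real systems are
# more sophisticated, but a fixed silence timeout remains a common design.
SILENCE_TIMEOUT_MS = 700  # tuned for fluent pauses; silent blocks in
                          # stuttered speech can easily last several seconds

def utterance_ended(ms_since_last_speech: int) -> bool:
    # True once the "silence" exceeds the timeout, even if that silence is
    # actually a block in the middle of a sentence
    return ms_since_last_speech >= SILENCE_TIMEOUT_MS
```

A more inclusive design could expose this timeout as a user-adjustable setting, or adapt it to each speaker's pace.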

Guided by our mission of empowering marginalized groups in technology research and development, our work with the stuttering community spans three areas:

  1. Community-oriented research on the experiences and challenges of people who stutter with videoconferencing technologies;
  2. Technical advisory and contribution to other stuttering community organizations on their technical programs and technology development;
  3. Advocacy and support for community-led, grassroots efforts to collect, govern, and utilize stuttering speech data.

We will summarize what we did in each of the areas below.

1) Research: Stuttering in the age of telecommunication

Inspired by stories we heard from stuttering communities (and my personal experiences), we have been working closely with people who stutter to understand and improve their videoconferencing experience. Between February 2022 and August 2022, Shaomei, Ben, and Yijing (our wonderful volunteer researcher) interviewed 13 people who stutter about their use of, and challenges with, videoconferencing technologies such as Zoom, Google Meet, and Microsoft Teams. Our research uncovered both opportunities and barriers with videoconferencing for people who stutter, pointing out several places where the design of current videoconferencing platforms is NOT stutter-friendly.

The methodology and findings of our research are summarized in our research paper, “The World is Designed for Fluent People”: Benefits and Challenges of Videoconferencing Technologies for People Who Stutter, which was recently accepted to CHI 2023 – one of the top conferences for Human-Computer Interaction research. Please read our previous blog post and the full paper if you want to learn more.

This research is just the beginning of our investigation of telecommunication technologies for people with speech diversity. We will run follow-up co-design workshops with the stuttering community to explore the design space for inclusive videoconferencing and produce tangible interventions in the coming months. If you are a person who stutters and are interested in participating in the co-design workshops, please sign up! For those who have already signed up, thank you, and you will hear from us soon, I promise ☺️!

We are also in desperate need of volunteer designers with user-centered design and rapid prototyping experience – contact us if you are interested!

Subscribe to our blogs to receive the latest updates about this project!

2) Technical advisory and contribution to other community organizations

We are proud to share our technical expertise and experience with other organizations to expand their impact and service for the stuttering community. I was honored to serve MySpeech, a non-profit organization and digital community for people who stutter, as their first CTO. Through my role, I was able to support MySpeech by defining its technical vision and roadmap, architecting its product infrastructure, and formalizing its engineering process. When I handed off my responsibilities in October, I was extremely proud of the structure and technical foundation that was in place and very confident in MySpeech's ability to grow its technical talent and sustain technology development. And I was not disappointed: the key technical product the team was developing – the MySpeech App – launched in November 2023, serving the stuttering community as a one-stop information hub for everything stuttering-related. This is a huge milestone for MySpeech and I am so grateful to be included in MySpeech's journey.

3) Community-led data collection and governance

At the core of AImpower's mission is “community”. We believe in the agency of marginalized communities, and are always excited to see communities taking charge of their technological experience. One important aspect of that experience is the data collected and used by technologies. We are honored to support 口吃说 (StammerTalk) – a bilingual podcast and online community for Mandarin/English speaking people who stutter – in their effort to collect and govern one of the first open-sourced stuttered speech datasets in Mandarin Chinese.

The idea of a Mandarin Chinese stuttered speech dataset, by people who stutter and for people who stutter, emerged from a video interview I did with the StammerTalk community in October 2022, in which I talked about how the lack of diverse speech data might have contributed to the degraded performance of speech recognition models for people who stutter. While there are some similar datasets (e.g. FluencyBank, UCLASS, and SEP-28k), none of them are in Mandarin, and none were collected and managed by people who stutter themselves.

After an initial discussion with us on the technical requirements of the data collection, the community got together and started recording each other right away. We are so impressed by the 口吃说 (StammerTalk) community's drive and capacity, and find it deeply satisfying and reaffirming to work side-by-side with them. We are committed to continuing to support community-led grassroots efforts like this and to exploring multiple fronts (legal, organizational, technical) with 口吃说 (StammerTalk). We will share more about this project and our learnings on our blog – stay tuned!

4. Gather support

As a new organization, one of our top priorities at this stage is to gather the resources we need to implement and grow our programs. Beyond the selfless support from our board, advisors, and friends, we have also run a series of outreach efforts and campaigns to share our mission and connect with a broader audience with shared goals and interests.

You can find me speaking about AImpower's vision and ongoing work in a number of venues.

We also opened up donation channels at DonorBox, Benevity, and PayPal Giving Fund, and ran a successful grassroots campaign on social media (LinkedIn, Facebook, WeChat) and through word-of-mouth. A big thank you to Bin Fu and Ben Lickly for the campaigns you led to support us in December 2022!

We were able to raise over $20,000 by the end of 2022 to cover our basic operations – all from small, individual donations. Deep gratitude to all our donors for your support! Your trust in us is truly motivating and invigorating. We are committed to being transparent and accountable in how we use your donations. Feel free to request our 2022 balance sheet and 2023 budget plan.

5. Look ahead to 2023

In the second year of AImpower, we plan to build on the progress and momentum we had this year, grow our team, and potentially expand our programs to new problem areas.

In terms of our research and programs, we will build on our existing partnerships with the stuttering community to work toward tangible sociotechnical solutions and advocate for broader acceptance of stuttering and stuttered speech by technologies and in the workplace.

Projects that are already on our roadmap include:

  • Exploring the design space for telecommunication technologies and building out a few socio-technical solutions with people who stutter;
  • Supporting the 口吃说 (StammerTalk) community to collect and manage the very first open-sourced Mandarin Chinese stuttered speech dataset.

Besides our collaboration with the stuttering community, we are also actively engaged with different accessibility and disability advocacy projects, and generally interested in reaching and empowering marginalized groups facing technological challenges. 

As a small organization, we have the luxury of being nimble and flexible, and are always open to ideas and collaboration opportunities with communities and other similar-minded organizations. If you want to work with us, let’s talk!

Another priority for us in the second year is to grow our team and streamline our operations. We are kicking off our first remote internship program in late February with the University of Ottawa, and I am super excited to mentor two Master's students and provide them with hands-on experience in building technologies for social impact. We also hope to recruit more volunteers and make the most of their talents through a smoother onboarding process and a better support mechanism. Lastly, we would really love to hire for a few roles (e.g. research, engineering, design) and compensate them fairly; we see this as the biggest investment in AImpower and a crucial step to carry out our mission in the long term.

We will continue our outreach and public advocacy efforts, contributing our knowledge and practice to the public discourse on technology justice, gender and racial equity, and disability rights. We already have a series of engagements lined up in the next few months, from guest lectures, conference presentations, art exhibitions, to media campaigns. We will share more as they come out.

We anticipate the need to spend a good amount of energy on fundraising. We submitted several grant proposals and funding applications in 2022, learned a lot in the process, and will continue these efforts to keep us financially sustainable. We are also looking for academic/industry/community partners with shared interests to fund and support our work.

Last but not least, we will go to more places to physically meet people and talk about AImpower. You will see us in Hamburg, Germany for CHI in the last week of April, and in Austin for the W4A conference right after. Come say hi if you are nearby 👋🏼 !

“Break the invisible wall” – upcoming talk about challenges and opportunities for people who stutter to participate in academic conferences

Post-OSS updates and reflections (1/24/2023)

I tried out the format of doing a “live” presentation with a recorded talk (slides) today, and it worked surprisingly well, especially for virtual meetings!

How it works:
  1. I recorded myself giving the presentation the day before (in one shot, without any video editing), and shared the recorded video with the seminar host, Rhonda.
  2. At the time of my presentation, I showed up at the meeting but had Rhonda play the recorded presentation, while I monitored the chat and the reactions of the audience.
  3. After we viewed the recorded presentation, I addressed the questions posed in the chat and invited people to ask more questions and discuss. People could choose to join the discussion over chat or using their voice (unmuting). I answered all questions through speech, while sending some supplemental information over chat (e.g. referenced papers, news articles).
What I like about this format:
  1. I have more control over the timing. Stuttering is unpredictable. I know the general categories of words that I have more trouble with, but that changes over time and depends on the situation. I often can't predict how often, or for how long, I will struggle with a word before I open my mouth, which makes strictly-timed speech difficult. By recording the presentation ahead of time in a low-stress setting (by myself or with a supportive friend), I have more mental energy to focus on the content and less of a struggle when I talk. As a result, although I still stutter from time to time in the recording, my ideas flow better (and the talk often runs shorter), and it takes less physical/mental effort for me to speak.
  2. I have more energy and mental bandwidth to connect with my audience during the presentation. While the recorded talk is being played, I can monitor the chat and look at the faces of the audience (something I can't do when I am in presenter mode over Zoom). The questions in the chat and the micro-expressions I saw in the audience really made me feel connected to and validated by the audience, something I almost never felt during a live presentation, when almost all my mental energy was spent enunciating the next word.
  3. The recorded talk can reach beyond the live audience and become more accessible than me talking live! In the age of remote work, we are all collaborating and working with people across different timezones, and it is harder to get everyone in the same place at the same time. Some people are also not aware of an event until after it has happened. Sharing the recorded talk, together with the slides, is the best way I can think of to share the experience and the ideas with people who cannot attend the event. I asked Rhonda to forward the video and the slides to anyone who is interested in this topic, and I am going to do the same here. I can also add captions to the video so it is accessible to people who are deaf or hard-of-hearing, or anyone without audio on their devices.
  4. I still get to interact with the audience directly after the video. I have always enjoyed hearing people's questions and reactions to my presentations, and I was glad that I still got to do that right afterwards. I used to be very conscious about how fast I talked during Q&A time, and worried that if I answered one question for too long, I wouldn't be able to get to all the questions. But with the virtual environment + recorded talk, I could 1) address some questions during the presentation, or at least mentally prepare for them before the Q&A time; 2) share more details related to my answer through chat if I encountered a question that was a bit complex and took longer to address. Either way, I felt less time pressure to speak quickly and fluently during Q&A, which made the experience even more enjoyable for me.

I am so glad that we tried out this format – a first for me – and I find it quite promising for improving conference/presentation accessibility for both the presenter and the audience. I hope more and more academic conferences/seminars adopt and normalize this format, so that the next time I do a “recorded live” presentation, it is so common that it requires no extra explanation of why I chose to present this way.


Please join us for our presentation and follow-up discussion with fellow academics and conference organizers at Virtual Chair's Organizer Seminar Series next Tuesday, Jan 24, 2023.

Shaomei will share AImpower's current research, as well as her own experiences, on the videoconferencing challenges and benefits for people who stutter, and lead a discussion on strategies to accommodate and empower people with speech diversity at on-site and virtual academic conferences. The title and abstract of the presentation are below:

Break the Invisible Wall: Challenges and Opportunities for People Who Stutter to Participate in In-person and Virtual Conferences

Abstract. As common accessibility accommodations such as accessible transportation services and sign language interpreters become increasingly available and expected at academic conferences, the experiences and needs of conference attendees with invisible disabilities remain less understood and under-supported. In this talk, I will share current research, as well as my own experiences, on the challenges faced by researchers who stutter – an estimated 1-3% of the world population – at in-person and virtual conferences. As a neurodevelopmental condition, stuttering in adulthood is incurable and often generates significant social penalties beyond speech disfluencies. For researchers who stutter, presenting and networking at in-person events with strict time limits and noisy surroundings can cause great stress that undermines their ability to communicate clearly and confidently. Virtual conferences bring both opportunities and new challenges for people who stutter. My recent interview study with 13 adults who stutter highlights a few structural issues with contemporary videoconferencing tools that make them NOT stutter-friendly, such as the design of the preset/sticky “self-view” and the limited support for non-verbal channels. I will conclude my talk with a few suggestions to better accommodate the needs of people with speech diversity, but leave ample time for an open discussion on personal reflections and other accommodation strategies.


You can register for the event here. We will also update this post with a summary of the talk and the key insights from the discussion afterwards!

Publishing our research at CHI 2023

We are excited to share that our research with the stuttering community on their videoconferencing experience has been accepted by the ACM CHI Conference on Human Factors in Computing Systems (CHI '23)!

Our paper, titled “The World is Designed for Fluent People”: Benefits and Challenges of Videoconferencing Technologies for People Who Stutter, details the methodology and findings from our interview research with 13 participants who stutter from the US and the UK. We look forward to presenting our work in Hamburg, Germany at the end of April, and hope this work can draw broader public awareness and collective effort toward designing and building more inclusive videoconferencing environments for all.

TL;DR

At a very high level, our participants reported the following benefits and challenges with videoconferencing and videoconferencing technologies:

  • Benefits
    • Reducing mental barriers to “show up”
    • Masking stutter
    • Connecting with the stuttering community
    • Increasing public empathy for communication challenges
  • Challenges
    • Stress and distractions with “self-view”
    • Difficulty getting and holding one’s turn using voice
    • Limited non-verbal channels to solicit emotional support from others

As revealed by our research, while people who stutter can still participate, the extra time, labor, and mental effort required for VC meetings makes the experience doubly exhausting and emotionally taxing. As we enter a new era in which videoconferencing becomes the dominant and normalized mode of personal and professional communication, the design of videoconferencing technology charges additional emotional and cognitive costs that systematically marginalize people who stutter in social events, employment, civic processes, and health care.

Next Steps

Informed by the insights we have learned so far, we will be conducting a series of co-design workshops with the stuttering community in Spring 2023 to co-explore the design space for inclusive videoconferencing. A few directions pointed out by our interview participants are:

  • Provide users with more control over their speech and speech related behaviors (such as facial expression, body movements);
  • Support self-disclosure (for both stuttering and other marginalized, vulnerable identities in general) during VC meetings;
  • Offer the speaker real-time therapeutic and emotional support.

Support Us

If you find our work meaningful, please support us in any of these ways:

  • Stuttering friends: we want to learn from and work with you! Please sign up here if you are willing to participate in our co-design workshops, or simply share your feedback and insights on this topic with us.
  • Non-stuttering friends: please consider donating your time and talent to AImpower; we are looking for part-time designers, UX researchers, and software engineers (sign up here).
  • Academic & industry friends: we are open to collaborate! If you are interested at building inclusive telecommunication technologies, please reach out to Shaomei (shaomei@aimpower.org).
  • Media friends: we are ready to talk/write about this work, and AImpower's work in general; contact us if you are interested in doing an interview, podcast, or article, or simply chatting with us.

Last but not least, we are always looking for donations and funding, and will be very grateful for any financial support of our work.

Read more

You can find more details about this work in our previous blog post, or read the preprint version of the full paper. Feel free to disseminate this work with the reference:

Shaomei Wu, “The World is Designed for Fluent People”: Benefits and Challenges of Videoconferencing Technologies for People Who Stutter. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI 2023).