AImpower’s First Year in Review

By Shaomei Wu

(This post contains 2100 words and takes 5 to 7 minutes to read.)

AImpower just turned one year old! 🎂🥳

It has been a busy and exciting first year for AImpower! When I left Meta at the end of 2021, I could not picture how and where I could build my vision and beliefs in equitable and empowering technologies. Yet here we are, one year later, surrounded by friends and communities who believe in and support us, having started a brand-new nonprofit organization dedicated to researching and co-creating technologies for and with those who have been marginalized and silenced by technology.

Here is a look at some of the highlights of AImpower.org since its inception in February 2022.

1. Establish the organization

One of the biggest milestones this year was having AImpower formally recognized as a 501(c)(3) non-profit organization by the IRS. The process was tedious, but actually less complicated than I had expected, thanks to the guidance of our board of directors and our pro bono legal counsel.

The hardest (and most exciting) part turned out to be finding a group of people who share the vision and are willing to stand behind it as AImpower’s initial board of directors. I am so fortunate to have found Karin, Niranjan, Lindsay, and Ben, who bring such a diverse set of expertise and backgrounds that they constantly inspire and enlighten me. I cannot thank our board enough for their wisdom, commitment, and companionship – they are the true cornerstones of AImpower!

Big kudos also go to the Justice & Diversity Center of the Bar Association of San Francisco, who connected us to our amazing pro bono legal team from Cooley. Katie and Samantha, thank you for your patience, knowledge, and responsiveness!

I also want to thank everyone I have talked to and learned from in the process of starting AImpower, especially our incredible board of advisors. Thank you for sharing your knowledge and time with me. Your advice and support are AImpower’s biggest assets!

2. Build our vision

While I always knew that I wanted to explore a new paradigm for researching and building technologies that are respectful, equitable, and empowering, articulating and describing it turned out to be much more challenging than I expected.

Together with the board and our advisors, we worked hard to define our vision and approaches. We knew what we didn’t want to be, but it took us a lot of discussions (and healthy debates!) to define what we wanted to be in the tech/nonprofit ecosystem.

We are happy to share our mission below, together with the key principles that guide our approach. You can also find them on the front page of our website.

We also did a lot of self-reflection and identity exploration to define who we are and how we can effectively contribute to existing efforts on technology ethics, social justice, and the empowerment of marginalized communities. At the end of the day, we see ourselves in a unique position at the intersection of community, research, and intervention, with the expertise, willingness, and flexibility to partner directly with marginalized communities and produce high-quality research as well as promising technical interventions.

3. Do our work

While starting up our organization, we have also been busy kicking off our first case study with the stuttering community! 

Stuttering impacts 1% to 4% of the world population, and people who stutter have been marginalized socially and structurally. A lot of that marginalization has been implemented through technologies: for example, phone job interviews tend to be much more stressful than in-person ones for people who stutter, making it harder to communicate and perform well. New communication technologies such as videoconferencing and automated phone menus are often adopted without considering the full spectrum of human speech diversity and, as a result, pose new challenges for people who stutter (or with other atypical speech patterns). After all, current speech recognition models tend to be built on ableist assumptions about fluent speech, introduced both by the datasets used to train those models and by design decisions such as how long the model waits before assuming the speech is over.

Guided by our mission of empowering marginalized groups in technology research and development, our work with the stuttering community spans three areas:

  1. Community-oriented research on the experiences and challenges of people who stutter with videoconferencing technologies;
  2. Technical advisory and contribution to other stuttering community organizations on their technical programs and technology development;
  3. Advocacy and support for community-led, grassroots efforts to collect, govern, and utilize stuttering speech data.

We will summarize what we did in each of the areas below.

1) Research: Stuttering in the age of telecommunication

Inspired by stories we heard from stuttering communities (and my personal experiences), we have been working closely with people who stutter to understand and improve their videoconferencing experience. Between February 2022 and August 2022, Shaomei, Ben, and Yijing (our wonderful volunteer researcher) interviewed 13 people who stutter about their use of, and challenges with, videoconferencing technologies such as Zoom, Google Meet, and Microsoft Teams. Our research uncovered both opportunities and barriers with videoconferencing for people who stutter, pointing out several places where the design of current videoconferencing platforms is NOT stutter-friendly.

The methodology and findings of our research are summarized in our research paper, “The World is Designed for Fluent People”: Benefits and Challenges of Videoconferencing Technologies for People Who Stutter, which was recently accepted by CHI 2023 – one of the top conferences for Human-Computer Interaction research. Please read our previous blog post and the full paper if you want to learn more.

This research is just the beginning of our investigation of telecommunication technologies for people with speech diversity. In the coming months, we will run follow-up co-design workshops with the stuttering community to explore the design space for inclusive videoconferencing and produce tangible interventions. If you are a person who stutters and are interested in participating in the co-design workshops, please sign up! For those who have already signed up, thank you – you will hear from us soon, I promise ☺️!

We are also in desperate need of volunteer designers with user-centered design and rapid prototyping experience – contact us if you are interested!

Subscribe to our blog to receive the latest updates about this project!

2) Technical advisory and contribution to other community organizations

We are proud to share our technical expertise and experience with other organizations to expand their impact and service for the stuttering community. I was honored to serve MySpeech, a non-profit organization and digital community for people who stutter, as their first CTO. Through my role, I was able to support MySpeech by defining its technical vision and roadmap, architecting its product infrastructure, and formalizing its engineering process. When I handed off my responsibilities in October, I was extremely proud of the structure and technical foundation that was in place and very confident in MySpeech’s ability to grow its technical talent and sustain technology development. And I was not disappointed: the key technical product the team was developing – the MySpeech App – launched in November 2022, serving the stuttering community as a one-stop information hub for everything stuttering-related. This is a huge milestone for MySpeech and I am so grateful to be included in MySpeech’s journey.

3) Community-led data collection and governance

At the core of AImpower’s mission is “community”. We believe in the agency of marginalized communities, and are always excited to see communities taking charge of their technological experience. One important aspect of that experience is the data collected and used by technologies. We are honored to support 口吃说 (StammerTalk) – a bilingual podcast and online community for Mandarin/English-speaking people who stutter – in their effort to collect and govern one of the first open-source stuttered speech datasets in Mandarin Chinese.

The idea of a Mandarin-Chinese stuttering speech dataset by people who stutter for people who stutter emerged from the video interview I did with the StammerTalk community in October 2022, in which I talked about how the lack of diverse speech data might have contributed to the degraded performance of speech recognition models for people who stutter. While there have been some similar datasets (e.g. FluencyBank, UCLASS, and Sep-28k), none of them are in Mandarin, and they were never collected and managed by people who stutter themselves. 

After an initial discussion with us on the technical requirements of the data collection, the community got together and started recording each other right away. We are so impressed by the 口吃说 (StammerTalk) community’s drive and capacity, and find it deeply satisfying and reaffirming to work side-by-side with them. We are committed to continuing to support community-led grassroots efforts like this and to exploring multiple fronts (legal, organizational, technical) with 口吃说 (StammerTalk). We will share more about this project and our learnings on our blog – stay tuned!

4. Gather support

As a new organization, one of our top priorities at this stage is to gather the resources we need to implement and grow our programs. Beyond the selfless support from our board, advisors, and friends, we have also run a series of outreach efforts and campaigns to share our mission and connect with a broader audience with shared goals and interests.

You can find me speaking about AImpower’s vision and ongoing work in different places, including:

We also opened up our donation channels at DonorBox, Benevity, and PayPal Giving Fund, and ran a successful grassroots campaign on social media (LinkedIn, Facebook, WeChat) and through word of mouth. Big thank you to Bin Fu and Ben Lickly for the campaigns you led to support us in December 2022!

We were able to raise over $20,000 by the end of 2022 to cover our basic operations – all from small, individual donations. Deep gratitude to all our donors for your support! Your trust in us is truly motivating and invigorating. We are committed to being transparent and accountable in how we use your donations. Feel free to request our 2022 balance sheet and 2023 budget plan.

5. Look ahead to 2023

In the second year of AImpower, we plan to continue the progress and momentum we had this year, grow our team, and potentially expand our program to new problem areas.

In terms of our research and programs, we will build on our existing partnerships with the stuttering community to work towards tangible sociotechnical solutions and advocate for broader acceptance of stuttering and stuttered speech in technologies and in the workplace.

Projects that are already on our roadmap include:

  • Exploring the design space for telecommunication technologies and prototyping a few sociotechnical solutions with people who stutter;
  • Supporting the 口吃说 (StammerTalk) community in collecting and managing the very first open-source Mandarin-Chinese stuttered speech dataset.

Besides our collaboration with the stuttering community, we are also actively engaged with different accessibility and disability advocacy projects, and generally interested in reaching and empowering marginalized groups facing technological challenges. 

As a small organization, we have the luxury of being nimble and flexible, and we are always open to ideas and collaboration opportunities with communities and other like-minded organizations. If you want to work with us, let’s talk!

Another priority for our second year is to grow our team and streamline our operations. We are kicking off our first remote internship program in late February with the University of Ottawa, and I am super excited to mentor two Master’s students and provide them with hands-on experience in building technologies with social impact. We also hope to recruit more volunteers and make better use of their talents through a smoother onboarding process and a better support mechanism. Lastly, we would really love to hire for a few roles (e.g. research, engineering, design) and compensate them fairly; we see this as the biggest investment in AImpower and a crucial step to carry out our mission in the long term.

We will continue our outreach and public advocacy efforts, contributing our knowledge and practice to the public discourse on technology justice, gender and racial equity, and disability rights. We already have a series of engagements lined up in the next few months, from guest lectures and conference presentations to art exhibitions and media campaigns. We will share more as they come out.

We anticipate the need to spend a good amount of energy fundraising. We have submitted some grant proposals and funding applications in 2022, learned a lot in this process, and will continue these efforts to keep us financially sustainable. We are also looking for academic/industry/community partners with shared interest to fund/support our work. 

Last but not least, we will go to more places to physically meet people and talk about AImpower. You will see us in Hamburg, Germany, for CHI in the last week of April, and in Austin for the W4A conference right after. Come say hi if you are nearby 👋🏼 !

“Break the invisible wall” – upcoming talk about challenges and opportunities for people who stutter to participate in academic conferences

Post-OSS updates and reflections (1/24/2023)

I tried out the format of doing the “live” presentation with a recorded talk (slides) today, and it worked surprisingly well, especially for virtual meetings!

How it works:
  1. I recorded myself giving the presentation the day before (in one shot, without any video editing), and shared the recorded video with the seminar host, Rhonda.
  2. At the time of my presentation, I showed up at the meeting but had Rhonda play the recorded presentation while I monitored the chat and the reactions of the audience.
  3. After we viewed the recorded presentation, I addressed the questions posed in the chat and invited people to ask more questions and discuss. People could choose to join the discussion over chat or by voice (unmuting). I answered all questions through speech, while sending some supplemental information over chat (e.g. referenced papers, news articles).
What I like about this format:
  1. I have more control over the timing. Stuttering is unpredictable. I know the general categories of words that I have more trouble with, but that changes over time and depends on the situation. I often can’t predict how often, or for how long, I will struggle with a word before I open my mouth, which makes strictly timed speech difficult. By recording the presentation ahead of time in a low-stress setting (by myself or with a supportive friend), I have more mental energy to focus on the content and struggle less when I talk. As a result, although I still stutter from time to time in the recording, my ideas flow better (and often take less time), and it takes less physical/mental effort for me to speak.
  2. I have more energy and mental bandwidth to connect with my audience during the presentation. While the recorded talk is being played, I can monitor the chat and look at the faces of the audience (something I can’t do when I am in presenter mode over Zoom). The questions in the chat and the micro-expressions I saw in the audience really made me feel connected to and validated by the audience, something I almost never felt during a live presentation, when almost all my mental energy was spent enunciating the next word.
  3. The recorded talk can reach beyond the live audience and is more accessible than me talking live! In the age of remote work, we are all collaborating with people across different timezones, and it is harder to get everyone in the same place at the same time. Some people are also not aware of an event until it has happened. Sharing the recorded talk, together with the slides, is the best way I can think of to share the experience and the ideas with people who cannot attend the event. I asked Rhonda to forward the video and the slides to anyone who is interested in this topic, and I will do the same here. I can also add captions to the video so it is accessible to people who are deaf or hard of hearing, or anyone without audio on their devices.
  4. I still get to interact with the audience directly after the video. I have always enjoyed hearing people’s questions and reactions to my presentations, and I was glad that I still got to do that right afterwards. I used to be very conscious of how fast I talked during the Q&A time, and worried that if I answered one question for too long, I wouldn’t be able to get to all the questions. But with the virtual environment plus a recorded talk, I could 1) address some questions during the presentation, or at least mentally prepare for them before the Q&A time; 2) share more details related to my answer through chat if I encountered a question that was a bit complex and took longer to address. Either way, I felt less time pressure to speak quickly and fluently during Q&A, which made the experience even more enjoyable for me.

I am so glad that we tried out this format – a first for me – and I find it quite promising for improving conference/presentation accessibility for both the presenter and the audience. I hope more and more academic conferences/seminars adopt and normalize this format, so that the next time I do a “recorded live” presentation, it is so common that it requires no extra explanation of why I choose to present this way.


Please join us for our presentation and follow-up discussion with fellow academics and conference organizers at Virtual Chair’s Organizer Seminar Series next Tuesday, Jan 24, 2023.

Shaomei will share AImpower’s current research, as well as her own experiences, on the challenges and benefits of videoconferencing for people who stutter, and lead a discussion on strategies to accommodate and empower people with speech diversity at on-site and virtual academic conferences. The title and abstract of the presentation are below:

Break the Invisible Wall: Challenges and Opportunities for People Who Stutter to Participate in In-person and Virtual Conferences

Abstract. As common accessibility accommodations such as accessible transportation services and sign language interpreters become increasingly available and expected at academic conferences, the experiences and needs of conference attendees with invisible disabilities remain less understood and under-supported. In this talk, I will share current research, as well as my own experiences, on the challenges faced by researchers who stutter – an estimated 1-3% of the world population – at in-person and virtual conferences. As a neurodevelopmental condition, stuttering in adulthood is incurable and often carries significant social penalties beyond speech disfluencies. For researchers who stutter, presenting and networking at in-person events with strict time limits and noisy surroundings can cause great stress that undermines their ability to communicate clearly and confidently. Virtual conferences bring both opportunities and new challenges for people who stutter. My recent interview study with 13 adults who stutter highlights a few structural issues with contemporary videoconferencing tools that make them NOT stutter-friendly, such as the design of the preset/sticky “self-view” and the limited support for non-verbal channels. I will conclude my talk with a few suggestions to better accommodate the needs of people with speech diversity, but leave ample time for an open discussion on personal reflections and other accommodation strategies.


You can register for the event here. We will also update this post with a summary of the talk and the key insights from the discussion afterwards!

Publishing our research at CHI 2023

We are excited to share that our research with the stuttering community on their videoconferencing experience has been accepted by the ACM CHI Conference on Human Factors in Computing Systems (CHI ’23)!

Our paper, titled “The World is Designed for Fluent People”: Benefits and Challenges of Videoconferencing Technologies for People Who Stutter, details the methodology and findings from our interview research with 13 participants who stutter from the US and the UK. We look forward to presenting our work in Hamburg, Germany at the end of April, and hope this work can draw broader public awareness and collective effort towards designing and building more inclusive videoconferencing environments for all.

TL;DR

At a very high level, our participants reported the following benefits and challenges with videoconferencing and videoconferencing technologies:

  • Benefits
    • Reducing mental barriers to “show up”
    • Masking stutter
    • Connecting with the stuttering community
    • Increasing public empathy for communication challenges
  • Challenges
    • Stress and distractions with “self-view”
    • Difficulty getting and holding one’s turn using voice
    • Limited non-verbal channels to solicit emotional support from others

As revealed by our research, while people who stutter can still participate, the extra time, labor, and mental effort required for VC meetings make the experience doubly exhausting and emotionally taxing. As we enter a new era in which videoconferencing becomes the dominant and normalized mode for personal and professional communication, the design of videoconferencing technology imposes additional emotional and cognitive costs that systematically marginalize people who stutter in social events, employment, civic processes, and health care.

Next Steps

Informed by the insights we learned so far, we will be conducting a series of co-design workshops with the stuttering community in Spring 2023 to co-explore the design space for inclusive videoconferencing. A few directions pointed out by our interview participants are:

  • Provide users with more control over their speech and speech-related behaviors (such as facial expressions and body movements);
  • Support self-disclosure (for both stuttering and other marginalized, vulnerable identities in general) during VC meetings;
  • Offer the speaker real-time therapeutic and emotional support.

Support Us

If you find our work meaningful, please support us in any of these ways:

  • Stuttering friends: we want to learn from and work with you! Please sign up here if you are willing to participate in our co-design workshops, or simply share your feedback and insights on this topic with us.
  • Non-stuttering friends: please consider donating your time and talent to AImpower; we are looking for part-time designers, UX researchers, and software engineers (sign up here).
  • Academic & industry friends: we are open to collaboration! If you are interested in building inclusive telecommunication technologies, please reach out to Shaomei (shaomei@aimpower.org).
  • Media friends: we are ready to talk/write about this work, and AImpower’s work in general; contact us if you are interested in doing an interview, podcast, or article, or simply chatting with us.

Last but not least, we are always looking for donations and funding, and we will be very grateful for any financial support for our work.

Read more

You can find more details about this work in our previous blog post, or read the preprint version of the full paper. Feel free to share this work using the reference:

Shaomei Wu, “The World is Designed for Fluent People”: Benefits and Challenges of Videoconferencing Technologies for People Who Stutter. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI 2023).

Join us in Austin for Web4All 2023

Web4All is a premier conference for accessibility research, co-located with the Web Conference since 2003. Next year’s conference marks the 20th anniversary of Web4All (!), and it will take place in Austin, Texas from April 30 to May 1, 2023, with the theme of “Accessibility in the Metaverse”.

Shaomei Wu – the founder and CEO of AImpower.org – has been part of the organizing committee for Web4All for several years, serving in various roles from Program Committee member to Doctoral Consortium Chair (2021) and Accessibility Challenge Chair (2022). This year, Shaomei will co-chair the conference program with Maria Rauschenberger from the University of Applied Sciences Emden/Leer.

Please consider submitting to Web4All 2023 and joining us in Austin next year! We accept technical papers, communication papers, accessibility challenge papers, and doctoral consortium applications, and submissions aren’t due until early February 2023.

The conference is small, intimate, and highly interactive! It will take place in person and offers various funding opportunities for students to attend.

We look forward to making new friends and seeing old friends in Austin at the end of April 2023!

Research Update: Barriers And Opportunities with Videoconferencing for People Who Stutter

By Karin Patzke and Shaomei Wu

As one of our initial projects, AImpower.org has started to conduct preliminary research focused on understanding the impact of videoconferencing tools on people who stutter. Here we share some initial findings and excerpts from a research paper currently under review.

While it can be easy to say that a technology is all “good” or all “bad”, the data indicate that the reality is more complicated. Videoconferencing tools like ‘Zoom’ and ‘Skype’ create new opportunities for people who stutter to manage their speaking environment and even conceal their stuttering, but the design of VC technologies has also posed new challenges for people who stutter. Below is an abbreviated version of the research paper which outlines the benefits and challenges of videoconferencing platforms for people who stutter. We hope you find this update to our work informative and engaging. We welcome your thoughts and questions in the comment section and appreciate the time you take to leave constructive comments!

On Zoom, your voice is so important for you to communicate than before. They can not see your body, your gestures. Your words carry more meaning. You have to impress people with your words. For someone who stutters, that [videoconferencing] is a disadvantage.

Research Interview Participant, 2022

As we enter a new era in which videoconferencing becomes the dominant and normalized mode for interpersonal and professional communication, it comes with unique challenges, such as the reduction of non-verbal cues (Bailenson 2021, Neate et al 2022), turn-taking confusion, connectivity/technical difficulties, and general “Zoom fatigue.” It is crucial to understand videoconferencing’s impact on and potential challenges for people who stutter. In this work, we explore the experience of people who stutter with videoconferencing technologies through interviews with adults who stutter, to understand their challenges, benefits, and strategies during video conferences compared to in-person meetings. We hope our insights will uncover unique challenges for the stuttering community while contributing to the design and development of a more inclusive video-based communication environment for all.

Our research is inspired by recent breakthroughs in stuttering research and therapy that emphasize the subjective experience of stuttering rather than the perspectives and observations of listeners (Constantino et al 2022, Tichenor and Yaruss 2019). This epistemic shift led the field to understand that the biggest struggle with stuttering moments is not the disfluencies but the feeling of “being stuck” and “losing control” (Plexico et al 2005, Tichenor and Yaruss 2019), and that people who stutter find it most satisfying when their speech is spontaneous, regardless of how fluent it is (Constantino et al 2022).

Notes on Our Methods

For this research we conducted fourteen semi-structured interviews with adults who stutter from the United States and the United Kingdom to learn about their experience of videoconferencing. The participants were recruited through speech therapy groups and various stuttering communities. Understanding the multiple forms of suppression at play during professional and public communication, we prioritized the recruitment and inclusion of participants with multiple marginalized identities besides stuttering, such as women and first-generation immigrants.

During interviews, we asked participants about their: 

  • Personal background and characteristics of one’s stuttering. 
  • Use of videoconferencing technologies.
  • Experience of videoconferencing.
  • Future of videoconferencing.

Overall, participants used videoconferencing in professional and/or personal settings and reported various degrees of satisfaction with their video conferencing experiences. As the trend with videoconferencing persists, interview participants’ perception of it evolved. For those who started serious videoconferencing during the pandemic, experiences improved over time. Importantly, videoconferencing presented both benefits and challenges to interviewees. 

Benefits: Creating Safe Spaces For People Who Stutter

Although the usage and context for videoconferencing varied, all of the interview participants saw some benefits throughout the Covid pandemic. While all the benefits noted by participants are applicable to people who do not stutter, some of these benefits are particularly appreciated in the context of stuttering.

As presented in the interview quotations below, videoconferencing created a “safety distance” that allowed some people who stutter to have more control over their environment, their participation in meetings, and their self-representation:

“In terms of me leading a meeting, or facilitating something, events like if I’m in the hot seat, at this point – that I never would have said this before the pandemic – I would actually rather do it virtual. I actually don’t have a lot of experience facilitating, or panel, in person, because a lot of those opportunities came to me during the pandemic. The idea of doing a live TED Talk freaks me out, but I’ve just done a half hour presentation over the computer, and I loved it!”

“I can manage my energy a little bit better on VC, because you are in your own environment. For people who stutter, going to a bar is very challenging, the office can have a similar effect.[…] you just have more control on VC than in person environment.”

Benefits: Reduced Barriers to “Show Up” and a General Trend Towards More Inclusive Meeting Behaviors.

Research has shown that adults who stutter suffer from heightened social anxiety and are more likely to avoid social situations as a result (Iverach and Rapee 2014). While stuttering is often a very isolating and heavily stigmatized experience, interview participants were able to leverage videoconferencing platforms to connect with others who stutter and build communities that were safe yet supportive. 

As COVID-19 disrupted lives and blurred work-life boundaries, our research indicated that some people developed increased empathy towards others in the context of videoconferencing. Interviewees noticed that people became more patient and more understanding of speaking-related challenges, which alleviated some pressure for people who stutter to participate fluently. As one participant noticed, “even fluent speakers have difficulties on Zoom, […] having challenge of being heard is more understood now”. Overall, participants saw a cultural shift towards more inclusive meeting expectations and behaviors that empowered people who stutter to speak up. As another participant noted:

“10 years ago, it was perfectly acceptable to just have one person speak in the entire meeting; but now, if there is only one person speaking, I will definitely call it out.“

Challenges: The Stress Of “Self View” And “Turn Taking” Remains In Video Conferencing. 

Numerous studies have shown that seeing oneself in a mirror can induce self-evaluation and distress (Bailenson 2021, Gaine 2020, Wicklund 1975), and the effect is stronger for certain social groups, such as women and Asian people, as compared to men and White people, respectively (Ratan, Miller, and Bailenson 2022). Not surprisingly, the “self view” function – turned on by default on most videoconferencing platforms – stands out as one of the top challenges with videoconferencing for interviewees. Almost all participants indicated some discomfort with the self-view, finding it “distracting” and anxiety-triggering. By default, many videoconferencing platforms rely on audible sound to detect and switch the current speaker, making the first sound/word crucial to signal one’s turn. However, several of the interview participants found themselves struggling the most with initiating a sentence. With limited channels for non-verbal communication strategies such as body language over videoconferencing, they would often be held back by that very first word.

Challenge: Emotional Connection And Social Cues Are Reduced On Videoconferencing Platforms

With a strong association between stuttering and social anxiety, people who stutter are more sensitive to negative evaluations from others, and more likely to engage in safety behaviors such as loss of eye contact (Iverach and Rapee 2014). While the reduction of social cues during video calls has made many feel less connected to their conversation partners (Bailenson 2021), the lack of emotional support could exacerbate the social anxiety experienced by people who stutter, causing further behavioral and emotional struggles. As one participant stated:

It’s hard for people to know who to look at on Zoom. In terms of eye contact, who do we keep eye contact with? Even if we all know who we want to keep eye contact with, do they know that? How can they tell? They probably can’t. 

Videoconferencing disabled some interviewees’ existing strategies for the social and emotional support available in in-person meetings. For example, when attending in-person group meetings, some participants described choosing to sit next to friendly, familiar people to feel more relaxed. Small talk and chit-chat immediately before a meeting were other strategies participants described.

To compensate for the lost connection with others, our participants extensively utilized additional channels and made a conscious effort to communicate their emotions and intentions. For example, several participants deliberately lifted their camera to eye level so that they could mimic in-person eye contact. Interview participants also described leveraging their identity as people who stutter to better connect with others in virtual meetings. Most participants had proactively disclosed their stutter in high-stress video conferences (e.g. job interviews, presentations, orientations) to build a connection with their audience, and found this strategy effective at reducing mental stress and bringing in emotional support, even though it did not change how fluent they were. Some participants purposefully embraced the vulnerability that came with the identity of a person who stutters as a way to empower others to be more open and collaborative in virtual meetings.

Going Forward: The Hidden Costs of Video Conferencing. 

Despite the benefits identified by the interview participants, videoconferencing has introduced significant emotional and cognitive costs for people who stutter. The constant close-up view of their facial features and speaking behaviors contributed to heightened self-consciousness and more negative thoughts. Although the challenge of the “Zoom gaze” is widespread (Bailenson 2021, Gaine 2020), people who stutter are more likely to pay disproportionate attention to “negative” behaviors (e.g. stuttered words, facial tension) that reinforce existing self-stigma and social anxiety (Iverach and Rapee 2014). The increased difficulty with turn-taking over videoconferencing platforms posed structural barriers for people who stutter to have their voices heard and their points across, deepening their existing feelings of social isolation and rejection and preventing some participants from seeing themselves as leaders. The uncertainty around turn-taking and audience reactions further contributed to a sense of loss of control, one of the defining characteristics of stuttering and a direct cause of many negative emotional and cognitive reactions when people stutter (Tichenor and Yaruss 2019). While the emotional connection with their conversation partners was highlighted by several participants as the hallmark of their most rewarding communication experiences, many interviewees now felt systematically disadvantaged in seeking and sharing emotional support, as their previous strategies – such as physical proximity, hugs, and good eye contact – were largely unsupported by videoconferencing technology.

To overcome these challenges, people who stutter had to adopt strategies that often required extra time, labor, and mental effort, on top of the existing cognitive and emotional load associated with stuttering. These hidden costs of videoconferencing help explain why interview participants reported feeling that videoconferencing was particularly “exhausting”, “draining” and “unrewarding”, something that they – while still participating in – did “not look forward to”.

Even the named benefits of videoconferencing could lead to questionable long-term outcomes for the stuttering community. For example, the convenience and comfort of a familiar, controlled videoconferencing environment could potentially disincentivize participants from engaging in in-person meetings and social interactions. The ability to hide one’s stuttering behaviors and identity via videoconferencing is also a double-edged sword. Although it does serve people who stutter with better impression and identity management at the moment, it could also hold people back from accepting their stutter and stuttering identity, reinforcing negative emotions associated with stuttering such as embarrassment, guilt, and fear (Cheasman et al 2013). Collectively, if people who stutter all manage to hide their stutter and stutterer identity during video calls, speech-related challenges would be even less understood and further marginalized by mainstream society. While videoconferencing reduced the barriers for people who stutter to find and join the stuttering community, the bonding and commitment within the community might be weakened due to the difficulty in forming emotional connections via video conferences, making the community more fragmented and superficial.

To summarize, videoconferencing and videoconferencing technologies have substantially changed the dynamics and structure of interpersonal communication, imposing potentially profound emotional, cognitive, and social costs on people who stutter. The very design of the videoconferencing technologies that induce such costs (e.g. lack of non-verbal communication support) has also helped render these costs invisible, preventing public awareness of the structural barriers that people who stutter face in participating and engaging in the age of videoconferencing.

References

Note: in the abbreviated essay above, research findings from other scholars are cited in parentheses. Below are the complete citations for those findings.

  • Jeremy N. Bailenson. 2021. Nonverbal Overload: A Theoretical Argument for the Causes of Zoom Fatigue. Technology, Mind, and Behavior 2, 1 (feb 23 2021). https://tmb.apaopen.org/pub/nonverbal-overload.
  • C. Cheasman, R. Everard, and S. Simpson. 2013. Stammering Therapy from the Inside: New Perspectives on Working with Young People and Adults. J & R Press. https://books.google.com/books?id=QZVdMwEACAAJ
  • Christopher Dominick Constantino, Naomi Eichorn, Eugene H. Buder, J. Gayle Beck, and Walter H. Manning. 2020. The Speaker’s Experience of Stuttering: Measuring Spontaneity. Journal of Speech, Language, and Hearing Research 63 (2020), 983–1001. Issue 4. 
  • Autumm Gaine. 2020. The Zoom Gaze.
  • Lisa Iverach and Ronald M. Rapee. 2014. Social anxiety disorder and stuttering: Current status and future directions. Journal of Fluency Disorders 40 (2014), 69–82. https://doi.org/10.1016/j.jfludis.2013.08.003.
  • Timothy Neate, Vasiliki Kladouchou, Stephanie Wilson, and Shehzmani Shams. 2022. “Just Not Together”: The Experience of Videoconferencing for People with Aphasia during the Covid-19 Pandemic. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 606, 16 pages. https://doi.org/10.1145/3491102.3502017  
  • Laura Plexico, Walter H. Manning, and Anthony DiLollo. 2005. A phenomenological understanding of successful stuttering management. Journal of Fluency Disorders 30, 1 (2005), 1–22. https://doi.org/10.1016/j.jfludis.2004.12.001  
  • Ronald M. Rapee and Richard G. Heimberg. 1997. A cognitive-behavioral model of anxiety in social phobia. Behaviour Research and Therapy 35, 8 (1997), 741–756. https://doi.org/10.1016/S0005-7967(97)00022-3
  • Rabindra Ratan, Dave B. Miller, and Jeremy N. Bailenson. 2022. Facial Appearance Dissatisfaction Explains Differences in Zoom Fatigue. Cyberpsychology, Behavior, and Social Networking 25, 2 (2022), 124–129. https://doi.org/10.1089/cyber.2021.0112
  • Seth E. Tichenor and J. Scott Yaruss. 2019. Stuttering as Defined by Adults Who Stutter. Journal of Speech, Language, and Hearing Research 62 (2019), 4356–4369. https://doi.org/10.1044/2019_JSLHR-19-00137
  • Robert A. Wicklund. 1975. Objective Self-Awareness. Advances in Experimental Social Psychology, Vol. 8. Academic Press, 233–275. https://doi.org/10.1016/S0065-2601(08)60252-X

NSA 2022 Convention Experience and Reflections

By Lindsay Reynolds, board member of AImpower.org

Shaomei Wu, AImpower.org’s founder and CEO, recently attended the National Stuttering Association’s annual conference in Newport Beach, California. We caught up recently to discuss her experience at the conference as a person who stutters, and how it affects her work with AImpower.org.

Our conversation has been condensed and edited for clarity.


Lindsay: First, for readers who may be unfamiliar with AImpower.org, can you share about the organization and its goals?

Shaomei: AImpower.org is a think- and do- tank that works closely with marginalized communities to design, research and build empowering technologies to remove barriers and address the needs of marginalized groups.

There’s an increasing need to understand the biases currently embedded in technology and the potential harms it may cause, but also an underserved need to deliver tangible benefits from emerging technologies more fairly across our society, so they can be used to level the playing field.

I decided to explore alternative paradigms and mechanisms for designing and developing technology, thinking that with the independence I have as a small non-profit, and without the expectations of profit or monetary returns, there would be the freedom and space to do things that are good and right for the world in the long term.


Lindsay: One of the first groups that AImpower.org is working with is the community of people who stutter. Recently you attended the annual conference of the National Stuttering Association (NSA). What was NSA like?

Shaomei: NSA is more of a community gathering, and it offers a wide range of social activities and opportunities for people to meet and be a part of this community in a supportive space.

One of the most interesting things about the convention is the tradition of open mic sessions. They usually have one or two open mic sessions each day, where basically you go to a room and you can go up to the mic and start talking. But it’s not like stand-up comedy, where you’re expected to tell jokes. In fact, a lot of people go up to share something vulnerable and intimate, and end up in tears. It’s a release and a collective culture building for a lot of the community members who have been silenced and masked in mainstream society and this is one of the only spaces for them to be very authentic.

In fact, after attending a few of the open mic sessions, I gathered my courage and spoke at one of them on the last day, and cried the shit out of myself. It was magical and surreal because I’ve never lost control like that in public and let out so much emotion, but the atmosphere and connection you get from the audience really helps you let down your guard and be vulnerable and authentic because you feel like people will get it. It was a very human and communal experience.

I think my own personal experience as someone with multiple intersectional marginalized identities created this mechanism for me to notice and reflect on the systematic marginalization of certain groups in society, and my background in tech made me more aware of how technology has played a role in it.


Lindsay: One value that’s important to AImpower.org’s mission is the importance of intersectionality. What did attending NSA teach you about intersectionality within the community of people who stutter?

Shaomei: I think there’s a lot to be done in terms of how intersectionality is considered and served within the NSA community. There’s a collective action and research sharing component, so the conference has workshops covering different topics. For example, one workshop I really liked was on stuttering and the BIPOC community, which focused on how to empower people of color who stutter.

This is a crucial topic because the organizers and leadership of NSA are predominantly white, even though stuttering happens at the same rate across populations. The people who attend the convention, as well as those who lead the organization, still reflect the societal norm of a white-centered composition. This was actually brought up and talked about a lot in that workshop. I really liked a point brought up by a participant: that the best way to be an ally is to share the power. Which isn’t new, but it’s refreshing to hear someone say that in front of the majority-white NSA leadership team.

The top lesson I learned was the importance of intersectionality – how diversity and multiple layers of marginalization are experienced and owned by people in the stuttering community. For example, the workshop on the BIPOC community and stuttering, and how they are not well represented in current mainstream stuttering support and advocacy groups, rang a bell and was alarming for me. The implication is that, in our work supporting and empowering people who stutter, I want to prioritize improvements for people with intersectional identities in our research, design, and technical work. We cannot really serve this community if we only serve white males who stutter. We have to start with the people who are most vulnerable, at the margin of the margin, to build truly empowering technology.


Lindsay: How has your experience attending NSA affected your approach to your work with AImpower.org?

Shaomei: The diversity of speech, even within the stuttering community, was beyond what I personally had knowledge of or had been exposed to, and that challenged some of the assumptions I had made about stuttering and the best way to empower people who stutter.

For example, on the last day of the conference I approached another attendee and introduced myself. It took that person a minute or more to say the first syllables of their name. Even as a person who stutters, I wasn’t prepared for that degree of stuttering, and I wasn’t prepared for how to react to that and the best way to support them.

And what’s notable is that while this was probably the 50th person I had interacted with over the course of the conference, they were the first person I interacted with who stuttered to this degree. So I realized that even within the stuttering conferences, most people who speak or socialize have mild or moderate stuttering. I think the people who have more challenges speaking are disproportionately quiet or silenced in this conference, which is still centered around speaking. There are a lot of activities designed for you to talk. It makes me re-think whether those activities are empowering for a set of people within the stuttering community.

That influences a lot of the thinking I had around the solutions, goals, and technologies involved, and I think AImpower.org will partner with speech pathologists or people who have expertise in understanding a wider spectrum of stuttering, as well as people who are normally excluded from explorations or interventions, like the person I met in the lobby. We’ll learn the best practices and the ways to include the widest range possible for our work.

(See this year’s conference program attached below, provided at westutter.org)

AI and Systematic Marginalization of Disabled People

I had the pleasure of being invited to speak to students from the Columbia Data for Social Good club at their monthly event on April 27, 2022. Under the theme of Equity for Marginalized Communities using Tech, I discussed the issues, challenges, and opportunities with AI for disabled people.

Here is an outline of my talk (slides)

1. The promises of AI for disabled people.

There has been quite a lot of discussion and awareness regarding AI biases and harms towards women and racial minorities. However, when it comes to disability, AI is often portrayed as a magic pill that will help or solve disability once and for all, often by compensating for human abilities and acting as the “eyes” or “ears” for people with disabilities. This vision of AI, although possible, is superficial if not misleading, given what we have seen in the development and deployment of AI technologies so far.

2. The problems

I argued that there are two sets of problems AI has imposed on disabled people.

First, many AI advancements do not function for or benefit people with disabilities. Take speech recognition models as an example: despite recent advancements, they still work poorly for me as a person who stutters, causing lots of frustration and sometimes fear as voice interfaces are deployed more and more widely.

Second, not only do AI systems fail to benefit people with disabilities, they can also actively punish and harm them. For example, researchers found that sentences containing disability-related words are more likely to be predicted as “toxic” by Google’s Perspective API, which is used widely by many other platforms. TikTok also openly admitted that it downranks videos containing disabled people, in the name of protection and anti-bullying. Algorithmic decisions like this suppress the existence of disabled people and deny our access to public spaces.
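To make the first example concrete, here is a minimal sketch of how one might probe a toxicity classifier such as the Perspective API with disability-related sentences. The API key and the probe sentences are placeholders of my own; this is an illustration of the kind of audit the cited researchers performed, not their actual methodology.

```python
from googleapiclient import discovery

API_KEY = "YOUR_API_KEY"  # placeholder: requires your own Perspective API key

# Build a client for the Perspective API (commentanalyzer endpoint).
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

# Hypothetical probe sentences: identical framing, with and without disability-related words.
sentences = [
    "I am a person.",
    "I am a blind person.",
    "I am a person with a mental illness.",
]

for text in sentences:
    request = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=request).execute()
    score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    print(f"{score:.2f}  {text}")
```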

3. Factors contributing to AI-powered marginalization of disabled people.

Yes, discrimination towards disabled people existed in the world before AI came into play, but I think AI systems not only reflect the existing biases but also exacerbate them. This exacerbation is realized in many different places in the AI pipeline, including data, methodology, and evaluation metrics.

  • Data: people with disabilities are under-represented and often misrepresented in the data used to train AI models. As a result, the model’s performance degrades when used by people with disabilities, or the model produces unfavorable predictions for them.
  • Methodology: modern ML relies on regularities and patterns in the data; as a result, most models pick up the “center” areas of the distribution and ignore the “outliers” in the periphery.
  • Metrics: most standardized benchmarking metrics do not align with the subjective experience of the people who are most impacted by them. Take image captioning as an example: lots of investment has gone into this domain, with assisting people with visual impairments as an aspiration. However, the common metrics used to benchmark image captioning models, such as BLEU and CIDEr, only measure syntactic similarity between the prediction and the label, without considering the impact of generated captions on people who cannot see the images. When people cannot see the image to verify how accurate a caption is, and are instructed to trust the “AI”, some captions can have a very negative impact when consumed by people with visual impairments, causing feelings like confusion, self-doubt, embarrassment… (see the sketch after this list).
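To illustrate the metrics point, here is a small sketch using NLTK’s sentence-level BLEU with hypothetical captions: a caption that is badly misleading for a blind user can still score higher on n-gram overlap than a more cautious, genuinely useful one.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical ground-truth caption and two candidate captions.
reference = "a man holding a birthday cake with lit candles".split()
misleading = "a man holding a wedding cake with lit candles".split()   # wrong event, high overlap
useful = "a person holds a cake topped with burning candles".split()   # right gist, low overlap

smooth = SmoothingFunction().method1
for name, candidate in [("misleading", misleading), ("useful", useful)]:
    score = sentence_bleu([reference], candidate, smoothing_function=smooth)
    print(f"{name}: BLEU = {score:.2f}")

# The misleading caption scores higher because BLEU only counts n-gram overlap;
# it says nothing about how the caption would affect someone who cannot see the image.
```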

4. What can we do

I want to share a few techniques to address the issues we discussed above, all driven by the guiding principle of “nothing about us, without us”. We need meaningful representation of the lived experience of marginalized groups in every step of the AI development pipeline.

  • Representation in Data. Some tech companies have been soliciting data from people with disabilities to enhance their AI models. Google’s Project Euphonia is a good example, through which they ask people with atypical speech to record and donate their speech samples. The effectiveness of such an approach might be limited by the lack of trust and the privacy concerns of disabled communities, which is understandable given the history of tech companies harvesting and misusing personal data from minority and marginalized groups. I hope there can be more grassroots movements, led and driven by the disability communities, through which we can collect, manage, and use the data about us for our needs.
  • Representation in sampling method. To surface outliers and avoid the majority overwhelming marginalized users, Jutta Treviranus proposed the method of the “Lawnmower of Justice”, which I found elegant and intriguing. The basic idea is to downsample the dense parts of the dataset to force the model to pay attention to the outliers (see the sketch after this list).
  • Representation in metrics. There have been some research efforts to develop human-centered evaluation metrics for AI models. For example, the work by MacLeod, Bennett, Morris, and Cutrell proposed a new metric – congruence – for image captioning models that measures how sensible a generated caption is in the context of the image. If used as a benchmarking metric, it can force the model to pay some attention to the potential confusion that can be caused by the generated captions.
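As promised above, here is a minimal sketch of the downsampling idea behind the “Lawnmower of Justice” – my own toy interpretation, not Treviranus’s implementation: cap how many training examples any one dense cluster can contribute, so that rare speech or usage patterns are not drowned out.

```python
import random
from collections import defaultdict

def lawnmower_downsample(samples, cluster_of, cap, seed=0):
    """Keep at most `cap` samples per cluster so dense regions cannot dominate
    training; small clusters (the outliers) are kept in full.
    `cluster_of` maps a sample to its cluster id (e.g. from k-means)."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for s in samples:
        by_cluster[cluster_of(s)].append(s)

    kept = []
    for members in by_cluster.values():
        if len(members) > cap:
            kept.extend(rng.sample(members, cap))  # mow down the dense center
        else:
            kept.extend(members)                   # keep the periphery intact
    return kept

# Toy usage: 990 "typical" samples vs. 10 "atypical" ones.
data = [("typical", i) for i in range(990)] + [("atypical", i) for i in range(10)]
balanced = lawnmower_downsample(data, cluster_of=lambda s: s[0], cap=50)
print(len(balanced))  # 60: "typical" capped at 50, all 10 "atypical" kept
```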

5. Beyond equality

Fixing all the existing biases and discrimination embedded in current systems may lead us to equality. However, to achieve equity, we need to actively push the envelope and envision new things that deliver tangible benefits and level the playing field. I share a few of my previous projects as examples of how I think about equitable technology for disabled people.

  • Automatic alt-text. This is a feature I developed at Facebook that uses computer vision to describe images for screen reader users of Facebook and Instagram. We made the deliberate decision not to use off-the-shelf image captioning models because of the issues discussed above. Instead, we oriented the design and development of the model towards the needs of blind users to “glance through” their feeds and get basic information about the images before interacting with them (same as most people!). And we collaborated closely with visually impaired users on all the design decisions, such as what to describe, in what order, and what level of precision is acceptable.
  • Writing assistance. This is an experimental grammar/spell-checking tool developed in collaboration with the dyslexia community. Unlike most spellcheckers on the market, we trained and benchmarked the model using data specific to dyslexic-style writing, to prioritize the needs of users with dyslexia, who are most impacted by spellcheckers and often face additional challenges when writing on social platforms.

I had a great discussion with students at the Data for Social Good event afterwards. We talked about collaboration between tech and disability rights activism, the incentives for tech companies to push for progress, and ways to orient our analytical skills towards ethical values and societal benefits.



Speech Technology for People Who Stutter

Speech recognition technology has progressed a lot in recent years, especially with modern deep learning techniques. While new models such as Facebook AI Research’s wav2vec have achieved a 2.43 WER (Word Error Rate) on research benchmark datasets, their performance usually tanks when processing atypical speech, such as speech by people who stutter, people who are deaf or hard of hearing, or people with an accent.

I am one of the 1% of the adult population that stutters. How stuttering manifests varies widely across individuals and speaking situations, but for me, having stuttered covertly for most of my life, it results in tremendous anxiety and fear of speaking, as well as lots of blocks and filler words in my speech. I also find it particularly challenging to manage my speech and my anxiety when speaking under pressure or being recorded.

Here is a speech sample of me introducing myself, just to give you a flavor.

Shaomei’s self intro in her previous life

You can hear lots of uh’s and um’s in my speech. While I have been working towards accepting my speech pattern and stuttering openly, it does cause some inconvenience in my life. For example:

  • Voice controlled systems have trouble understanding me;
  • When there is a time limit, I cannot fit as much content into my time slot as others do;
  • People often seem confused or distracted by the way I talk and thus pay less attention to what I am actually saying;
  • It is very hard for me to leave voicemail;
  • … and more

Current ASR models work poorly with my speech

I tried out a few different Automatic Speech Recognition (ASR) models to transcribe what I was saying in the audio clip above.

For easier comparison, here is the manual transcription of my speech:

Hi my name is Shaomei Wu and I, um, worked at Facebook. Um, my research is at the intersection of, um, AI and, um, accessibility. So I try to build empowering AI, um, technology for marginalized communities.

wav2vec

I used the pretrained wav2vec2-base-960h model as documented here. And here is the result:

HI MY NAME IS CHARMARU AND I AM WORK AT FACEBOOK AND MY RESEARCH IS AT THE INTERJECTION OF AM A I AND AM OCCESSIBELITY SO I TRIE TO BEUDE EMPOWERING A I AM TECNOLOGY FOR MAGONOLIZED COMMUNITIES

Since wav2vec is a character-based model, the transcription contains lots of misspellings and non-words. It is definitely hard to understand what I was saying.
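For readers who want to reproduce this step, here is a minimal sketch of running the pretrained wav2vec2-base-960h model with Hugging Face’s transformers library. The audio filename and the resampling details are assumptions for illustration; the notebook linked at the end of this post is the authoritative version.

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the pretrained model and its processor from Hugging Face
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load the recording (placeholder filename) and resample to the 16 kHz rate the model expects
waveform, sample_rate = torchaudio.load("self_intro.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

# Run inference and decode the character-level predictions into text
inputs = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```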

Google Cloud API

Google Cloud’s speech-to-text API gives more readable transcriptions, together with timestamps.

hi my name is Joe and I worked at Facebook and my research is at the intersection of AI and accessibility so I tried to go to empowering a I realized community

One thing I do not really like about this transcription, despite its readability, is how it dropped all the filler words. We will discuss this further in the next section.
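For completeness, below is a rough sketch of how the API can be called with the google-cloud-speech Python client, assuming a local 16 kHz WAV file; treat it as illustrative rather than the exact code I ran.

```python
from google.cloud import speech

client = speech.SpeechClient()

# Read the local recording (placeholder filename) and send it for recognition
with open("self_intro.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # request per-word timestamps
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    alternative = result.alternatives[0]
    print(alternative.transcript)
    for word in alternative.words:
        print(word.word, word.start_time, word.end_time)
```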

SpeechBrain

SpeechBrain is an open-source speech recognition toolkit developed and maintained by a large group of academic researchers. It has a bunch of pretrained models that can be easily loaded from Hugging Face, and the transcription it provides is this:

AYE MY NAME IS SHALL I RUIN I I’M WORKED AT FACE BOOK AND MY RESEARCH IS AT THE INTERSECTION OF AH AYE AND I’M ACCESSIBILITY SO I TRIED TO BUILD EMPIRE IN A TECHNOLOGY FOR MAGINALISED COMMUNITIES

It does run a lot slower than wav2vec or the Google Cloud speech API, but it seems to achieve a good balance between readability and authenticity.
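Loading one of SpeechBrain’s pretrained models and transcribing a file takes only a few lines, roughly as sketched below. The specific model (asr-crdnn-rnnlm-librispeech) and the filename are assumptions, not necessarily the configuration I used.

```python
from speechbrain.pretrained import EncoderDecoderASR

# Download a pretrained LibriSpeech model from Hugging Face and cache it locally
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)

# Transcribe the recording (placeholder filename)
print(asr_model.transcribe_file("self_intro.wav"))
```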

Better ASR for Stuttering Speech

Overall, none of these models worked very well with my speech, and the WER is significantly higher than what was reported on the benchmarks.

One way to improve the performance of current models on my speech is to fine-tune them. Models like wav2vec are mostly self-supervised: the base model is trained on raw speech recordings alone and can then be fine-tuned with a relatively small labeled dataset, after which it should, in theory, handle my speech better.

It is still a lot of work to record and transcribe 30 minutes or more of my own speech, though. One way to make this easier is data augmentation, a technique I used when training the spellchecker for people with dyslexia: we could randomly inject filler words into an existing training dataset and feed the model the perturbed data (a tiny sketch of the idea follows below). A fuller exploration will have to wait for the next blog post.
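To illustrate the text side of that augmentation idea, here is a tiny, hypothetical sketch that injects filler tokens into transcripts. For speech models, matching filler audio segments would also need to be spliced into the recordings; the filler list and injection probability are placeholders.

```python
import random

FILLERS = ["um", "uh"]  # filler tokens to inject; extend as needed

def inject_fillers(transcript, p=0.15, seed=None):
    """Randomly insert a filler word before each token with probability p."""
    rng = random.Random(seed)
    out = []
    for word in transcript.split():
        if rng.random() < p:
            out.append(rng.choice(FILLERS))
        out.append(word)
    return " ".join(out)

print(inject_fillers("my research is at the intersection of AI and accessibility", seed=1))
```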

Auto-tuning Stuttering Speech?

Given that the most noticeable thing about my speech is the use of filler words (uhs and ums), I wonder how I would sound if an AI cleaned them up automatically. This is purely a technical exploration – I actually think that disfluencies should be celebrated, not hidden, in almost all social contexts.

For this reason, Google’s transcription does not work for our use case, since it dropped all the filler (and some non-filler) words already.

The transcriptions from SpeechBrain and wav2vec are both workable, as they did catch my “um”s, semi-consistently transcribing them as “I’M” or “AM”, respectively.

1. Get timestamps for filler words

So now I just need to search the transcription for the filler words and the exact timeframes when they occur.

Although Google did provide a timestamped transcription, the timestamps were not accurate enough for this purpose. Also, it did not even transcribe the filler words, remember?

SpeechBrain and wav2vec do not provide timestamps with their transcriptions, but fortunately, we can still align the transcription with the audio file using an open-source tool called gentle (which is built on top of Kaldi) and infer the timestamp of each word in the transcription.
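One low-friction way to use gentle is to run its local server (for example via its Docker image, which listens on port 8765) and post the audio plus transcript over HTTP, roughly as sketched below. The filenames are placeholders and the JSON fields may vary across gentle versions, so take this as an outline rather than exact code.

```python
import requests

# Assumes gentle's server is running locally, e.g.:
#   docker run -p 8765:8765 lowerquality/gentle
with open("self_intro.wav", "rb") as audio, open("transcript.txt", "rb") as transcript:
    resp = requests.post(
        "http://localhost:8765/transcriptions?async=false",
        files={"audio": audio, "transcript": transcript},
    )
alignment = resp.json()

# Collect (start, end) times, in seconds, of words transcribed as fillers
filler_spans = [
    (w["start"], w["end"])
    for w in alignment["words"]
    if w.get("case") == "success" and w["alignedWord"].lower() in {"am", "i'm", "um", "uh"}
]
print(filler_spans)
```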

2. Remove filler words from stuttering speech

After finding all the start and end times for “AM” in the wav2vec transcription, it is easy to programmatically edit those timeframes out of the original audio with Python.
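Here is a minimal sketch of that editing step using the pydub library, reusing the filler_spans from the alignment sketch above; the filenames are placeholders, and any audio library that supports millisecond slicing would work just as well.

```python
from pydub import AudioSegment

def cut_spans(audio_path, spans, out_path):
    """Remove the given (start, end) spans, in seconds, from an audio file."""
    audio = AudioSegment.from_wav(audio_path)
    cleaned = AudioSegment.empty()
    cursor = 0  # current position in milliseconds
    for start, end in sorted(spans):
        cleaned += audio[cursor:int(start * 1000)]
        cursor = int(end * 1000)
    cleaned += audio[cursor:]  # keep everything after the last filler
    cleaned.export(out_path, format="wav")

cut_spans("self_intro.wav", filler_spans, "self_intro_autotuned.wav")
```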

Here is the resulting auto-tuned audio. You will notice that most of the filler words are gone.

Auto-tuning Video with Stuttering Speech

Similar techniques can be used to automatically edit a video to cut out filler words, as shown in the example below.

Shaomei’s intro at her previous job. Original version.
Shaomei’s intro at her previous job. Autotuned version.

Here are some extra steps that are specific to videos:

  1. Extract the audio track from the video.
  2. Find the timeframes for filler words in original audio/video.
  3. Cut the original video.

Note that this type of close-up video is probably not the best demo for this technique; I expect it to work better for more static videos, such as slide shows or instructional videos.
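For reference, here is a rough sketch of those three steps with the moviepy library; the filenames are placeholders, and the filler spans would come from running the ASR and alignment steps above on the extracted audio track.

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips

video = VideoFileClip("self_intro.mp4")  # placeholder filename

# Step 1: extract the audio track so it can be transcribed and aligned
video.audio.write_audiofile("self_intro_audio.wav")

# Step 2: find the filler-word timeframes (placeholder values shown here)
filler_spans = [(3.2, 3.6), (7.1, 7.5)]  # (start, end) in seconds

# Step 3: cut those timeframes out of the video and re-join the rest
keep_clips, cursor = [], 0.0
for start, end in sorted(filler_spans):
    keep_clips.append(video.subclip(cursor, start))
    cursor = end
keep_clips.append(video.subclip(cursor, video.duration))
concatenate_videoclips(keep_clips).write_videofile("self_intro_autotuned.mp4")
```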

Resources

The code for producing the results can be found in this notebook.

