
Tech4ALL Digest, Nov 19
November 19, 2024, by Shaomei Wu
The ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) has long been a home for community-driven, socially conscious research on computing technologies and their impact on society. Several of my previous works on AI-powered accessibility innovations were published at CSCW, such as Automatic Alt-Text and a face recognition app for people with visual impairments. With about 500 in-person attendees, CSCW always feels more intimate and communal than bigger conferences like CHI.
We were excited to find strong, shared interests at this year’s CSCW in building a more equitable, respectful data future with underrepresented and low-resourced communities. Qisheng presented our research on grassroots, community-driven collection and curation of stuttered speech data for fair speech AI in the “Empower Data Work” session. Jingjin attended the “Practicing Inclusivity in AI” workshop and discussed participatory methods and ethical considerations for engaging stakeholders in AI. She also presented two papers based on her PhD research, “Meditating in Live Stream: An Autoethnographic and Interview Study to Investigate Motivations, Interactions and Challenges” and “Beyond Meditation: Understanding Everyday Mindfulness Practices and Technology Use Among Experienced Practitioners”, and moderated the panel on “(Un)designing AI for Mental and Spiritual Wellbeing”. She was busy!

Overall, the conference seemed to index heavily on AI FATE (fairness, accountability, transparency, and ethics) and marginalized experiences. I will highlight some interesting and relevant papers below; feel free to share the papers you like in the comments!
- Studying Up Public Sector AI: How Networks of Power Relations Shape Agency Decisions Around AI Design and Use. By Anna Kawakami, Amanda Coston, Hoda Heidari, Kenneth Holstein, and Haiyi Zhu. This study is interesting as it “studies up” the power structures behind AI adoption: the authors interviewed decision makers from 19 public sector agencies involved in human services (e.g. child welfare, predictive policing) and report on how decisions around adopting AI tools were made. They identified key conflicts among stakeholders in the decision-making process. For example, while frontline workers wanted to be involved in the decision making earlier, agency leaders and R&D experts viewed earlier involvement as impractical. The authors found that, since there was no formal policy in public agencies requiring the involvement and consultation of frontline workers, frontline workers had little decision power over AI adoption and their work process in general. In some cases, even when AI tools were seen as inadequate by social workers in practice, the workers were required to use them for a certain percentage of time to ensure that those tools were not completely ignored! Overall, the study calls for a power-conscious practice for stakeholder participation in the public sector that 1) trains and upskills agency leaders and managers; 2) makes visible the value of participatory approaches; and 3) leverages existing power relations.
- Insights from an Experiment Crowdsourcing Data from Thousands of US Amazon Users: The importance of transparency, money, and data use. By Alex Berke, Robert Mahari, Alex Pentland, Kent Larson, and Dana Calacci. In this study, the authors asked Amazon users to export their purchase histories and share them for open research. They also conducted a series of experiments with two types of visualizations of what data would be shared and five types of incentives. The authors found that people’s motivation to share their data is driven by: 1) higher monetary incentives; 2) altruism; 3) more transparency about what data would be shared and how it would be used; and 4) demographics (women and people with lower educational attainment are more likely to share). In terms of data use, people are more supportive of data use by researchers than by government agencies (e.g. the Census Bureau). I find this work really informative for how we engage and involve low-resourced communities in AI data initiatives. The monetary incentive seems to be a particularly tricky question: we want to value people’s efforts and contributions fairly, but also avoid potential exploitation of their socioeconomic status. In this work, the authors made the deliberate design choice of very low monetary rewards (on the scale of a few cents) to avoid exploiting low-income crowd workers.
- Reimagining Meaningful Data Work through Citizen Science. By Ashley Boone, Annabel Rothschild, Xander Koo, Grace Pfohl, Alyssa Sheehan, Betsy DiSalvo, Christopher A Le Dantec, and Carl DiSalvo. This paper reports on how citizen science project managers design and implement data work with volunteers. The authors noted the multifaceted goals and values in these projects – e.g. the scientific goal of productivity and the participatory goal of promoting science – and how project managers strategically align these goals and build meaning in citizen science data work. These strategies include: 1) cultivating intrinsic motivation and creating educational opportunities; 2) building long-term reciprocal communications and social relationships (very different from crowd work); and 3) giving participants more autonomy and control over what types of work they do. I think we can definitely leverage these strategies in our own community data work, while keeping in mind that these projects are not supposed to be “cheap or easy” – project management is a full-time job, and volunteer contribution is not a form of cheap labor.
- Data Stewardship in Clinical Computer Security: Balancing Benefit and Burden in Participatory Systems. By Emily Tseng, Rosanna Bellini, Yeuk-Yu Lee, Alana Ramjit, Thomas Ristenpart, and Nicola Dell. This study engaged survivors of intimate partner violence (IPV) in contextual co-design to explore participatory data stewardship around their own clinical data. The authors showed participants what data would be seen by researchers and asked them to rate their comfort level with sharing different types of data with different actors. They found that participants were more comfortable sharing data with academics and advocates, and to build better tech products, but not with the police or criminal justice systems, due to negative past experiences. After prototyping several mechanisms for granular control over data (i.e. open records that allow participants to tag and comment on the shared data; dynamic consent that shows different consent levels, similar to cookie settings), the authors also found that participants still felt the overhead of all the data control and wanted a trusted steward to represent their needs. These insights and prototypes will definitely influence how we design and think about the stewardship of the stuttered speech data we will be collecting with the stuttering community.
- “Come to us first”: Centering Community Organizations in Artificial Intelligence for Social Good Partnerships. By Hongjin Lin, Naveena Karusala, Chinasa T. Okolo, Catherine D’Ignazio, and Krzysztof Z. Gajos. Based on 16 semi-structured interviews, this study reports the perspectives of community organization members on their experiences and partnerships with AI teams in AI4SG (AI for social good) projects. The authors found that, despite the optimism around AI4SG, agendas were often directed by funding agencies, and the goals of community organizations were “frequently sidelined in favor of other stakeholders”. The authors thus call for “co-leadership” in AI4SG partnerships from an early stage, under the “data co-liberation” principle. Personally, I can’t say I am surprised by the findings reported here, though I am still disappointed by the realities. It is even more important for us – on behalf of the stuttering community – to claim decision power in our AI initiatives from day one, and that is exactly what we are doing now with our PJMF project.
To summarize, we appreciate the CSCW community’s collective intellectual efforts on the ethics and power dynamics of AI, data, and technology. We are proud to be part of this movement and look forward to collaborating broadly with scholars, community leaders, policymakers, and practitioners who share similar interests and vision.