Dyslexia is a neurodivergence that impacts one’s ability to process and produce textual information. Previous research has identified unique patterns in the writing of people with dyslexia, such as letter swapping and homophone confusion, that deviate from the text typically used to train and evaluate common Natural Language Processing (NLP) models such as machine translation (MT). It remains unclear, however, how commercial NLP services perform for users with dyslexia. In this post, we review some preliminary findings from our work on dyslexia and a machine translation task.
For testing these commercial services, a large amount of dyslexic text is required. Unfortunately, we do not have the resources to collect and label that amount of data. Therefore, we utilized the following synthetic dyslexic injection methods (the actual dictionaries that are used for injection can be found here):
- Letter Confusion: This is a list of letters that can be confused for sounding and/or looking like each other.
- Homophone Swapping: This is an extensive list of English homophones that are substituted for the original word.
- Word Confusion: This is a large corpus of confusable words, typos that are common among people with dyslexia, homophones, and other patterns specific to dyslexic writing. We thank Maria Rauschenberger for providing the list from Jennifer Pedler and Roger Mitton’s work. This latter type of injection is the most comprehensive.
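To make the idea concrete, here is a minimal sketch of how dictionary-based injection could work. It is not our actual script: the two tiny dictionaries, the function names, and the per-word `probability` parameter are hypothetical stand-ins chosen for illustration, and the real injection dictionaries are far larger.

```python
import random

# Hypothetical, tiny stand-ins for the real injection dictionaries.
LETTER_CONFUSIONS = {"b": ["d", "p"], "d": ["b"], "m": ["n"], "n": ["m", "u"],
                     "u": ["n"], "l": ["i"], "i": ["l"]}
HOMOPHONES = {"one": ["won"], "not": ["knot"], "their": ["there"]}

PUNCTUATION = ".,;:!?"


def inject_homophones(sentence, probability, rng):
    """Replace each word found in HOMOPHONES with the given probability."""
    out = []
    for word in sentence.split():
        core = word.rstrip(PUNCTUATION)   # keep trailing punctuation aside
        suffix = word[len(core):]
        options = HOMOPHONES.get(core.lower())
        if options and rng.random() < probability:
            # Note: the original capitalization is not preserved in this sketch.
            out.append(rng.choice(options) + suffix)
        else:
            out.append(word)
    return " ".join(out)


def inject_letter_confusion(sentence, probability, rng):
    """Swap one confusable letter in a word with the given probability."""
    out = []
    for word in sentence.split():
        positions = [i for i, ch in enumerate(word) if ch in LETTER_CONFUSIONS]
        if positions and rng.random() < probability:
            i = rng.choice(positions)
            swapped = rng.choice(LETTER_CONFUSIONS[word[i]])
            word = word[:i] + swapped + word[i + 1:]
        out.append(word)
    return " ".join(out)


rng = random.Random(0)  # seeded so that runs are repeatable
print(inject_homophones("New York City is looking into one.", 1.0, rng))
print(inject_letter_confusion("drivers were uneasy about the government", 0.5, rng))
```

Raising `probability` increases the share of words that are modified, which is the knob we vary in the experiments.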
We begin with a large corpus of text for which both the source and target languages are available and the translation was produced accurately by a human. In our case, we used the WMT14 English-French news dataset. A Python script lets us control the probability that an injection is made, which in turn lets us track the percentage of words modified by each injection type. Here are some example sentences:
|Injection Type||Original Sentence||Sentence with Injection|
|Letter Confusion||In Nevada, where about 50 volunteers’ cars were equipped with the devices not long ago, drivers were uneasy about the government being able to monitor their every move.||In Nevada, where abouf 50 wolunteers’ cars were equipped with thi devoces not iong ago, driverc were nneasy about the government being able to mohitor thein every movo.|
|Homophone Swapping||New York City is looking into one.||New York City is looking into won.|
|Word Confusion||“The gas tax is just not sustainable,” said Lee Munnich, a transportation policy expert at the University of Minnesota.||“The gas tax is just knot sustainable,” said Lee Munnich, eye transportation policy export at the University of Minnesota.|
In these examples, the percentage of words modified is around 20% for each injection type. After running the script through our entire dataset, we submitted the text for translation to the translation services available on major cloud platforms such as Azure, AWS, and Google, and we also tested GPT 3.5’s performance.
Using the same sentences from above we can see the output of the selected services:
|Dans le Nevada, où environ 50 voitures de volontaires étaient équipées de ces dispositifs il n’y a pas si longtemps, les conducteurs craignaient que le gouvernement ne soit en mesure de contrôler chaque mouvement.||Dans le Nevada, où environ 50 voitures de wolunteers étaient équipées de ces dévoces il n’y a pas longtemps, les conducteurs n’étaient pas à l’aise à l’idée que le gouvernement puisse mohitor thein every movo.||Au Nevada, où une cinquantaine de voitures de bénévoles étaient équipées de ces dispositifs il n’y a pas si longtemps, les conducteurs craignaient que le gouvernement ne puisse les contrôler à chaque mouvement.||Au Nevada, où environ 50 voitures de bénévoles ont récemment été équipées de ces dispositifs, les conducteurs étaient inquiets que le gouvernement puisse surveiller chacun de leurs mouvements.|
|La ville de New York cherche à gagner.||La ville de New York cherche à gagner.||La ville de New York envisage de gagner.||New York City envisage de gagner.|
|« La taxe sur l’essence est tout simplement durable », explique Lee Munnich, spécialiste de l’exportation des politiques de transport oculaire à l’université du Minnesota.||« La taxe sur l’essence est tout simplement durable », a déclaré Lee Munnich, responsable de l’exportation de la politique de transport à l’Université du Minnesota.||“La taxe sur l’essence est tout simplement durable”, a déclaré Lee Munnich, spécialiste des politiques de transport à l’Université du Minnesota.||“La taxe sur l’essence n’est tout simplement pas durable“, a déclaré Lee Munnich, expert en politique de transport à l’Université du Minnesota.|
The highlighted parts of the sentences are where we find our meaningful results. These may be difficult for non-French speakers to spot, so we will do our best to explain. In the first sentence, every translation makes sense except Azure’s, and most of the services are able to work past the letter confusion until they reach the word “mohitor”. This word seems to trick all the services except GPT, whose output preserves the sense of the sentence. In the second sentence, all the services were confused by the homophone swap of “one” for “won”. In the last sentence, every service except GPT drops the negation, which completely changes the sense of the sentence.
With this in mind, we can now look at some initial results from large amounts of text. We use the word error rate (WER) and the Bilingual Evaluation Understudy (BLEU) score to measure the performance of the models. In the remainder of the post, we will use WER, where a higher WER means more mistakes and poorer translation quality.
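For readers unfamiliar with WER, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, insertions, and deletions) between a system translation and the reference, divided by the number of words in the reference. The function name and the whitespace tokenization are our assumptions for illustration, and this is not necessarily the exact implementation used in our evaluation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substituted word ("sat" -> "sit") and one dropped word ("the")
# against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many extra words.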
For letter confusion, the percentage of words modified increases drastically because a single change within a word (e.g. an “a” replaced by an “e”) is enough for that word to be classified as modified. Translation quality degrades quickly with this injection type. Further analysis of the possible reasons will be available in the full paper.
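The "percentage of words modified" measure can be sketched as follows. This is an illustrative assumption about how such a count could be made, not our exact script: it assumes whitespace tokenization and that injection never adds or removes words, so the original and injected sentences can be compared position by position.

```python
def percent_words_modified(original, injected):
    """Percentage of words that differ between the original and injected text."""
    original_words = original.split()
    injected_words = injected.split()
    if len(original_words) != len(injected_words):
        raise ValueError("sentences must contain the same number of words")
    if not original_words:
        return 0.0
    # A word counts as modified as soon as a single character differs.
    changed = sum(o != i for o, i in zip(original_words, injected_words))
    return 100.0 * changed / len(original_words)


print(percent_words_modified(
    "New York City is looking into one.",
    "New York City is looking into won.",
))
```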
For homophone swapping, we notice a somewhat linear trend for the top-performing model, GPT. This is similar to what we saw with letter confusion. The other models seem to lose performance at a quicker rate.
Finally, our word confusion yields a similar trend. However, we should note that there is no “winner” here.
So what does this all mean? We notice a significant performance drop as more words are modified, even for large language models like GPT. Unfortunately, this can affect people with dyslexia in the real world. Machine translation is only one NLP task out of many, and one of the most basic, but it highlights a root issue: this population is not taken into consideration when these models are trained. It is possible for the models to work better for people with disabilities, but that requires change. In future work, we hope to demonstrate the effectiveness of injecting dyslexic text into training and fine-tuning so that models can serve people with different needs! If you are interested in more, you can check out our GitHub and/or look forward to the full research paper being released later this year (with more technical jargon and interesting findings)!