Orthographic Effects in Word Recognition among Spanish-Speaking Learners of English

While speech perception has been found to influence word recognition, which specific aspects of the L1-L2 mapping play the most important role is not well understood. This study explores whether, and if so how, visual-orthographic information influences the mapping of phonemic information in an L2 in both perception and word recognition. Spanish-speaking learners of English completed an AXB task and a word monitoring task in English that manipulated the presence of the labiodental /v/ and the bilabial /b/, which are allophonic in Spanish but phonemic in English (e.g., best vs. vest). The results show a clear effect of the L1 on L2 learners' perception and word recognition, partly modulated by the orthographic information seen. These results support models that predict the mapping of L1-L2 phonemes and emphasize that orthographic information may be an important variable to take into consideration.


Introduction
An increasingly extensive body of research has demonstrated the influence of the orthographic forms (spellings) of a second language (L2) on speech perception, production, and sound categorization among L2 speakers. Numerous studies, such as those recently conducted by Bassetti and colleagues, have shown that L2 speakers can even produce sound contrasts that do not exist in the target language, but that this effect is modulated by visual-orthographic saliency (e.g., Bassetti, 2017; Bassetti et al., 2015, 2018, 2020). This study focuses on how speech perception and categorization problems might influence word recognition among Spanish learners of English as an L2, and how such an effect may be modulated by orthographic information.
The orthographies of both English and Spanish include the graphemes <b> and <v>, but only in English are they distinguished phonetically. While in Spanish both graphemes are pronounced as [b] or [β̞], depending on the phonetic context and regional variation, English <v> is realized as a labiodental and <b> as a bilabial (Hualde, 2014; Jongman, Wayland, & Wong, 1998). In standard Spanish, the voiced labiodental fricative does not exist. However, Hualde (2014) reported [v] in Spanish as a result of coarticulation, such that a word such as afgano 'Afghan' would be pronounced [avˈɣ̞ano]. This difference between the phonemic repertoires of English and Spanish, despite their similar orthographic representations, is crucial for the current study.
The presence of the voiced labiodental fricative [v] in the speech of Spanish-English bilinguals, particularly Spanish adult learners of English as an L2, has been discussed in the literature. One viewpoint suggests that the presence of the phoneme /v/ in English influences the Spanish sound system, causing [v] to appear as a variant of the phoneme /b/. In other words, speakers' knowledge of and proficiency in English affect how they realize the Spanish sounds (e.g., Takawaki, 2012; Trovato, 2017). This influence is seen in Spanish-English bilingual communities, mainly in areas such as New Mexico and California, where there is significant contact between English and Spanish, but the effect varies depending on proficiency in English (e.g., Takawaki, 2012; Tim, 1976; Torres Cacoullos & Ferreira, 2000). Stevens (2000) also noted [v] in the speech of Spanish L2 instructors, which was influenced by English orthography. Overall, it seems that the presence of the voiced labiodental fricative [v] in Spanish-English bilingual speech can be attributed to the influence of English on the Spanish sound system, especially in contexts with significant contact between the two languages and with increasing proficiency in the L2.
Beyond direct language contact and proficiency, the impact of orthographic forms (spellings) on speech perception in both native speakers and L2 learners has also been reported in the literature. Visual-orthographic input has been shown to both facilitate and hinder speech perception. In some cases, providing alphabetic orthographic input alongside auditory input aids speech perception (e.g., Erdener & Burnham, 2005; Escudero et al., 2008), but it can also lead to misperceptions if there are discrepancies between the spoken and written forms (e.g., Hayes-Harb et al., 2010; Escudero & Wanrooij, 2010, Experiment 2; Mathieu, 2016; Rafat & Stevenson, 2019). The reported effects take different forms, including additions, omissions, and substitutions of sounds in speech perception. Additionally, findings from L2 speech production studies support the evidence from perception studies, as orthographic forms can result in sound modifications in L2 speech (for a review, see Bassetti et al., 2015). Surprisingly, there is a lack of research on whether orthography affects word recognition, despite its influence on speech perception, production, and metalinguistic awareness in L2 learners.
This study, then, focuses on L2 learners' acquisition of non-native phonemic contrasts and investigates how potential perception issues may affect word recognition, and how this pattern could be linked to the visual influence of orthography.

Experiment 1: AXB Task

Participants
Thirty-two native speakers of English (16 females; mean age = 23 years) and 32 Spanish-speaking learners of English (12 females; mean age = 24 years) participated voluntarily in this study. The native speakers were students at a midwestern university in the USA. The L2 learners were tested at the same institution (20) and in Spain (12). On average, learners had begun acquiring their L2 at the age of 12 (SD = 2.3 years), had been learning the language for 12 years (SD = 3.6 years), had resided in an English-speaking country for 15 months (SD = 36.8 months), and had a proficiency score, as established by a cloze test (Brown, 1980), of 25.94 out of 50 possible points (SD = 12.04 points). None of the participants reported any visual or hearing impairment.

Materials
Participants completed an AXB discrimination task. Eighteen nonce-word minimal pairs contrasting the English phonemes /b/ and /v/ were created; nonce words were used to prevent participants from relying on lexical information. Each member of a stimulus pair contained either a voiced bilabial stop (e.g., /bɛmɪʃ/) or a voiced labiodental fricative (e.g., /vɛmɪʃ/), with the contrast appearing in different positions within the word.
The study also included seventy-two fillers. These nonce-word fillers were divided into two conditions: the first filler condition included stimulus pairs whose members either contained a schwa (e.g., /əslɛn/) or did not contain a schwa (e.g., /slɛn/) before the consonant cluster; the second filler condition included stimuli that differed in the presence of one phoneme (e.g., /snun/ vs. /snu/). To make sure participants were paying close attention to the complete word, the sound contrasts of these two filler conditions appeared in different positions in the word (initial, middle, or final).
The words used were checked by a native English speaker to ensure that they followed English phonotactic rules and were not real English words. To prevent participants from relying on the physical properties of the sounds, the stimuli were recorded by three different speakers. The order of presentation was consistent, with Speaker 1 (Midwest dialect) producing A, Speaker 2 (East Coast dialect) producing X, and Speaker 3 (Midwest dialect) producing B. The experiment followed a Latin square design so that no participant heard both the A and the B member of a given pair as X.
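The counterbalancing described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_lists`, the two-list layout, and all pairs other than the /bɛmɪʃ/ vs. /vɛmɪʃ/ example are hypothetical and are not taken from the study's materials.

```python
def build_lists(pairs):
    """Return two presentation lists (a 2 x 2 Latin square over items).

    Each pair (a, b) appears once per list. In one list X matches A,
    in the other list X matches B, so no participant hears both
    versions of the same pair.
    """
    lists = [[], []]
    for i, (a, b) in enumerate(pairs):
        for k in range(2):
            # Alternate the X version across items and flip it across lists.
            x = a if (i + k) % 2 == 0 else b
            lists[k].append({
                "A": a,  # produced by Speaker 1
                "X": x,  # produced by Speaker 2
                "B": b,  # produced by Speaker 3
                "answer": "A" if x == a else "B",
            })
    return lists


# Only the first pair comes from the text; the others are hypothetical.
example_pairs = [("bɛmɪʃ", "vɛmɪʃ"), ("tabe", "tave"),
                 ("bosk", "vosk"), ("nib", "niv")]
list_1, list_2 = build_lists(example_pairs)
```

Each participant would be assigned to one of the two lists, so that across participants every pair is heard with both X versions while the expected answer stays balanced between A and B within a list.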

Procedures
The stimuli were presented using Paradigm by Perception Research Systems, Inc. (Tagliaferri, 2005). Participants were comfortably seated in a quiet room, about 20-30 cm from the screen, and were instructed in their native language to listen carefully to a series of three nonce words and to choose whether the second word (X) was more similar to the first or to the third word (A or B). Materials were presented in Arial font (20 pt). The interstimulus interval was 1,000 ms, and participants made their decision by pressing one of the two mouse buttons. The next trial started immediately after the participants entered their response. A practice session of six stimuli with feedback preceded the main session of the experiment (which did not have feedback). All trials were randomized across participants.

Data Analysis
The participants' accuracy was analyzed with a logistic regression model (cf. Baayen, 2008), using the glm function (Hothorn & Everitt, 2014) in R (R Development Core Team, 2009). L1 was treated as a categorical predictor with two levels (English vs. Spanish), with the English group as the baseline. Phoneme type was treated as a categorical predictor with two levels (bilabial vs. labiodental), with the bilabial as the baseline, as both languages have the bilabial phoneme. The effect of the predictors was assessed using log-likelihood tests comparing models with and without them. Two sets of models were run: one on all the accuracy rates with L1, phoneme type, and their interaction as predictors, and one on the L2 learners' accuracy rates with phoneme type, proficiency, and their interaction. The effect of L2 proficiency was assessed by comparing models that included proficiency with models that did not; in each case, the model with the best fit was kept. Since proficiency did not improve the model, only the analysis of all the accuracy data with L1 and phoneme type (but not their interaction) is reported. Participant and item were included as random variables.
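The logic of comparing models with and without a predictor can be illustrated with a deliberately simplified sketch. It assumes a single two-level predictor and no random effects (unlike the models reported here), in which case the fitted probabilities of the fuller model equal the observed proportion correct in each group. The function names and the counts in the example are hypothetical and are not the study's data.

```python
import math


def bernoulli_loglik(correct, total):
    """Log-likelihood of the responses at the maximum-likelihood estimate
    p = correct / total (trial-level Bernoulli likelihood)."""
    p = correct / total
    ll = 0.0
    if correct > 0:
        ll += correct * math.log(p)
    if correct < total:
        ll += (total - correct) * math.log(1 - p)
    return ll


def likelihood_ratio_test(group_a, group_b):
    """Compare an intercept-only model with a model that adds one
    two-level predictor. Each group is a (correct, total) tuple.

    Returns the test statistic and its p-value (chi-square, 1 df).
    """
    ll_full = bernoulli_loglik(*group_a) + bernoulli_loglik(*group_b)
    ll_null = bernoulli_loglik(group_a[0] + group_b[0],
                               group_a[1] + group_b[1])
    stat = 2 * (ll_full - ll_null)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2))
    return stat, p_value


# Hypothetical counts: 90/100 correct in one group, 70/100 in the other.
stat, p_value = likelihood_ratio_test((90, 100), (70, 100))
```

A small p-value indicates that adding the predictor improves the fit over the model without it. With random effects for participant and item, the same comparison would be made on the log-likelihoods of the fitted mixed models rather than on these closed-form values.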

Results
Figure 1 presents the mean accuracy results for the two groups in the two phoneme conditions, and Table 1 presents the results of the logit mixed-effects model for all participants' accuracy. The model summarized in Table 1 revealed that Spanish L2 learners of English were statistically less accurate than native speakers in discriminating stimuli with [b] and [v] sounds, and that the labiodental sound was identified statistically more accurately than the bilabial sound, to a similar extent in both groups.

Discussion
In this first experiment, an AXB task was used to examine whether Spanish L2 learners would have any difficulty discriminating the bilabial-labiodental contrast in English. Results showed that native speakers were more accurate than the L2 learners and that the labiodental sound was perceived more accurately than the bilabial sound. However, the lack of an interaction between group and phoneme type indicates that, while native speakers were more accurate, L2 learners showed the same pattern of results regardless of whether they were listening to a bilabial (existing in their L1 as a phoneme) or a labiodental (not existing in their L1 as a phoneme, but as an allophone of /f/).
The results of this discrimination task for the two groups were in accordance with the phonemic inventory of their L1s. The Spanish speakers' perception of the stimuli was shaped to conform to the allophonic nature of the [b] vs. [v] distinction in their L1. Considering that the two sounds are allophonic in Spanish (in very limited contexts, as seen in the introduction), these L2 learners of English have some difficulty discriminating between them.
These findings add support to the body of evidence suggesting that contrasts that are allophonic in the L1 are indeed difficult to discriminate when they are phonemic in the L2 (e.g., Abramson & Lisker, 1970; Goto, 1971; Lisker & Abramson, 1967; Polka, 1992; Polka & Werker, 1994; Strange et al., 2001; Veleva, 1985; Werker & Lalonde, 1988; Werker & Logan, 1985; Werker & Tees, 1984). However, it remains to be established how these perception errors influence L2 word recognition. Existing literature indicates that challenges in accurately perceiving L2 sounds can lead to heightened lexical competition, resulting in less efficient word recognition (e.g., Broersma & Cutler, 2011; Escudero, 2007; Martínez-García, 2021; Weber & Cutler, 2004). Experiment 2 was thus designed to examine how the misperception of this phonemic contrast in English could impact L2 word recognition and what role orthographic information would play in the perception of this sound contrast.

Experiment 2: Word Monitoring Task

Participants
The same participants who took part in Experiment 1 also took part in Experiment 2. They performed the word monitoring task before the AXB task to prevent their word recognition from being influenced by anything they might have noticed in the AXB task.

Materials
For this experiment, all groups completed a word monitoring task, which requires participants to keep track of a pre-designated target word in the acoustic input (see Kilborn & Moss, 1996). Participants were asked to monitor for a target word containing either /b/ or /v/ (e.g., best vs. vest) in semantically ambiguous sentences in which the word heard either matched or mismatched the target word in that sound. The 48 target items had either the phoneme /b/ or the phoneme /v/ in different positions within the word (e.g., "bail" vs. "veil"). To prevent participants from relying on lexical cues, semantically ambiguous sentences were created and checked for plausibility and ambiguity by two native English speakers. Additionally, the location of the target word within the sentence was manipulated to prevent participants from forming expectations.
The experiment also included ninety-six filler items, which shared characteristics with the fillers used in Experiment 1. Half of these fillers consisted of minimal (36) and near-minimal (12) pairs in which the presence of a vowel at the beginning of /s/-initial clusters was manipulated (e.g., "state" vs. "estate"). The other half of the fillers differed in the number of phonemes they contained or in one of their phonemes (e.g., "stop" vs. "top" or "snow" vs. "know"). These fillers had sound contrasts appearing in different positions in the word (word-initial, word-medial, or word-final). All the sentences, both experimental and fillers, were recorded by a female native speaker of American English with a Midwestern accent (Speaker 3 of Experiment 1). A randomized Latin square design was used to present the stimuli.

Procedure
The stimuli were presented using Paradigm. In each trial, participants saw the target word in capital letters in the middle of the screen for 1,000 ms (e.g., BEST). The word disappeared at the same time as the audio started playing. Then, they listened to a sentence that might or might not have included the word they had just seen (e.g., I gave her my best or I gave her my vest). Participants' task was to decide whether the sentence contained the word they saw on the screen by pressing the mouse button labeled "SÍ" if it did or the one labeled "NO" if it did not. Participants could make the decision as soon as they identified the word in the sentence (before the audio finished) or wait until the end of the sentence to make sure the word did not appear. As soon as they entered their response, the next trial started.
The experiment started with six practice trials, which included feedback. During the main session of the experiment, participants did not receive any feedback.

Data Analysis
The participants' accuracy in this task was analyzed using a logistic regression model, as in Experiment 1. Three variables were considered as categorical predictors: first, L1, with two levels (English vs. Spanish) and English serving as the baseline; second, phoneme type (/b/ vs. /v/), with the bilabial sound as the baseline; and finally, the match between the word to be monitored and the word in the auditory stimulus (match vs. mismatch), with "match" serving as the baseline.
Two sets of models were run. The first set used L1 as a predictor and considered all participants' accuracy rates. The second set focused on the L2 learners' accuracy rates and included their proficiency. However, the results regarding L2 proficiency are not reported because the model without proficiency provided the best fit. Both sets of models included participant and item as random variables.
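The baseline (treatment) coding of the three categorical predictors described above can be sketched as follows. This is an illustrative assumption about how such a design is typically coded, not the authors' analysis script; the dictionary keys and column names are hypothetical.

```python
# Baseline level of each predictor, as stated in the text.
BASELINES = {"L1": "English", "phoneme": "bilabial", "match": "match"}


def treatment_code(trial):
    """Dummy-code one trial: 0 = baseline level, 1 = non-baseline level.

    Interaction columns are the products of the corresponding main-effect
    columns, so they equal 1 only when every factor involved is at its
    non-baseline level.
    """
    l1 = int(trial["L1"] != BASELINES["L1"])
    ph = int(trial["phoneme"] != BASELINES["phoneme"])
    mm = int(trial["match"] != BASELINES["match"])
    return {
        "intercept": 1,
        "L1_Spanish": l1,
        "phoneme_labiodental": ph,
        "match_mismatch": mm,
        "L1:phoneme": l1 * ph,
        "L1:match": l1 * mm,
        "phoneme:match": ph * mm,
        "L1:phoneme:match": l1 * ph * mm,
    }


# A Spanish-speaking learner hearing a labiodental word in a mismatch trial.
row = treatment_code({"L1": "Spanish", "phoneme": "labiodental",
                      "match": "mismatch"})
```

Under this coding, the intercept corresponds to native English speakers in the bilabial match condition, and each coefficient is interpreted relative to that reference cell.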

Results
Figure 2 presents the mean accuracy of the two groups in the word monitoring task, and Table 3 presents the results of the logistic regression model for all participants' accuracy. The model revealed the following effects: a main effect of L1, indicating that learners showed a different pattern from the native speakers; an interaction between auditory type and L1 for Spanish L2 learners, which shows that, unlike the native speakers, these learners showed different accuracies when the words contained /b/ vs. /v/; an interaction between auditory type and phoneme type, which indicates that bilabial and labiodental sounds showed different accuracies depending on whether they were presented in the match or the mismatch condition; and a three-way interaction between phoneme type, auditory type, and L1, indicating that Spanish speakers' difficulty in identifying the presence of a bilabial or labiodental sound differed depending on whether the auditory word matched or mismatched the written word.
To better understand the three-way interaction, two follow-up linear mixed-effects models were fit to test for the effects of phoneme type and auditory type separately for the native speakers and the Spanish L2 learners of English. These models are shown in Table 4 and Table 5 for the native speakers and the Spanish L2 learners, respectively. The results showed a clear difference between the two groups. On the one hand, the native speakers (Table 4) showed an interaction between auditory type and phoneme type, indicating that they were statistically more accurate in the mismatch condition, when the auditory stimuli contained a labiodental sound, than in any of the other conditions. On the other hand, the Spanish L2 learners (Table 5) showed a main effect of auditory type, which indicates that they were more accurate in the match than in the mismatch condition, and a marginal main effect of phoneme type, which indicates that they were marginally more accurate in the labiodental than in the bilabial condition.

Discussion
Experiment 2 was designed to investigate the impact of perception issues on word recognition by determining whether the perception difficulties observed in Experiment 1 affected the learners' ability to recognize words. The results of Experiment 2 showed a distinct division. While the native speakers showed no problems in detecting the target word in the auditory stimuli, the native Spanish speakers had difficulty detecting the target words, particularly in the mismatch conditions. Interpreted cautiously, these findings imply that the lack of a phonemic distinction in the learners' L1 negatively impacts their ability to access words differentiated by the bilabial-labiodental contrast in English. These word recognition challenges might arise because Spanish L2 learners of English mistakenly activate competing words when they encounter words containing /b/ or /v/ (a possibility supported by the results from the mismatch condition).
The idea that L2 learners unintentionally activate competing lexical items has already been documented in the literature that examines L1-L2 category assimilation and word recognition (e.g., Broersma & Cutler, 2011; Pallier, Colomé, & Sebastián-Gallés, 2001; Weber & Cutler, 2004). To perceive spoken words, listeners must match the incoming auditory information with their stored lexical representations. Word recognition models (e.g., Marslen-Wilson, 1987; McClelland & Elman, 1986; Norris, 1994) propose that lexical candidates are activated in parallel and then compete for recognition. This means that as the acoustic input unfolds, all words consistent with the input become active in the lexicon until there is enough segmental and suprasegmental information to identify the intended word accurately. When perceiving L2 words, not only words from the L2 but also words from the L1 may be activated and compete for recognition. Thus, the fact that the L1 lacks a phonemic distinction may result in the activation of competitor words that are not present in the signal, making L2 word recognition less efficient.
Another possibility is that these participants were more sensitive to the written trigger, as they were more accurate whenever they saw a word with a <v>. Interestingly, this pattern was only marginal among the learners, but significant among the native speakers. While still showing high accuracy in all the conditions, native speakers were also more accurate whenever they heard a word with the labiodental sound but saw a word containing the letter <b>. This interpretation of the results, then, would be in line with previous studies, which have documented the influence of orthography on the pronunciation of [v] (Cartagena, 2002; Stevens, 2000; Takawaki, 2012; Torres Cacoullos & Ferreira, 2000; Trovato, 2017).
In summary, the constraints that the L1 phonemic inventory places on speech discrimination, together with the lexical competition that follows from them (modulated by visual-orthographic information), seem to explain the pattern of results reported in Experiment 2.

General discussion and conclusion
As mentioned before, this study was designed to determine how potential perception issues may affect word recognition in an L2, and how this pattern could be linked to the visual influence of orthography. The results of the study indicate that native speakers of Spanish do indeed have difficulty identifying the difference between [b] and [v]. This set of results has already been reported in the literature, and the findings have been explained in terms of the properties of the phonetic inventory of the L1, such that sounds in an allophonic contrast in the L1 are not easily identified as distinct phonemes in the L2 (e.g., Abramson & Lisker, 1970; Goto, 1971; Lisker & Abramson, 1967; Polka, 1992; Polka & Werker, 1994; Strange, Akahane-Yamada, Kubo, Trent, & Nishi, 2001; Veleva, 1985; Werker & Lalonde, 1988; Werker & Logan, 1985; Werker & Tees, 1984). However, this study goes beyond these findings by showing that these identification problems do hinder word recognition; the results of Experiment 2 can also be linked to the orthographic properties of English, as proposed in the previous discussions. The influence of orthography on the pronunciation of [v] has already been reported (Cartagena, 2002; Stevens, 2000; Takawaki, 2012; Torres Cacoullos & Ferreira, 2000; Trovato, 2017), and this study shows that this effect appears not only in pronunciation but also seems to matter in perception.
In current models of L2 phonetic perception, such as the SLM (Flege, 1995) and PAM (Best, 1995), the perceived similarity between sounds in one's L1 and sounds in the L2 significantly influences the predicted difficulty of different non-native sound distinctions. Other listener-related factors, such as the age of acquisition of the L2 and the continued use of the L1, also play a substantial role in determining the performance attained in the L2 (e.g., Flege et al., 1997, 2003). The basis for perceived similarity varies across these models. Best's PAM relies on the similarity of articulatory gestures, while in Flege's SLM similarity must be measured empirically because it cannot be predicted in advance. While this study was not designed to tease apart these two models of speech perception, the results do support the claim that the phonetic inventory of the L1 influences the perception of the phonetic inventory of the L2. Future studies should try to discriminate between the two models to better understand how the phonemic inventory of the L1 influences the perception of the L2.
This study had two main objectives. Firstly, it aimed to investigate the perceptual issues related to allophonic variation in the L1 and its impact on the correct discrimination of minimal pairs in the L2. Secondly, it sought to examine how these perceptual problems affected word recognition in the L2. As of now, this study appears to be one of the first to explore the negative influence of allophonic variation in the L1 on the accurate differentiation of minimal pairs in the L2.
Additionally, just as acoustic prominence can influence the ease of acquiring a sound contrast (see Best & McRoberts, 2003), the visual salience of the contrast is also likely to affect the extent to which learners pay attention to visual cues. Several previous studies have looked at how visual information influences auditory perception in an L2. Combining written alphabetic input with auditory input enhances speech perception for different groups of listeners. This has been observed in studies involving individuals initially exposed to the sounds of an unfamiliar language (Erdener & Burnham, 2005) and L2 learners encountering pseudowords (Yazawa et al., 2020). However, if the target contrast is particularly challenging for the listener, this facilitative effect may not be as pronounced (Simon et al., 2010). In this study, both groups showed higher accuracy when the acoustic information contained a labiodental sound while the word they saw contained the letter <b>. This is likely because, when they saw the letter <b>, both groups could create a clear mental representation of how this letter should be pronounced, as it is phonemic in both languages. Thus, when the auditory information did not fully match their mental representation, it was easier for them to recognize that the word they had seen was not present in what they heard.
In this study, proficiency did not show a statistically significant influence on the results, unlike in previous studies (e.g., Takawaki, 2012; Tim, 1976; Torres Cacoullos & Ferreira, 2000), but it is important to interpret this finding cautiously. The results imply that the impact of the L1 phonemic inventory on L2 speech perception might persist across different proficiency levels, even at advanced stages. However, there could be other plausible explanations. For instance, the proficiency (cloze) test used to assess participants' language skills may not have adequately measured their aural proficiency. Since the cloze test relies on the visual modality, it might not have been sensitive enough to capture the variation in L2 learners' auditory perceptual abilities, thereby potentially masking any proficiency-related effects. Future studies should examine the extent to which native-like production, perception, and recognition of the /v/ vs. /b/ phonemic contrast is attainable for Spanish L2 learners whose proficiency is assessed aurally.
There are seven bats in the attic.
There are seven vats in the attic.
This machine was invented to bend things.
This machine was invented to vend things.
Please use "bended" in a sentence.
Please use "vended" in a sentence.
This is a bending machine.
This is a vending machine.
At his job, the employee bends things.
At his job, the employee vends things.
I gave her my best.
I gave her my vest.
This bet has become rather famous.
This vet has become rather famous.
I wore a biking costume.
I wore a viking costume.
We often boat in the summer.
We often vote in the summer.
I have always liked boating.
I have always liked voting.
It is difficult to measure one bolt.
It is difficult to measure one volt.
The peasants will bow to those in power.
The peasants will vow to those in power.
"Bowel" is the last word he looked up.
"Vowel" is the last word he looked up.
This is a berry flavored pie.
This is a very flavored pie.
The knight bowed to the king.
The knight vowed to the king.
That curb is treacherous!
That curve is treacherous!
Mary always trips on the curbed sidewalk.
Mary always trips on the curved sidewalk.
Anne believes that curbing makes the street more dangerous.
Anne believes that curving makes the street more dangerous.
There are several curbs along the way.
There are several curves along the way.
I won't listen to you dribble any longer.
I won't listen to you drivel any longer.
The password is "dub".
The password is "dove".
These dubs are worth a lot of money.
These doves are worth a lot of money.
The woman found a fibre in her bag.
The woman found a fiver in her bag.
Please say the word "fibres" as carefully as you can.
Please say the word "fivers" as carefully as you can.
The sound of the gabble was overwhelming.
The sound of the gavel was overwhelming.
The lobes were divided into smaller pieces.
The loaves were divided into smaller pieces.
This is quite the marble!
This is quite the marvel!
The marbled cake was prepared for the birthday.
The marvelled cake was prepared for the birthday.
The collector keeps her marbles in a temperature-controlled room.
The collector keeps her marvels in a temperature-controlled room.
The king was not impressed with the rebel.The king was not impressed with the revel.
Robin Hood was pleased with the rebels in the crowd.
Robin Hood was pleased with the revels in the crowd.
I will say the word "robe" now.
I will say the word "rove" now.
The child missed "robes" on the spelling test.
The child missed "roves" on the spelling test.
I never know how to use "robing" in a sentence.
I never know how to use "roving" in a sentence.

Figure 1: Mean accuracy (standard errors) of the two groups in the AXB task

Figure 2: Mean accuracy (standard deviation) of the two L1 groups in the word monitoring task

Table 1: Logit regression model on all participants' accuracy results

Table 2: Example of stimuli used in the word monitoring task, illustrating the four conditions

Table 3: Logit regression model on all participants' accuracy results

Table 4: Logit regression model on native speakers' accuracy results

Table 5: Logit regression model on Spanish L2 learners' accuracy results