An adaptive computational system for automated, learner-customized segmental perception training in words and sentences: Design, implementation, assessment

Qian, Manman
Major Professor
John M. Levis
Organizational Unit

The Department of English seeks to provide all university students with the skills of effective communication and critical thinking, and to impart knowledge of literature, creative writing, linguistics, speech, and technical communication to students within and outside of the department.

The Department of English and Speech was formed in 1939 from the merger of the Department of English and the Department of Public Speaking. In 1971 its name changed to the Department of English.

Historical Names

  • Department of English and Speech (1939-1971)

Segmental perception training is important because many phonemic errors are common in second language pronunciation, and the perception of foreign phonemic contrasts is often difficult to acquire without instruction (Best & Tyler, 2007; Birdsong, 1992, 2006; Flege, 1988, 1995). Numerous computer-assisted programs provide training in segmental perception, but few of them make effective use of existing language resources. Researchers have called for a computer-assisted pronunciation teaching (CAPT) program that provides individualized, needs-based training according to learners' first language and proficiency (Levis, 2007; Munro, Derwing, & Thomson, 2015). However, a perception training model has yet to be developed that accounts for the major components important to intelligibility, makes use of technology, and reflects state-of-the-art research findings on perception training. Specifically, such a model first needs to account for learners' L1 backgrounds, since L2 segmental errors are often L1-specific (Swan & Smith, 2002). Second, it should be tailored to individual needs, as not every learner who shares an L1 will necessarily make the errors predicted for that L1 (Munro, 2018; Munro, Derwing, & Thomson, 2015). Third, functional load theory (King, 1967) suggests that not all phonemic errors affect intelligibility equally, so perception training should not treat all errors as if they had an equal impact on intelligibility. Fourth, the model should use a high-variability phonetic training design, a technique that uses multiple voice models for perception training (Pisoni & Lively, 1995) and has been found to improve perception (Thomson, 2012; Wang & Munro, 2004) as well as production (Thomson, 2011).
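The first three criteria above can be combined into a single filter. The Python sketch below is a hypothetical illustration, not the study's implementation. It keeps only the contrasts that are predicted for the learner's L1, confirmed by that learner's own diagnostic errors, and at or above an assumed functional-load threshold. The `Contrast` class, the ARPAbet-style labels, the numeric functional-load scores, and the `fl_threshold` default are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contrast:
    """A segmental contrast with an illustrative functional-load score."""
    pair: str               # ARPAbet-style label, e.g. "IY-IH" (hypothetical labeling)
    functional_load: float  # hypothetical score in [0, 1]; higher = more at stake


def select_targets(l1_predicted, diagnostic_errors, fl_threshold=0.5):
    """Keep contrasts that are (1) predicted for the learner's L1,
    (2) confirmed by this learner's diagnostic errors, and
    (3) at or above an assumed functional-load threshold,
    ordered from highest to lowest functional load."""
    confirmed = [c for c in l1_predicted if c.pair in diagnostic_errors]
    high_fl = [c for c in confirmed if c.functional_load >= fl_threshold]
    return sorted(high_fl, key=lambda c: c.functional_load, reverse=True)


# Illustrative usage; the contrasts and scores are made up for the example.
predicted = [
    Contrast("IY-IH", 0.9),
    Contrast("TH-S", 0.4),
    Contrast("L-N", 0.8),
    Contrast("V-W", 0.6),
]
learner_errors = {"IY-IH", "TH-S", "V-W"}
print([c.pair for c in select_targets(predicted, learner_errors)])
# prints ['IY-IH', 'V-W']
```

In this toy case, L-N is dropped because the learner does not actually make that L1-predicted error, and TH-S is dropped because its assumed functional load falls below the threshold.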

This study introduces an innovative online perception training system that uses computational approaches to deliver high-variability phonetic training designed to improve learners' ability to discriminate and identify segmental contrasts. The system was designed with five major features. First, its goals were intelligibility-driven: it focused only on segmental errors with a high functional load. Second, it offered training customized to each learner's pre-training diagnostic performance and then adapted the training content and intensity to the learner's errors during real-time learning. Third, in recognition of the efficacy of multi-voice models for perception acquisition (Thomson, 2011, 2012; Wang & Munro, 2004), it used high-variability phonetic training exercises developed with two North American text-to-speech voices. Fourth, the system was self-contained, so learners could access and use it flexibly and independently, at their own pace and with little teacher guidance. Fifth, immediate individualized feedback was available on every training item. In addition, the stimuli were automatically extracted from a phonetically transcribed dictionary with word frequency controlled. Specifically, the system selected only words among the top 5,000 lemmas in the Corpus of Contemporary American English, so that all training and test stimuli were likely to be familiar to the participants and could be recognized aurally during perception tests and training without the words being shown in writing.
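The abstract does not specify the adaptation rule, so the following Python sketch only illustrates one plausible way to adapt training intensity to real-time errors. Each active contrast receives a base number of items plus extra items in proportion to its recent error rate, and a contrast is retired once accuracy over a sliding window reaches a mastery level. The class name, the parameters (`base_items`, `max_items`, `mastery`, `window`), and their defaults are hypothetical. A real system might seed the initial allocation from diagnostic results instead of treating "no history" as "no errors".

```python
from collections import defaultdict


class AdaptiveScheduler:
    """Hypothetical sketch of error-driven adaptation: contrasts with more recent
    errors get more items, and mastered contrasts are retired."""

    def __init__(self, targets, base_items=10, max_items=30, mastery=0.9, window=20):
        self.active = list(targets)       # contrasts still in training
        self.base_items = base_items      # items per session with no recent errors
        self.max_items = max_items        # items per session at 100% recent errors
        self.mastery = mastery            # accuracy needed over a full window to retire
        self.window = window              # number of most recent responses considered
        self.history = defaultdict(list)  # contrast -> list of True/False responses

    def _recent(self, contrast):
        return self.history[contrast][-self.window:]

    def record(self, contrast, correct):
        """Log one response; retire the contrast once a full window meets mastery."""
        self.history[contrast].append(bool(correct))
        recent = self._recent(contrast)
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.mastery
                and contrast in self.active):
            self.active.remove(contrast)

    def next_session_plan(self):
        """Items per active contrast: base_items plus extra items in proportion
        to the recent error rate (no history yet counts as no errors)."""
        plan = {}
        for contrast in self.active:
            recent = self._recent(contrast)
            error_rate = 1 - sum(recent) / len(recent) if recent else 0.0
            extra = round(error_rate * (self.max_items - self.base_items))
            plan[contrast] = self.base_items + extra
        return plan


# Illustrative usage with a small window so the effect is visible.
scheduler = AdaptiveScheduler(["IY-IH", "V-W"], mastery=0.75, window=4)
for correct in [False, False, True, False]:
    scheduler.record("IY-IH", correct)
for correct in [True, True, True, False]:
    scheduler.record("V-W", correct)
print(scheduler.next_session_plan())
# prints {'IY-IH': 25}
```

Here V-W reaches the assumed 75% mastery level over its four-response window and is retired. IY-IH, missed on three of four items, is assigned 15 extra items.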

The training used four types of exercises built from text-to-speech minimal pairs that were automatically extracted from the Illinois Speech and Language Engineering Dictionary: same-different discrimination, oddity discrimination, simple identification, and yes/no identification. The voices and words of the training stimuli were controlled in order to examine whether learners transferred perception gains to three novel conditions: trained words spoken with untrained voices, untrained words spoken with trained voices, and trained items in sentences. The training system was used for approximately three months by 266 Chinese-L1 English majors at three universities in three cities (Harbin, Soochow, and Guangzhou). The learners were assigned to either an experimental group or a control group according to their institution and used the system for perception training on nine English consonant and vowel contrasts that were predicted to be challenging for them.
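Extracting minimal pairs from a phonetically transcribed dictionary can be sketched as grouping words by a template in which one phoneme slot is blanked out. The Python code below assumes a hypothetical lexicon format (each word mapped to a tuple of ARPAbet-style symbols with stress marks removed) and a toy frequency table. It does not reproduce the ISLE dictionary's actual file format or the study's pipeline. The hypothetical `top_n_words` helper stands in for the frequency control and, as a simplification, treats word forms rather than lemmas as units.

```python
from collections import defaultdict
from itertools import product


def top_n_words(frequencies, n):
    """Return the n most frequent words from a word -> count mapping
    (ties broken alphabetically). Stands in for a frequency-list cutoff."""
    ranked = sorted(frequencies, key=lambda w: (-frequencies[w], w))
    return set(ranked[:n])


def minimal_pairs(lexicon, contrast, allowed_words):
    """Find word pairs whose transcriptions are identical except that one
    position holds contrast[0] in the first word and contrast[1] in the second.

    lexicon: dict of word -> tuple of phoneme symbols (hypothetical format,
             stress marks assumed already removed)
    contrast: pair of phoneme symbols, e.g. ("IY", "IH")
    allowed_words: words permitted as stimuli (e.g. a frequency-filtered set)
    """
    a, b = contrast
    # Template = transcription with one contrast phoneme replaced by None.
    slots = defaultdict(lambda: {a: [], b: []})
    for word, phones in lexicon.items():
        if word not in allowed_words:
            continue
        for i, phone in enumerate(phones):
            if phone in (a, b):
                template = phones[:i] + (None,) + phones[i + 1:]
                slots[template][phone].append(word)
    pairs = set()
    for group in slots.values():
        # Every a-word sharing a template with a b-word forms a minimal pair.
        pairs.update(product(group[a], group[b]))
    return sorted(pairs)


# Toy data for illustration only.
lexicon = {
    "beat": ("B", "IY", "T"), "bit": ("B", "IH", "T"),
    "sheep": ("SH", "IY", "P"), "ship": ("SH", "IH", "P"),
    "leave": ("L", "IY", "V"), "live": ("L", "IH", "V"),
    "bet": ("B", "EH", "T"),
}
counts = {"beat": 50, "bit": 90, "sheep": 40, "ship": 60,
          "leave": 80, "live": 95, "bet": 30}
allowed = top_n_words(counts, 5)
print(minimal_pairs(lexicon, ("IY", "IH"), allowed))
# prints [('beat', 'bit'), ('leave', 'live')]
```

Because "sheep" falls outside the toy top-5 cutoff, the sheep/ship pair is excluded. This mirrors how a frequency filter keeps unfamiliar words out of the stimulus set.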

An analysis of the participants' diagnostic and training performance revealed substantial variation in the learners' actual segmental errors and pace of learning. This suggests that L2 phonemic acquisition is not merely L1-specific or dialect-specific but is distinctive to individual learners, and that this individual variation was not correlated with time on training. This highlights the importance of building adaptability into the design and delivery of pronunciation training materials. Descriptive and inferential statistics on the training effect and on the retention and transfer of test gains showed that an average of 143 minutes of focused effort led to robust improvement and retention of phonemic perception for most of the segmental contrasts under investigation. L2 segmental acquisition was sensitive to the linguistic context of a segment, and the training helped the learners transfer perception gains to untrained contexts (new voices, new words, and untrained sentence contexts). The results showed that high-variability input materials and text-to-speech technology can be used effectively to develop perception training materials. The study also suggested that exercises designed specifically to sharpen aural sensitivity to contrasting phonemes may facilitate learners' ability to self-correct phonemic issues even without explicit training on those issues. The findings were discussed in relation to exemplar theory (Bybee, 2000), analogical modeling theory (Skousen, 1989), the TRACE model within the connectionist framework (Joanisse & McClelland, 2015), item versus system learning theory (Cruttenden, 1981), U-shaped learning theory (Gass & Selinker, 2008), and the Speech Learning Model (Flege, 1995). Future research is encouraged to investigate the effect of adaptive perception training on learners' response latency and productive performance, both of which are essential to real-life pronunciation and communication competence.

December 1, 2018