Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk
Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear whether crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers, all native Spanish speakers from 11 countries, provided the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations for improving the task in future data collection.
This accepted article is published as Nagle, C.L.V., Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk. Journal of Second Language Pronunciation. 2019. DOI: 10.1075/jslp.18016.nag. Posted with permission.