Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk

Nagle, Charles

File

NagleCL_manu_Developing_and_validating_a_methodology_for_crowdsourcing_L2_speech.pdf (307.22 KB)

Date

2019-01-21

Authors

Nagle, Charles

Abstract

Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear if crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations to improve the task for future data collection.

Academic or Administrative Unit

World Languages and Cultures

Type

article

Comments

This accepted article is published as Nagle, C.L.V., Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk. Journal of Second Language Pronunciation. 2019. DOI: 10.1075/jslp.18016.nag. Posted with permission.

Copyright

2019