OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Abstract
This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license, and we hope that it will foster further research on readability assessment and text simplification.
Comments
This proceeding is published as Vajjala, Sowmya, and Ivana Lucic. "OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification." In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (2018): 297-304.