Chukharev-Hudilainen, Evgeny

Email Address
evgeny@iastate.edu
Last Name
Chukharev-Hudilainen
First Name
Evgeny

Search Results

Now showing 1 - 10 of 16
Publication

A Process-oriented Dataset of Revisions during Writing

2020-01-01 , Conijn, Rianne , Chukharev-Hudilainen, Evgeny , Dux Speltz, Emily , van Zaanen, Menno , Van Waes, Luuk , Chukharev-Hudilainen, Evgeny , English

Revision plays a major role in writing and the analysis of writing processes. Revisions can be analyzed using a product-oriented approach (focusing on a finished product, the text that has been produced) or a process-oriented approach (focusing on the process that the writer followed to generate this product). Although several language resources exist for the product-oriented approach to revisions, there are hardly any resources available yet for an in-depth analysis of the process of revisions. Therefore, we provide an extensive dataset on revisions made during writing (accessible via hdl.handle.net/10411/VBDYGX). This dataset is based on keystroke data and eye tracking data of 65 students from a variety of backgrounds (undergraduate and graduate English as a first language and English as a second language students) and a variety of tasks (argumentative text and academic abstract). In total, 7,120 revisions were identified in the dataset. For each revision, 18 features have been manually annotated and 31 features have been automatically extracted. As a case study, we show two potential use cases of the dataset. In addition, future uses of the dataset are described.

Publication

Golden Speaker Builder - An interactive tool for pronunciation training

2019-11-01 , Levis, John , Liberatore, Christopher , Sonsaat, Sinem , Chukharev-Hudilainen, Evgeny , Silpachai, Alif , Zhao, Guanlong , Gutierrez-Osuna, Ricardo , English

The type of voice model used in Computer Assisted Pronunciation Instruction is a crucial factor in the quality of practice and the amount of uptake by language learners. As an example, prior research indicates that second-language learners are more likely to succeed when they imitate a speaker with a voice similar to their own, a so-called “golden speaker”. This manuscript presents Golden Speaker Builder (GSB), a tool that allows learners to generate a personalized “golden-speaker” voice: one that mirrors their own voice but with a native accent. We describe the overall system design, including the web application with its user interface, and the underlying speech analysis/synthesis algorithms. Next, we present results from a series of listening tests, which show that GSB is capable of synthesizing such golden-speaker voices. Finally, we present results from a user study in a language-instruction setting, which show that practising with GSB leads to improved fluency and comprehensibility. We suggest reasons for why learners improved as they did and recommendations for the next iteration of the training.

Publication

Timed written picture naming in 14 European languages

2018-04-01 , Torrance, Mark , Chukharev-Hudilainen, Evgeny , Nottbusch, Guido , Alves, Rui , Arfé, Barbara , Chanquoy, Lucile , Chukharev-Hudilainen, Evgeny , Dimakos, Ioannis , Fidalgo, Raquel , Hyönä, Jukka , Jóhannesson, Ómar , Madjarov, George , Pauly, Dennis , Uppstad, Per Henning , van Waes, Luuk , Vernon, Michael , Wengelin, Åsa , English

We describe the Multilanguage Written Picture Naming Dataset. This gives trial-level data and time and agreement norms for written naming of the 260 pictures of everyday objects that compose the colorized Snodgrass and Vanderwart picture set (Rossion & Pourtois in Perception, 33, 217–236, 2004). Adult participants gave keyboarded responses in their first language under controlled experimental conditions (N = 1,274, with subsamples responding in Bulgarian, Dutch, English, Finnish, French, German, Greek, Icelandic, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish). We measured the time to initiate a response (RT) and interkeypress intervals, and calculated measures of name and spelling agreement. There was a tendency across all languages for quicker RTs to pictures with higher familiarity, image agreement, and name frequency, and with higher name agreement. Effects of spelling agreement and effects on output rates after writing onset were present in some, but not all, languages. Written naming therefore shows name retrieval effects that are similar to those found in speech, but our findings suggest the need for cross-language comparisons as we seek to understand the orthographic retrieval and/or assembly processes that are specific to written output.
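
The timing and agreement measures named in this abstract can be illustrated with a minimal sketch. The function names, the millisecond timestamps, and the definition of name agreement as the share of responses matching the most common name are assumptions made for illustration; the dataset's own processing pipeline is not described here and may differ.

```python
def response_timing(stimulus_onset_ms, keypress_times_ms):
    """Return (RT, interkeypress intervals) for one trial.

    RT is the delay from picture onset to the first keypress; the
    intervals are the gaps between consecutive keypresses.
    """
    if not keypress_times_ms:
        raise ValueError("trial has no keypresses")
    rt = keypress_times_ms[0] - stimulus_onset_ms
    intervals = [
        later - earlier
        for earlier, later in zip(keypress_times_ms, keypress_times_ms[1:])
    ]
    return rt, intervals


def name_agreement(responses):
    """Proportion of responses that match the most common (modal) name.

    This is one common definition, assumed here for illustration.
    """
    counts = {}
    for response in responses:
        counts[response] = counts.get(response, 0) + 1
    return max(counts.values()) / len(responses)


# Hypothetical trial: picture shown at 1000 ms, three keypresses follow.
rt, intervals = response_timing(1000, [1850, 2000, 2130])
print(rt, intervals)
print(name_agreement(["dog", "dog", "hound", "dog"]))
```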

Publication

L2-ARCTIC: A Non-Native English Speech Corpus

2018-01-01 , Zhao, Guanlong , Chukharev-Hudilainen, Evgeny , Sonsaat, Sinem , Silpachai, Alif , Lucic, Ivana , Gutierrez-Osuna, Ricardo , Levis, John , English

In this paper, we introduce L2-ARCTIC, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection. This initial release includes recordings from ten non-native speakers of English whose first languages (L1s) are Hindi, Korean, Mandarin, Spanish, and Arabic, each L1 containing recordings from one male and one female speaker. Each speaker recorded approximately one hour of read speech from the Carnegie Mellon University ARCTIC prompts, from which we generated orthographic and forced-aligned phonetic transcriptions. In addition, we manually annotated 150 utterances per speaker to identify three types of mispronunciation errors: substitutions, deletions, and additions, making it a valuable resource not only for research in voice conversion and accent conversion but also in computer-assisted pronunciation training. The corpus is publicly accessible at https://psi.engr.tamu.edu/l2-arctic-corpus/.

Publication

Understanding the Effect of Voice Quality and Accent on Talker Similarity

2020-01-01 , Das, Anurag , Zhao, Guanlong , Levis, John , Chukharev-Hudilainen, Evgeny , Gutierrez-Osuna, Ricardo , English

This paper presents a methodology to study the role of non-native accents in talker recognition by humans. The methodology combines a state-of-the-art accent-conversion system, used to resynthesize the voice of a speaker with an accent different from her/his own, and a protocol for perceptual listening tests to measure the relative contribution of accent and voice quality to speaker similarity. Using a corpus of non-native and native speakers, we generated accent conversions in two different directions: non-native speakers with native accents, and native speakers with non-native accents. Then, we asked listeners to rate the similarity between 50 pairs of real or synthesized speakers. Using a linear mixed effects model, we find that (for our corpus) the effect of voice quality is five times as large as that of non-native accent, and that the effect goes away when speakers share the same (native) accent. We discuss the potential significance of this work in earwitness identification and sociophonetics.

Publication

Requirement Text Detection from Contract Packages to Support Project Definition Determination

2019-01-01 , Chukharev-Hudilainen, Evgeny , Le, Chau , Jeong, H. David , Gilbert, Stephen , Virtual Reality Applications Center , Psychology , English , Industrial and Manufacturing Systems Engineering , Human Computer Interaction

Project requirements are the wishes and expectations of the client regarding the design, construction, and other project management processes. The project definition is typically specified in a contract package including a contract document and many other related documents such as drawings, specifications, and government codes. Project definition determination is critical to the success of a project. Due to the lack of efficient tools for requirement processing, current project scoping practices still rely heavily on manual work, which is tedious, time-consuming, and error-prone. This study aims to fill that gap by developing an automated method for identifying requirement texts from contractual documents. The study employed Naïve Bayes to train a classification model that can be used to separate requirement statements from non-requirement statements. An experiment was conducted on a manually labeled dataset of 1,191 statements. The results revealed that the developed requirement detection model achieves a promising accuracy of over 90%.
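
As a rough illustration of the kind of classifier this abstract names, the sketch below implements a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the study's model: the whitespace tokenizer, the class labels, and the six training statements are invented for the example, and the abstract does not specify the features or Naïve Bayes variant actually used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic lowercase whitespace tokenizer (an assumption).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy statements; "req" = requirement, "non" = non-requirement.
train = [
    ("the contractor shall submit shop drawings", "req"),
    ("the pavement must meet the specified thickness", "req"),
    ("the bridge shall be designed for two lanes", "req"),
    ("this section describes the project background", "non"),
    ("the county is located in central region", "non"),
    ("figure 2 shows the site location", "non"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("the contractor shall provide traffic control"))
print(model.predict("this section describes the county"))
```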

Publication

Parsing Natural Language Queries for Extracting Data from Large-Scale Geospatial Transportation Asset Repositories

2018-03-29 , Chukharev-Hudilainen, Evgeny , Jeong, H. David , Gilbert, Stephen , Chukharev-Hudilainen, Evgeny , Virtual Reality Applications Center , Psychology , English , Industrial and Manufacturing Systems Engineering , Human Computer Interaction

Recent advances in data and information technologies have enabled extensive digital datasets to be available to decision makers throughout the life cycle of a transportation project. However, most of these data are not yet fully reused due to the challenging and time-consuming process of extracting the desired data for a specific purpose. Digital datasets are presented only in computer-readable formats, and most of them are complex. Extracting data from complex and large data sources is significantly time-consuming and requires considerable expertise. Thus, there is a need for a user-friendly data exploration framework that allows users to present their data interests in human language. To fulfill that demand, this study employs natural language processing (NLP) techniques to develop a natural language interface (NLI) which can understand users’ intent and automatically convert their inputs in human language into formal queries. This paper presents the results of an important task in the development of such an NLI: establishing a method for classifying the tokens of an ad hoc query according to their semantic contribution to the corresponding formal query. The method was validated on a small test set of 30 plain English questions manually annotated by an expert. The result shows an impressive accuracy of over 95%. The token classification presented in this paper is expected to provide a fundamental means for developing an effective NLI to transportation asset databases.

Publication

Generating partial civil information model views using a semantic information retrieval approach

2020-01-01 , Chukharev-Hudilainen, Evgeny , Jeong, H. David , Gilbert, Stephen , Chukharev-Hudilainen, Evgeny , Virtual Reality Applications Center , Psychology , English , Industrial and Manufacturing Systems Engineering , Gerontology

Open data standards (e.g. LandXML, TransXML, CityGML) are key to addressing the interoperability issue in exchanging civil information modeling (CIM) data throughout the project life-cycle. Since these schemas include rich sets of data types covering a wide range of assets and disciplines, model view definitions (MVDs), which define subsets of a schema, are required to specify what types of data are to be shared in accordance with a specific exchange scenario. The traditional procedure for generating and implementing MVDs is time-consuming and laborious, as entities and attributes relevant to a particular data exchange context are manually identified by domain experts. This paper presents a method that can locate relevant information from a source XML data schema for a specific domain based on the user's keyword. The study employs a semantic resource of civil engineering terms to understand the semantics of a keyword-based query. The study also introduces a novel context-based search technique for retrieving related entities and their referenced objects. The developed method was tested on a gold standard of several LandXML subschemas. The experiment results show that the semantic MVD retrieval algorithm achieves a mean average precision of nearly 90%. The research is original, being a novel method for extracting partial civil information models given a keyword from the end user. The method is expected to become a fundamental tool assisting professionals in extracting data from complex digital datasets.
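
The evaluation metric reported in this abstract, mean average precision, can be sketched as follows. The function names, the two example queries, and the ranked entity lists are hypothetical and only show how the metric is computed; they are not results or data from the study.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of relevant items penalizes missed ones.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical retrieval output for two keyword queries.
runs = [
    (["Alignment", "Surface", "Parcel", "Profile"], {"Alignment", "Profile"}),
    (["Pipe", "Surface"], {"Surface"}),
]
print(mean_average_precision(runs))
```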

Publication

Exploring the potential of process-tracing technologies to support assessment for learning of L2 writing

2018-04-01 , Ranalli, Jim , Chukharev-Hudilainen, Evgeny , Feng, Hui-Hsien , Chukharev-Hudilainen, Evgeny , Ranalli, Jim , English

Assessment for learning (AfL) seeks to support instruction by providing information about students’ current state of learning, the desired end state of learning, and ways to close the gap. AfL of second-language (L2) writing faces challenges insofar as feedback from instructors tends to focus on written products while neglecting most of the processes that gave rise to them, such as planning, formulation, and evaluation. Meanwhile, researchers studying writing processes have been using keystroke logging (KL) and eye-tracking (ET) to analyze and visualize process engagement. This study explores whether such technologies can support more meaningful AfL of L2 writing. Two Chinese L1 students studying at a U.S. university who served as case studies completed a series of argumentative writing tasks while a KL-ET system traced their processes and then produced visualizations that were used for individualized tutoring. Data sources included the visualizations, tutoring-session transcripts, the participants’ assessed final essays, and written reflections. Findings showed the technologies, in combination with the assessment dialogues they facilitated, made it possible to (1) position the participants in relation to developmental models of writing; (2) identify and address problems with planning, formulation, and revision; and (3) reveal deep-seated motivational issues that constrained the participants’ learning.

Publication

A system for adaptive high-variability segmental perceptual training: Implementation, effectiveness, transfer

2018-02-01 , Qian, Manman , Chukharev-Hudilainen, Evgeny , Levis, John , English

Many types of L2 phonological perception are often difficult to acquire without instruction. These difficulties with perception may also be related to intelligibility in production. Instruction on perception contrasts is more likely to be successful with the use of phonetically variable input made available through computer-assisted pronunciation training. However, few computer-assisted programs have demonstrated flexibility in diagnosing and treating individual learner problems or have made effective use of linguistic resources such as corpora for creating training materials. This study introduces a system for segmental perceptual training that uses a computational approach to perception utilizing corpus-based word frequency lists, high-variability phonetic input, and text-to-speech technology to automatically create discrimination and identification perception exercises customized for individual learners. The effectiveness of the system is evaluated in an experiment with a pre- and post-test design, involving 32 adult Russian-speaking learners of English as a foreign language. The participants’ perceptual gains were found to transfer to novel voices, but not to untrained words. Potential factors underlying the absence of word-level transfer are discussed. The results of the training model provide an example for replication in language teaching and research settings.