Genomic Selection with Deep Neural Networks

Mcdowell, Riley

Reduced costs for DNA marker technology have generated a huge amount of molecular data and made it economically feasible to generate dense genome-wide marker maps of lines in a breeding program. Increased data density and volume have driven an exploration of tools and techniques to analyze these data for cultivar improvement. Data science theory and application have experienced a resurgence of research into techniques to detect or "learn" patterns in noisy data in a variety of technical applications. Several variants of machine learning have been proposed for analyzing large DNA marker data sets to aid in phenotype prediction and genomic selection. Here, we present a review of the genomic prediction and machine learning literature. We apply deep learning techniques from machine learning research to six phenotypic prediction tasks using published reference data sets. Because regularization frequently improves neural network prediction accuracy, we included regularization methods in the neural network models. The neural network models are compared to a selection of regularized Bayesian and linear regression techniques commonly employed for phenotypic prediction and genomic selection. On three of the phenotype prediction tasks, regularized neural networks were the most accurate of the models evaluated. Surprisingly, for these data sets the depth of the network architecture did not affect the accuracy of the trained model. We also find that concerns about the computer processing time needed to train neural network models to perform well in genomic prediction tasks may not apply when Graphics Processing Units are used for model training.

Data Science, Genomic Selection, Neural Networks, Plant Breeding
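
The comparison described in the abstract, a regularized neural network against a regularized linear model on marker data, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and none of it is taken from the thesis: the genotypes and phenotypes are simulated, ridge regression stands in for the regularized linear baselines, L2 weight decay stands in for the unspecified neural network regularization, and all function names and hyperparameters are hypothetical. Accuracy is measured as the correlation between predicted and observed phenotypes on held-out lines.

```python
import numpy as np

def simulate_markers(n_lines=200, n_markers=500, n_qtl=20, h2=0.5, seed=0):
    """Simulate toy SNP genotypes (coded 0/1/2) and an additive phenotype."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    qtl = rng.choice(n_markers, size=n_qtl, replace=False)
    effects[qtl] = rng.normal(size=n_qtl)          # only a few markers have effects
    g = X @ effects                                # true genetic values
    noise_sd = np.sqrt(g.var() * (1 - h2) / h2)    # noise scaled to heritability h2
    y = g + rng.normal(scale=noise_sd, size=n_lines)
    X = X - X.mean(axis=0)                         # center marker columns
    y = (y - y.mean()) / y.std()                   # standardize phenotype
    return X, y

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def mlp_fit(X, y, hidden=16, weight_decay=1e-2, lr=1e-2, epochs=200, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent
    on mean squared error plus an L2 penalty (weight decay) on the weights."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=np.sqrt(1.0 / p), size=(p, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=np.sqrt(1.0 / hidden), size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # hidden activations
        err = H @ w2 + b2 - y
        losses.append(float(np.mean(err ** 2)))    # data loss, penalty excluded
        d_pred = 2.0 * err / n                     # dMSE/dprediction
        dH = np.outer(d_pred, w2) * (1.0 - H ** 2) # backprop through tanh
        g_w2 = H.T @ d_pred + weight_decay * w2
        g_W1 = X.T @ dH + weight_decay * W1
        b2 -= lr * d_pred.sum()
        b1 -= lr * dH.sum(axis=0)
        w2 -= lr * g_w2
        W1 -= lr * g_W1
    return (W1, b1, w2, b2), losses

def mlp_predict(params, X):
    W1, b1, w2, b2 = params
    return np.tanh(X @ W1 + b1) @ w2 + b2

def predictive_correlation(y_true, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

X, y = simulate_markers()
train, test = np.arange(150), np.arange(150, 200)  # simple hold-out split

beta = ridge_fit(X[train], y[train], lam=100.0)
r_ridge = predictive_correlation(y[test], X[test] @ beta)

params, losses = mlp_fit(X[train], y[train])
r_mlp = predictive_correlation(y[test], mlp_predict(params, X[test]))

print(f"ridge r = {r_ridge:.3f}, regularized MLP r = {r_mlp:.3f}")
```

On simulated data like this the ranking of the two models carries no meaning for real breeding data. The sketch only shows the shape of such a comparison, in which the penalty strengths (`lam`, `weight_decay`) would normally be tuned by cross-validation.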