Evaluation of Parametric and Nonparametric Statistical Methods in Genomic Prediction

dc.contributor.advisor Alicia Carriquiry
dc.contributor.advisor William Beavis
dc.contributor.author Howard, Reka
dc.contributor.department Statistics
dc.date 2018-08-11T13:04:24.000
dc.date.accessioned 2020-06-30T03:05:57Z
dc.date.available 2020-06-30T03:05:57Z
dc.date.copyright Fri Jan 01 00:00:00 UTC 2016
dc.date.embargo 2001-01-01
dc.date.issued 2016-01-01
dc.description.abstract <p>The availability of high-density markers has resulted in an increased interest in the use of markers for phenotype prediction in plant breeding. Genomic prediction is a technique that uses marker and phenotypic information of individuals to build a model that enables plant breeders to predict the phenotypic value of individuals with only genotypic scores. In recent years, a large number of parametric and nonparametric statistical methods have been developed for genomic prediction.</p> <p>Initially, we review parametric methods including Least Squares Regression, Ridge Regression, Bayesian Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cpi, and nonparametric methods including Nadaraya-Watson Estimator, Reproducing Kernel Hilbert Space, Support Vector Machine Regression, and Neural Networks. We also contrast the methods based on accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in populations derived from crosses of inbred lines, where the genetic architecture contributes low (0.3) and high (0.7) proportions of the total simulated phenotypic variability.</p> <p>Based on these preliminary results, we introduce Response Surface Methodology (RSM) as a systematic and efficient strategy for investigating genomic prediction methods across a wide range of design variables. We illustrate RSM with a simulated example where the response we optimize is the difference between the prediction accuracies of a parametric method and a nonparametric method. We examine how the numbers of individuals, markers, and QTL, together with the percentage of epistasis and the heritability, maximize the estimated difference in accuracies.
We found that the greatest impact on estimates of accuracy and MSE was due to the genetic architecture of the population and the heritability of the trait. When epistasis and heritability are highest, the advantage of using a nonparametric method versus a parametric prediction method is greatest.</p> <p>Finally, we simulate data for a structured population consisting of multiple families and use the parental generation's phenotypic and genotypic information to predict the progeny's phenotypes. Simulations utilized high-density molecular genotypic scores from a sample of soybean varieties adapted to maturity zone 3 to establish the structured breeding population. In the simulation, we consider low and high heritability and two different genetic architectures, with training data that contain either all of the parents or only a subset of the parents with the highest phenotypic values. We define a different metric to evaluate genomic prediction techniques, where we compare simulated progeny having the highest phenotypic values with predicted progeny having the highest phenotypic values based on their parental phenotypic and genotypic values. We found that if the genetic architecture is additive, then the parametric and nonparametric methods perform similarly according to the new metric. When epistasis is present, the nonparametric method had a higher percentage of identical parents than the parametric method.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/15720/
dc.identifier.articleid 6727
dc.identifier.contextkey 11165089
dc.identifier.doi https://doi.org/10.31274/etd-180810-5348
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/15720
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/29903
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/15720/Howard_iastate_0097E_15898.pdf|||Fri Jan 14 20:45:37 UTC 2022
dc.subject.disciplines Agriculture
dc.subject.disciplines Plant Sciences
dc.subject.disciplines Statistics and Probability
dc.subject.keywords Genomic Prediction
dc.subject.keywords Nonparametric
dc.subject.keywords Parametric
dc.subject.keywords Response Surface Methodology
dc.subject.keywords Simulation
dc.title Evaluation of Parametric and Nonparametric Statistical Methods in Genomic Prediction
dc.type article
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.discipline Statistics; Plant Breeding
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy
File
Original bundle
Name: Howard_iastate_0097E_15898.pdf
Size: 4.17 MB
Format: Adobe Portable Document Format