Topics in genomic selection and matrix completion

Zhang, Zerui
Major Professor
Wang, Lizhi
Qiu, Yumou
Liu, Peng
Beavis, William
Yu, Jianming
Committee Member
Industrial and Manufacturing Systems Engineering
This dissertation comprises three research projects with overlapping interests in plant breeding. In particular, it focuses on the topics of genomic selection, matrix completion, and hybrid breeding. The first project improves genomic selection using operations research. We developed a novel selection algorithm that performs a heuristic search over the selection and mating of individual plants under linear additive genomic effects, recombination rates, and a fixed crossing budget. To better balance short- and long-term genetic gains and to account for the time value of gains during the breeding process, we adopted the concept of "present value" from finance and proposed, as the objective function, a weighted average of the expected genomic estimated breeding values (GEBVs) of gametes within a specified time window. The uncertainty in future gametes is simulated with a look-ahead technique that approximates open pollination within a group of individuals, and parameters including the discount rate, window length, and percentile are introduced so that the objective can be flexibly adjusted to breeders' interests. Simulations showed that the present value-based look-ahead search outperformed both conventional genomic selection and its state-of-the-art predecessor, look-ahead search, achieving greater genetic gain at the cost of a moderate loss of genetic diversity. In the second project, we aim to construct a simulation platform that enables fairer and more realistic comparisons of different hybrid breeding methods. We not only fill a gap, in that hybrid breeding (a breeding approach that focuses more on the dominance effects of heterozygous offspring in addition to additive effects) does not yet have a computational workflow, but also highlight the importance of designing an opaque simulator for the evaluation of selection algorithms.
Compared with traditional transparent simulators, an opaque simulator provides only partial genomic information, which is barely adequate for imperfect genomic prediction, and simulates uncertain recombination events more realistically. Combining simulators at two levels of opacity with prediction methods of different accuracies and with empirical selection and mating strategies highlights how much hybrid breeding performance varies across settings. This motivates us to design robust selection algorithms that are better suited to complex natural environments. The third project originates from an application: filling in missing values in test crosses between many inbred lines and test lines. We view it as a matrix completion (recommender system) problem in which rows and columns index users and items, and each entry of the matrix represents a preference rating. Inspired by the matrix factorization and neighborhood-based strategies, we propose a latent feature model for the preference rating and use a kernel regression-based estimator. We define a radial neighborhood set for a specific user-item pair to pool either direct or indirect information, and we approximate the distance between latent feature vectors by row-wise and column-wise $L_2$ norms. Simulations and empirical studies showed that the new estimator generally performed better. Theorems on the consistency of the estimator and the convergence of the distance measures are also presented.
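As a rough illustration of the present-value objective described for the first project, the sketch below computes a discount-weighted average of per-generation GEBV summaries over a time window. It is not the dissertation's implementation. The function names, the nearest-rank percentile rule, the choice to summarize each generation by a percentile of simulated gamete GEBVs, and the weight 1 / (1 + r)^t for generation t (starting at t = 1) are all assumptions made for illustration.

```python
import math


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile of a non-empty list, with q in [0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def present_value_objective(simulated_gebv, discount_rate, percentile):
    """Discount-weighted average of per-generation GEBV summaries.

    simulated_gebv: one list of simulated gamete GEBVs per future
        generation; its length plays the role of the window length.
    discount_rate: hypothetical finance-style rate r >= 0.
    percentile: which percentile of each generation's GEBVs to use.
    """
    # Summarize every generation in the window by the chosen percentile.
    summaries = [nearest_rank_percentile(g, percentile) for g in simulated_gebv]
    # Assumed weighting: generation t (1-based) gets weight 1 / (1 + r)^t.
    weights = [1.0 / (1.0 + discount_rate) ** t
               for t in range(1, len(summaries) + 1)]
    # Normalize so the result is a weighted average, not a discounted sum.
    return sum(w * s for w, s in zip(weights, summaries)) / sum(weights)


# Two generations of simulated gamete GEBVs (made-up numbers).
window = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
patient = present_value_objective(window, discount_rate=0.0, percentile=50)
impatient = present_value_objective(window, discount_rate=1.0, percentile=50)
```

With a zero discount rate every generation counts equally. A larger rate shifts weight toward early generations, which is one way to express a preference for short-term gain.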
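The neighborhood idea in the third project can be sketched in a similar spirit. The code below estimates one entry of a partially observed matrix by a kernel-weighted average of observed entries whose row and column both lie within a radius of the target's row and column. It is a minimal sketch under stated assumptions, not the estimator from the dissertation: the function names, the Gaussian weight, the use of a single bandwidth for rows and columns, and the root-mean-square distance over jointly observed entries are all illustrative choices.

```python
import math


def rms_distance(vectors, a, b):
    """Root-mean-square difference between vectors a and b, computed
    over positions observed (not None) in both; inf if none overlap."""
    shared = [(x, y) for x, y in zip(vectors[a], vectors[b])
              if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def kernel_estimate(matrix, i, j, bandwidth):
    """Kernel-weighted average of observed entries (r, c) whose row r is
    within `bandwidth` of row i and whose column c is within `bandwidth`
    of column j. Returns None when no such entry exists."""
    columns = [list(col) for col in zip(*matrix)]
    col_dist = [rms_distance(columns, j, c) for c in range(len(columns))]
    numerator = denominator = 0.0
    for r in range(len(matrix)):
        d_row = rms_distance(matrix, i, r)
        if d_row > bandwidth:
            continue  # row r is outside the radial neighborhood
        for c, d_col in enumerate(col_dist):
            value = matrix[r][c]
            if value is None or d_col > bandwidth:
                continue  # unobserved entry, or column outside the radius
            # Assumed Gaussian weight on the combined row/column distance.
            weight = math.exp(-0.5 * (d_row ** 2 + d_col ** 2) / bandwidth ** 2)
            numerator += weight * value
            denominator += weight
    return numerator / denominator if denominator > 0 else None


# Rows 0 and 1 agree on every jointly observed column, so the missing
# entry (1, 2) borrows the value observed at (0, 2).
ratings = [[1.0, 2.0, 3.0],
           [1.0, 2.0, None],
           [5.0, 6.0, 7.0]]
estimate = kernel_estimate(ratings, 1, 2, bandwidth=0.5)
```

The bandwidth controls how much indirect information is pooled: a small radius uses only near-identical rows and columns, while a larger one averages over more, and less similar, entries.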