Computational studies on recurrent genomic selection for genetic improvement of soybeans
Every crop genetic improvement project has unique objectives, although some general objectives are common among projects. Due to competition in the marketplace, commercial plant breeders need to place greater emphasis on immediate genetic gains, with potential loss of useful genetic variance. In contrast, public plant breeders have the opportunity, perhaps the obligation, to emphasize retention of useful genetic variance while achieving genetic improvement. Most often the relative emphasis on these competing objectives has not been designed; rather, it has emerged as a consequence of reproductive biology, genetic architecture and budget constraints. The theme of the research reported in this dissertation is that the development of Genomic Selection (GS) methods has provided the ability to plan and execute the trade-offs between these competing objectives in genetic improvement projects. Some early simulation studies that compared recurrent GS with Phenotypic Selection (PS) revealed greater short-term genetic gains with GS, while PS resulted in better long-term genetic gains because PS retains useful genetic variability during the early cycles of selection. These early studies also indicated that different genetic architectures, heritabilities, GS methods, training sets and selection intensities would result in a range of responses across multiple cycles. We hypothesized that interactions among these factors could further increase the number of possible response curves. We evaluated this hypothesis by simulating 40 cycles of recurrent selection for sets of founders with genotypic data and population structures based on the founders of the Soybean Nested Association Mapping panel. Ten simulations were conducted on a factorial set comprising more than 300 combinations of selection methods, training sets, and selection intensities, which are under the control of the breeder, as well as genetic architecture and heritability, which are not.
To distinguish among the more than 300 replicated response curves, we employed a first-order recurrence equation to model the genotypic responses. Because recurrence equations are discrete analogs of differential equations, the estimated parameters enabled evaluation of response rates, half-lives and genotypic values as responses approach asymptotic limits. By modeling genotypic responses it was also possible to conduct ANOVA of the non-linear responses, which revealed that both the rates of genetic improvement in the early cycles and the limits to genetic improvement in the later cycles were significantly affected by interactions among all investigated factors. Even though all possible interactions significantly affected modeled responses, there were some consistent trends. Updating genomic prediction (GP) models with training sets consisting of data from prior cycles of selection significantly improved prediction accuracy and genetic response for all GS methods. Among the GS methods with updated training sets, selection on values estimated with the Ridge Regression-Restricted Maximum Likelihood (RR-REML) method resulted in better response rates and larger asymptotic limits than selection on estimates from BayesB and Bayes LASSO models. A Support Vector Machine method with a radial basis kernel resulted in the fastest loss of genetic variance in the early cycles. We next hypothesized that we could improve both response rates and retention of useful genetic variability in the simulated soybean populations by decomposing breeding strategies into decisions about selection methods and mating designs. For breeding populations organized into islands, decisions about possible migration rules among family islands were also included.
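As an illustration of how a first-order recurrence model yields a response rate, an asymptotic limit and a half-life, the following minimal Python sketch may be helpful. The abstract does not state the exact form of the recurrence, so the sketch assumes g[t+1] = a + b * g[t] with 0 < b < 1, where g[t] is the mean genotypic value in cycle t. Under that assumption the limit is a / (1 - b) and the half-life (the number of cycles needed to halve the remaining distance to the limit) is ln(0.5) / ln(b). The function names and the example parameter values are hypothetical and are not taken from the dissertation.

```python
import math


def simulate_response(g0, a, b, cycles):
    """Iterate the assumed recurrence g[t+1] = a + b * g[t] (illustrative only)."""
    g = [g0]
    for _ in range(cycles):
        g.append(a + b * g[-1])
    return g


def fit_first_order_recurrence(g):
    """Least-squares fit of g[t+1] on g[t], plus summaries derived from the fit."""
    x, y = g[:-1], g[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: fraction of the remaining gap kept each cycle
    a = mean_y - b * mean_x  # intercept
    if not 0.0 < b < 1.0:
        raise ValueError("limit and half-life are only defined here for 0 < b < 1")
    return {
        "a": a,
        "b": b,
        "limit": a / (1.0 - b),                     # asymptotic genotypic value
        "half_life": math.log(0.5) / math.log(b),   # cycles to halve the gap
    }


# Hypothetical noise-free response curve over 40 cycles of selection.
response = simulate_response(g0=0.0, a=2.0, b=0.9, cycles=40)
fit = fit_first_order_recurrence(response)
print(fit)
```

With fitted parameters of this kind for every replicate, the rate, limit and half-life can be treated as response variables in an ANOVA across the factorial combinations, which is one plausible reading of the analysis described above.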
Among 60 possible strategies, genetic improvement was maximized for the first five to ten cycles using GS and a hub network mating design in breeding populations organized as fully connected family islands, with migration rules allowing exchange of two lines among islands every other cycle of selection. If the objectives are to maximize both short-term and long-term gains, then the best compromise strategy is similar, except that a genomic mating design is used instead of a hub network mating design. This strategy also realized the greatest proportion of the genetic potential of the founder populations. In Weighted Genomic Selection (WGS), the estimated marker effects are weighted by the inverse of the favorable allele frequency and the weighted values are used for selection. WGS, when applied to both non-isolated and island populations, also realized the greatest proportion of the genetic potential of the founders, but in non-isolated populations it required many more cycles than the strategy that showed the best compromise between short-term and long-term gain. These studies have the potential to contribute to the development of decision support systems that use new approaches to integrate the strengths of whole-genome level information, prediction modeling, and optimization methods for long-term genetic improvement of crops.
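The WGS weighting described above can be sketched in a few lines of Python. This is a literal reading of the sentence in the abstract (weight = 1 / favorable allele frequency); the exact scaling used in the dissertation may differ. The sketch assumes genotypes coded as 0, 1 or 2 copies of the allele to which each estimated effect refers, and it treats that allele as favorable when its effect is positive. The function names, the toy genotypes and the effect values are hypothetical.

```python
def favorable_allele_frequencies(genotypes, effects):
    """Frequency of the favorable allele at each marker (0/1/2 coding assumed)."""
    n_lines = len(genotypes)
    freqs = []
    for j, effect in enumerate(effects):
        p = sum(row[j] for row in genotypes) / (2.0 * n_lines)
        # If the effect is negative, the other allele is the favorable one.
        freqs.append(p if effect >= 0 else 1.0 - p)
    return freqs


def unweighted_scores(genotypes, effects):
    """Ordinary genomic estimated values: sum of genotype code times effect."""
    return [sum(x * e for x, e in zip(row, effects)) for row in genotypes]


def weighted_scores(genotypes, effects):
    """WGS-style scores: each marker effect is scaled by 1 / favorable frequency."""
    freqs = favorable_allele_frequencies(genotypes, effects)
    # A favorable allele absent from the population cannot be selected for,
    # so its weight is set to zero rather than dividing by zero.
    weights = [1.0 / f if f > 0 else 0.0 for f in freqs]
    return [
        sum(x * e * w for x, e, w in zip(row, effects, weights))
        for row in genotypes
    ]


# Toy example: three lines carry a common favorable allele at marker 1,
# one line carries a rare favorable allele at marker 2.
genotypes = [[2, 0], [2, 0], [2, 0], [0, 2]]
effects = [1.0, 0.8]

gs = unweighted_scores(genotypes, effects)
wgs = weighted_scores(genotypes, effects)
print("unweighted:", gs)
print("weighted:  ", wgs)
```

In this toy population the unweighted scores favor the lines with the common allele, while the weighted scores favor the line carrying the rare favorable allele. This illustrates why up-weighting rare favorable alleles would be expected to retain useful variation longer, at the cost of slower early gains.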