The role of test locations in early-stage plant breeding: Identifying discriminating locations and extrapolating performance to locations that are not observed
Okudan-Kremer, Gul Erdem
Nordman, Daniel John
Industrial and Manufacturing Systems Engineering
It is well established that the phenotypic response of a plant is determined by a main genetic effect (G), an environmental effect (E), and a genotype-by-environment (GxE) interaction effect that is typically significant in magnitude. In commercial plant breeding, predicting the phenotypic response of new experimental genotypes is therefore especially challenging. Each genotype can be planted, and hence observed, in only a very limited set of locations, which can introduce bias: for a particular experimental genotype, the average GxE effect over this small set of environments may differ significantly from its average over a larger set of environments. In other words, a predictive model based on the observed locations may either over- or underestimate the true performance because the few observed locations happened to be either favorable or unfavorable for that specific genotype. If a large enough sample of locations could be planted and observed, this bias might be eliminated, but in practice that is not possible.

Because each genotype is observed so few times, decisions about advancing a specific genotype are difficult. However, if we can identify subsets of genotypes that perform similarly to the current commercial genotypes (i.e., that have similar GxE), we can expand the training data to a much larger set. This thesis addresses how to expand the training data so that machine learning can later be used to predict the phenotypic response more accurately and to support better decisions.

In addition, in early-stage experimentation, where each genotype is observed in even fewer environments than in late-stage experimentation, the focus is often on the genotype main effect rather than the genotype-by-environment interaction effect. It is therefore valuable to plant these genotypes in the locations that are best able to capture the true ranking of the genotypes.
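The location-sampling bias described above can be illustrated with a small simulation. This is a minimal sketch, not the method used in this thesis: it assumes a hypothetical additive model y[i][j] = g[i] + e[j] + ge[i][j] with Gaussian effects and no residual error, and the standard deviations, the 100 simulated locations standing in for "a larger set of environments", and the function names `simulate_yields`, `location_centered`, and `mean_abs_gxe_bias` are all illustrative assumptions. Subtracting each location's mean removes E, so the gap between a genotype's estimate from k sampled locations and its estimate from all locations is exactly the GxE bias from observing only those k locations.

```python
import random
import statistics


def simulate_yields(n_genotypes, n_locations, sd_g=1.0, sd_e=2.0, sd_ge=1.0, seed=0):
    """Hypothetical additive model: y[i][j] = g[i] + e[j] + ge[i][j] (no residual error)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, sd_g) for _ in range(n_genotypes)]
    e = [rng.gauss(0.0, sd_e) for _ in range(n_locations)]
    return [
        [g[i] + e[j] + rng.gauss(0.0, sd_ge) for j in range(n_locations)]
        for i in range(n_genotypes)
    ]


def location_centered(y):
    """Subtract each location's mean yield, which removes the environmental effect E."""
    n_loc = len(y[0])
    loc_means = [statistics.fmean(row[j] for row in y) for j in range(n_loc)]
    return [[row[j] - loc_means[j] for j in range(n_loc)] for row in y]


def mean_abs_gxe_bias(y, k, seed=1):
    """Average |estimate from k random locations - estimate from all locations| over genotypes."""
    rng = random.Random(seed)
    centered = location_centered(y)
    n_loc = len(y[0])
    gaps = []
    for row in centered:
        observed = rng.sample(range(n_loc), k)  # the few locations this genotype is planted in
        est_few = statistics.fmean(row[j] for j in observed)
        est_all = statistics.fmean(row)
        gaps.append(abs(est_few - est_all))
    return statistics.fmean(gaps)


y = simulate_yields(n_genotypes=200, n_locations=100)
for k in (2, 5, 20, 100):
    # The bias shrinks as more locations are observed and vanishes when all are used.
    print(k, round(mean_abs_gxe_bias(y, k), 3))
```

Under these assumptions the average bias is largest when only two locations are observed and shrinks toward zero as k approaches the full set, which mirrors the argument that a large enough sample of locations would remove the bias.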
This thesis aims to shift the focus from improving the precision of predictive models to improving the probability of making correct genotype selections, by identifying and using the locations that are best able to discriminate between genotypes so that correct advancement decisions can be made.
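One simple way to quantify how well a location "captures the true ranking" is to compare the genotype ordering observed at that location with a reference ordering. The sketch below is an assumption, not the criterion developed in this thesis: it uses Kendall's tau (tau-a) as the agreement score and assumes a reference ranking is available, for example genotype means from many locations or from later-stage trials. The helper names `kendall_tau` and `rank_locations` and the toy data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def rank_locations(yields, reference):
    """Order locations by how well their genotype ranking agrees with `reference`.

    yields[i][j] is the observed response of genotype i at location j.
    """
    n_loc = len(yields[0])
    scores = [(j, kendall_tau([row[j] for row in yields], reference)) for j in range(n_loc)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: 4 genotypes (rows) observed at 3 locations (columns).
yields = [
    [10, 12, 9],
    [11, 11, 11],
    [12, 10, 10],
    [13, 9, 12],
]
reference = [1.0, 2.0, 3.0, 4.0]  # assumed "true" genotype ordering

ranking = rank_locations(yields, reference)
print([loc for loc, tau in ranking])  # -> [0, 2, 1]
```

Here location 0 reproduces the reference ordering exactly, location 2 swaps one pair, and location 1 reverses it, so location 0 would be the most discriminating under this hypothetical criterion. A rank-based score is used because, as argued above, the goal is correct selection rather than precise prediction of the response values themselves.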