Predicting soybean yield and seed composition by understanding physiological processes through modeling

Date
2023-08
Authors
Chiozza, Mariana Victoria
Major Professor
Advisor
Miguez, Fernando E
Singh, Asheesh K
Vanloocke, Andrew
Goggi, Susana
Archontoulis, Sotirios
Committee Member
Abstract
Prediction of soybean [Glycine max (L.) Merr.] seed yield and seed composition at the plot scale before harvest has potential uses in breeding programs for early-season selection and in harvesting decisions. Reflectance information from hyperspectral bands has mainly been used to predict yield and other crop variables. However, an analysis comparing prediction accuracy among crop variables such as leaf area index (LAI), above-ground biomass, seed yield, and seed protein and oil, when hyperspectral bands are used as predictors, is lacking. The first objective of my dissertation was to rank prediction accuracy among these crop variables using hyperspectral bands captured at different time points during the growing season. The hypothesis rests on a physiological framework in which crop variables closely associated with light interception (i.e., LAI) would be better predicted by the hyperspectral signal than variables whose determination involves more physiological processes (i.e., biomass, seed yield, and seed protein and oil). The dataset used to test this hypothesis spanned different genotypes, environments, and management practices. Partial Least Squares regression with cross-validation was used to test the association between the observed variables and the hyperspectral bands. Results showed that LAI is the variable best predicted from reflectance information, and suggest that hyperspectral bands are necessary but not sufficient to improve the prediction of other crop variables such as biomass, seed yield, and seed composition traits. Building on the finding that a leaf area trait can be retrieved by high-throughput phenotyping (HTP), I evaluated the potential of that trait as a secondary trait in breeding for seed yield.
Often, a linear relationship between a leaf area trait and seed yield is assumed, but evidence shows that other functions are possible, which poses a challenge for successful indirect selection in breeding. Therefore, in the second part of this dissertation, the main objective was to improve our understanding of the nature of the relationship between soybean leaf area traits, such as canopy cover (CC) and leaf area index (LAI), and seed yield under different scenarios of genotype (G), environment (E), and management (M). A major limitation in assessing a relevant range of the G x E x M space is the large experimental footprint required within the target area of interest. Therefore, I explored real-case scenarios with field experiment data and expanded the experimental footprint by conducting an in silico crop simulation experiment using APSIM Next Generation. Generalized additive models (GAMs) were used to fit data relating canopy traits to final seed yield without assuming any specific response function. Results showed that different types of response are possible depending on the factors considered. Optimum leaf area values varied either because the shape of the response changed or because the curve shifted. I propose that the information required to build the leaf area-seed yield relationship could come from previous experiments or from empirical models built with a subset of the population under study. This work demonstrates that using leaf area for selection might not be as straightforward as expected, but there are avenues for improved application in HTP if the factors affecting this relationship are considered.
The previous chapter showed nonlinear associations between secondary traits and final traits, and showed that diverse factors can modify those relationships, further increasing the complexity of the response. In view of these results and of previous reports of nonlinear relationships between a response variable and different predictor variables, I propose a soybean seed composition prediction model based on the random forest (RF) method. To that end, candidate RF predictive models varying in the number and identity of the predictors were evaluated. The predictive ability of the candidate models was compared with that of two methods that are easier to interpret: linear models and generalized additive models (GAMs), which can account for nonlinear relationships and interactions. The dataset was extracted from field trials reported in the Uniform Soybean Test and spanned 21 US states over a period of 20 years. Results showed that a model including latitude, longitude, sowing and harvesting dates, and mean temperature during vegetative growth can predict seed protein and oil values with an RRMSE of 0.07% (R2 = 0.8) and 0.09% (R2 = 0.7), respectively. In addition, the RF method fit the data better and predicted seed composition with less error than the other two methods. Therefore, I propose this method for seed composition predictions and recommend including genomic or image-based crop information for more precise predictions at the genotypic level.
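The abstract reports model accuracy as RRMSE and R2 but does not define either metric. The sketch below shows one common way these are computed, assuming RRMSE means the root mean square error divided by the mean of the observed values and expressed as a percentage; the dissertation may use a different normalization. The function names and the protein values are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean


def rrmse_percent(observed, predicted):
    """Relative RMSE: RMSE divided by the observed mean, as a percentage.

    Assumed definition; the dissertation may normalize differently.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    rmse = sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))
    return 100.0 * rmse / mean(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical seed protein values (%), invented for illustration only.
observed_protein = [34.0, 35.0, 36.0, 37.0]
predicted_protein = [34.5, 34.5, 36.5, 36.5]

print("RRMSE (%):", rrmse_percent(observed_protein, predicted_protein))
print("R2:", r_squared(observed_protein, predicted_protein))
```

Because RRMSE is scaled by the observed mean, it allows error to be compared across traits with different magnitudes, such as seed protein and seed oil concentration.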
Type
dissertation