Integrating genotype and weather variables for soybean yield prediction using deep learning
Realized performance of complex traits depends on both genetic and environmental factors, which are difficult to dissect because doing so requires multiple replications of many genotypes across diverse environmental conditions. To mitigate these problems, we present a machine learning (ML) framework in soybean (Glycine max (L.) Merr.) that analyzes historical performance records from the Uniform Soybean Tests (UST) in North America, with the aim of dissecting and predicting genotype response in multiple environments by leveraging pedigree and genomic relatedness measures along with weekly weather parameters. The ML framework, based on Long Short-Term Memory (LSTM) recurrent neural networks, works by isolating key weather events and genetic interactions that affect yield, seed oil, seed protein, and maturity, enabling prediction of genotypic responses in unseen environments. This approach presents an exciting avenue for genotype x environment studies and enables prediction-based systems. Our approaches can be applied in plant breeding programs with multi-environment and multi-genotype data to identify superior genotypes through selection for commercial release, as well as to determine ideal locations for efficient performance testing.
This is a pre-print made available through bioRxiv, doi: 10.1101/331561.