Machine learning analytics for predictive breeding
Prediction accuracies of genomic selection methods are affected by the quality of the phenotypic and genotypic data and by the use of appropriate analytic models in the training sets. This research focuses on the impact of data quality for ordinal traits. Ordinal scores are typical for traits related to various types of stress tolerance and resistance. It was unknown whether established spatial models developed for continuous quantitative traits can effectively adjust for spatial autocorrelation in ordinal traits with sharp transition patterns among groups of plots in experimental field trials. The effectiveness of the spatial adjustments was systematically compared across eight different spatial models using soybean iron deficiency chlorosis (IDC) as an example. After spatial pattern recognition was incorporated to provide adjusted ordinal data, the prediction accuracies of algorithmic modeling and data modeling approaches were systematically compared. The results revealed that genomic prediction accuracies could be dramatically improved by both machine learning models and geospatial analyses. Overall, algorithmic modeling outperformed data modeling methods for the soybean IDC ordinal data type. Further, machine learning algorithms provided higher prediction accuracy than traditional statistical data models in terms of sensitivity, specificity, and overall accuracy.
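The three evaluation criteria named in the abstract (sensitivity, specificity, and overall accuracy) can be illustrated with a minimal sketch. It assumes the ordinal IDC scores have been dichotomized into two classes (1 = susceptible, 0 = tolerant); that dichotomization, the function name `binary_metrics`, and the example labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, and overall accuracy for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")

    pairs = list(zip(y_true, y_pred))
    # Tally the four cells of the 2x2 confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    nan = float("nan")
    return {
        # Share of truly positive plots that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of truly negative plots that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Share of all plots that were classified correctly.
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels for eight plots (1 = susceptible, 0 = tolerant).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(observed, predicted)
print(metrics)
```

For a trait kept on its full ordinal scale, the same quantities could instead be computed per class in a one-vs-rest fashion; the binary case is shown here only because it is the simplest to read.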