Genome-wide prediction of breeding values and mapping of quantitative trait loci in stratified and admixed populations
Ideally, genome-wide association studies require homogeneous samples originating from randomly mating populations with minimal pedigree relationships. In reality, however, such samples are very hard to collect. Non-random mating combined with artificial selection has created complex patterns of population structure and relationship in commercial crop and livestock populations. This makes proper modeling of population structure and kinship a necessary step in all genome-wide association studies. Otherwise, the risk of both false positives (declaring a marker significant when it is not linked to a QTL) and false negatives (declaring a marker linked to a QTL non-significant) increases dramatically.
In this thesis, we first applied the genomic selection (GS) approach to develop equations for predicting the breeding values of purebred candidates from a model trained on an admixed or crossbred population. In this approach, all marker effects are treated as random and are fitted simultaneously. It was hypothesized that, given high-density marker data and the GS approach, training in a crossbred or admixed population could be as accurate as training in the purebred population that is the target of selection. A stochastic simulation study showed that both crossbred and admixed populations could predict breeding values of a purebred population without explicit modeling of breed composition and pedigree relationship. However, the accuracy of GS was greatly reduced when genes from the target pure breed were not included in the admixed or crossbred training population. In addition, the accuracy of GS depended on the genetic distance between the training and validation populations: the closer the relationship between the two, the higher the prediction accuracy. Further, increasing marker density improved the accuracy of prediction, especially when a crossbred population was used as the training dataset. Considering haplotypes with weak linkage disequilibrium (LD), the crossbreds showed extensive LD, whereas the LD in the purebreds was confined to smaller segments. In contrast, examination of the length of haplotypes with strong LD indicated that these haplotypes are much shorter in crossbreds than in purebreds. Our results also showed that the number of haplotypes with strong LD is smaller in crossbred populations than in purebred populations. These findings suggest that crossbred populations are more suitable for QTL fine mapping than purebreds.
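As an illustration of what fitting all marker effects simultaneously as random effects means in practice, the following is a minimal sketch using ridge regression, a common stand-in for this class of model. It is not the simulation used in the thesis: the function names, sample sizes, allele frequencies and the choice of penalty are all assumptions made for the example, and training and validation animals are drawn from a single population rather than from crossbred and purebred groups.

```python
import numpy as np


def simulate_genotypes(rng, n_animals, allele_freqs):
    """Draw 0/1/2 genotype codes, one column per marker (hypothetical helper)."""
    size = (n_animals, len(allele_freqs))
    return rng.binomial(2, allele_freqs, size=size).astype(float)


def fit_marker_effects(X, y, ridge_penalty):
    """Estimate all marker effects jointly by ridge regression.

    The penalty plays the role of the ratio of residual variance to
    per-marker effect variance when marker effects are treated as random.
    """
    marker_means = X.mean(axis=0)
    Xc = X - marker_means          # centre genotypes
    yc = y - y.mean()              # centre phenotypes
    lhs = Xc.T @ Xc + ridge_penalty * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, Xc.T @ yc)
    return effects, marker_means


def predict_breeding_values(X, effects, marker_means):
    """Genomic prediction: sum of centred genotypes times estimated effects."""
    return (X - marker_means) @ effects


# Assumed toy setting: 100 markers, 400 training and 100 validation animals.
rng = np.random.default_rng(1)
n_markers = 100
allele_freqs = rng.uniform(0.2, 0.8, n_markers)
true_effects = rng.normal(0.0, 1.0, n_markers)

X_train = simulate_genotypes(rng, 400, allele_freqs)
X_valid = simulate_genotypes(rng, 100, allele_freqs)
tbv_train = X_train @ true_effects      # true breeding values
tbv_valid = X_valid @ true_effects

# Residual SD equal to the genetic SD gives a heritability of about 0.5.
noise_sd = tbv_train.std()
y_train = tbv_train + rng.normal(0.0, noise_sd, 400)

# Penalty = residual variance / marker-effect variance (the latter is 1 here).
ridge_penalty = noise_sd ** 2
effects, marker_means = fit_marker_effects(X_train, y_train, ridge_penalty)

predicted = predict_breeding_values(X_valid, effects, marker_means)
accuracy = np.corrcoef(predicted, tbv_valid)[0, 1]
print(f"correlation between predicted and true breeding values: {accuracy:.2f}")
```

Accuracy is measured here as the correlation between predicted and true breeding values in animals that were not used for training. Replacing `X_train` with genotypes simulated under different allele frequencies would be one way to explore, in the same toy setting, how genetic distance between training and validation populations affects that correlation.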
In a second simulation study, we compared the power, false-positive rate, accuracy and positive predictive value of QTL mapping in an admixed population with and without modeling of breed composition. The performance of ordinary least squares (OLS) and mixed model (MLM) methods, both fitting one marker at a time, was compared with that of a Bayesian multiple-regression (BMR) method that fitted all markers simultaneously. The OLS method showed the highest false-positive rate because it ignored breed composition and pedigree relationship. The MLM approach produced false positives when breed composition was not accounted for. The BMR outperformed both the OLS and MLM approaches. It was shown that the BMR could mitigate the confounding effects of breed composition and relationship without compromising its power. In contrast to the MLM, where fitting breed composition reduced both power and the false-positive rate, fitting breed composition in the BMR resulted in a loss of power without changing the false-positive rate. It was concluded that the BMR is able to self-correct for the effects of population structure and relatedness.
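The confounding described above for single-marker OLS can be reproduced in a small sketch. It covers only the OLS case, not the MLM or BMR analyses, and every value in it (sample size, allele frequencies, the size of the breed effect, the helper name `marker_t_statistic`) is an assumption chosen for illustration. A marker with no effect on the trait is simulated so that its allele frequency depends on breed composition, and the trait mean also depends on breed composition.

```python
import numpy as np


def marker_t_statistic(y, marker, covariates=None):
    """t statistic for one marker in an OLS regression (hypothetical helper).

    The model contains an intercept, the marker and any optional covariates,
    such as breed composition.
    """
    n = len(y)
    columns = [np.ones(n), marker]
    if covariates is not None:
        columns.extend(covariates)
    X = np.column_stack(columns)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - X.shape[1])
    standard_error = np.sqrt(sigma2 * xtx_inv[1, 1])
    return beta[1] / standard_error


# Assumed admixed population: each animal carries a fraction of breed A.
rng = np.random.default_rng(7)
n_animals = 2000
breed_fraction = rng.uniform(0.0, 1.0, n_animals)

# A marker with NO effect on the trait, but whose allele frequency
# differs between the breeds (0.1 in one breed, 0.9 in the other).
allele_freq = 0.1 + 0.8 * breed_fraction
null_marker = rng.binomial(2, allele_freq).astype(float)

# The trait depends on breed composition only, plus residual noise.
phenotype = 2.0 * breed_fraction + rng.normal(0.0, 1.0, n_animals)

t_naive = marker_t_statistic(phenotype, null_marker)
t_adjusted = marker_t_statistic(
    phenotype, null_marker, covariates=[breed_fraction]
)
print(f"t without breed composition: {t_naive:.2f}")
print(f"t with breed composition:    {t_adjusted:.2f}")
```

Without the breed covariate, the null marker acts as a proxy for breed composition and its test statistic is strongly inflated, which is the false-positive mechanism in question. Fitting breed composition as a covariate removes that spurious signal in this toy setting.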