Leveraging genetic time series data to improve detection of natural selection

Hellams, Luvenia
Major Professor: Karin Dorman

My work focuses on the problem of detecting natural selection from genetic time series data. This dissertation is motivated by genomic sequence data from populations of Porcine Reproductive and Respiratory Syndrome Virus (PRRSV). The samples were taken from the blood of multiple pigs at successive times during the early stages of infection. An important biological question in this context is what forces drive genetic change in the virus populations. Knowing when and how selection acts on these viruses can reveal how the host pig attacks the virus and how the virus responds. Ultimately, such knowledge can help guide vaccination, breeding, and treatment strategies, which could substantially reduce the morbidity and economic loss caused by this virus. Suppose we are given counts Y(t) of an allele at a locus, observed in a sample from a population at discrete time points 0, t₁, t₂, …. My goal is to detect when there is evidence of selection acting on the allele. With equal sample sizes, an increase Y(t) > Y(0) may indicate selection for the allele, and a decrease may indicate selection against it. However, inheritance across generations is a random process, so allele frequencies change even under neutral (no selection) conditions. Pure genetic drift is the neutral random process that produces genetic change even in the absence of other evolutionary forces. Its magnitude is determined by the population size N: random fluctuations dominate in populations with small N but vanish as N → ∞. To illustrate this conundrum, I apply the Cochran–Mantel–Haenszel (CMH) test to detect significant association between alleles and time when multiple subjects and time points are available. The CMH test finds significant temporal trends in the PRRSV data, but it cannot rule out pure genetic drift as their cause. I then propose a novel, N-agnostic test for selection in such populations and demonstrate its properties in extensive simulations.
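The conundrum can be made concrete with a small simulation. The Python sketch below uses only the standard library. It simulates neutral Wright–Fisher drift in several hypothetical subjects, samples reads at a first and a last time point, and applies a CMH test to the resulting 2×2 tables. Several choices are illustrative assumptions, not the dissertation's settings or data: the series is collapsed to two time points, the starting frequency is 0.5, and the subject count, read depth, and generation count are arbitrary. The function names are hypothetical. Because no selection is simulated, every rejection is a false positive attributable to drift.

```python
import math
import random


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Draw a Binomial(n, p) count by summing n Bernoulli trials (standard library only)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def wright_fisher(p0: float, n: int, generations: int, rng: random.Random) -> float:
    """Neutral Wright-Fisher drift: each generation resamples n gene copies binomially."""
    p = p0
    for _ in range(generations):
        p = binomial(n, p, rng) / n
    return p


def cmh_test(tables: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """CMH test over K 2x2 tables (a, b, c, d), one per subject.

    Rows are (first, last) time point; columns are (allele, other allele).
    Returns the statistic without continuity correction and its chi-square(1) p-value.
    """
    dev, var = 0.0, 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        dev += a - r1 * c1 / n                          # observed minus expected count
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))    # hypergeometric variance of a
    if var == 0.0:
        return 0.0, 1.0
    stat = dev * dev / var
    return stat, math.erfc(math.sqrt(stat / 2))         # upper tail of chi-square, 1 df


def neutral_rejection_rate(n: int, subjects: int = 4, generations: int = 10,
                           reads: int = 200, reps: int = 100, alpha: float = 0.05,
                           seed: int = 7) -> float:
    """Fraction of purely neutral replicates in which the CMH test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        tables = []
        for _ in range(subjects):
            p_end = wright_fisher(0.5, n, generations, rng)
            a = binomial(reads, 0.5, rng)    # allele reads sampled at the first time point
            c = binomial(reads, p_end, rng)  # allele reads sampled at the last time point
            tables.append((a, reads - a, c, reads - c))
        _, p_value = cmh_test(tables)
        if p_value < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for n in (50, 500):
        rate = neutral_rejection_rate(n)
        print(f"N={n}: CMH rejection rate under pure drift = {rate:.2f} (nominal 0.05)")
```

Under these assumptions, the N = 50 rejection rate should sit far above the nominal 5%. The N = 500 rate should be lower but still inflated. The reason is that the CMH test treats read sampling as the only source of randomness and ignores drift between time points, whose size depends on N.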
Unfortunately, this test has particularly low power under some conditions, including those prevailing in the PRRSV dataset. Another test, the FITR test, requires an estimate of N and assumes that the temporal increments in relative allele frequency are normally distributed, an assumption the PRRSV data also violate. I extend the FITR test with normalizing transformations, which substantially broadens its applicability. I show that the transformations reduce the overall skewness and excess kurtosis of the original data and better control the type-I error rate of the test. This work contributes one new and one improved test for detecting selection in genetic time series data. Both can aid the fight against infectious disease and other applications involving selection.
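The abstract does not spell out the FITR statistic or the transformations studied, so what follows is only a hedged illustration of the general frequency-increment idea. Increments of allele frequency are rescaled so that, under neutrality, they have mean zero and roughly constant variance. A one-sample t statistic then tests whether their mean differs from zero, and sample skewness and excess kurtosis measure departure from normality. The arcsine square-root version is one standard variance-stabilizing transform for proportions. It is assumed here for illustration and is not necessarily one of the dissertation's transformations. The function names and frequencies are hypothetical. P-values are omitted because the Python standard library has no t distribution.

```python
import math
import statistics
from typing import Sequence


def rescaled_increments(freqs: Sequence[float], times: Sequence[float]) -> list[float]:
    """Increments (v_i - v_{i-1}) / sqrt(v_{i-1} (1 - v_{i-1}) (t_i - t_{i-1})).

    Under neutral drift these have mean zero and roughly equal variance. Any
    constant factor in the denominator cancels out of the t statistic below.
    """
    out = []
    for v0, v1, t0, t1 in zip(freqs, freqs[1:], times, times[1:]):
        if not 0.0 < v0 < 1.0:
            raise ValueError("frequency must lie strictly between 0 and 1")
        out.append((v1 - v0) / math.sqrt(v0 * (1.0 - v0) * (t1 - t0)))
    return out


def arcsine_increments(freqs: Sequence[float], times: Sequence[float]) -> list[float]:
    """Increments of the variance-stabilizing transform arcsin(sqrt(v)), scaled by sqrt(dt)."""
    return [
        (math.asin(math.sqrt(v1)) - math.asin(math.sqrt(v0))) / math.sqrt(t1 - t0)
        for v0, v1, t0, t1 in zip(freqs, freqs[1:], times, times[1:])
    ]


def t_statistic(x: Sequence[float]) -> float:
    """One-sample t statistic for H0: mean(x) == 0 (needs at least two distinct values)."""
    return statistics.mean(x) / (statistics.stdev(x) / math.sqrt(len(x)))


def skewness_and_excess_kurtosis(x: Sequence[float]) -> tuple[float, float]:
    """Moment-based sample skewness g1 and excess kurtosis g2 (both 0 for a normal)."""
    n = len(x)
    m = statistics.mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Hypothetical allele frequencies at equally spaced time points (not PRRSV data).
    freqs = [0.10, 0.14, 0.22, 0.30, 0.41]
    times = [0, 1, 2, 3, 4]
    for name, inc in (("rescaled", rescaled_increments(freqs, times)),
                      ("arcsine", arcsine_increments(freqs, times))):
        g1, g2 = skewness_and_excess_kurtosis(inc)
        print(f"{name}: t={t_statistic(inc):.2f} skew={g1:.2f} excess kurtosis={g2:.2f}")
```

One practical difference between the two versions is at the boundaries. When a frequency reaches 0 or 1, the rescaled increment divides by zero. The arcsine increment stays finite.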

May 1, 2018