Algorithms for synteny-based phylostratigraphy and gene origin classification

dc.contributor.advisor Eve S. Wurtele
dc.contributor.author Arendsee, Zebulun
dc.contributor.department Department of Genetics, Development, and Cell Biology (LAS)
dc.date 2019-08-21T10:14:20.000
dc.date.accessioned 2020-06-30T03:14:52Z
dc.date.available 2020-06-30T03:14:52Z
dc.date.copyright Wed May 01 00:00:00 UTC 2019
dc.date.embargo 2001-01-01
dc.date.issued 2019-01-01
dc.description.abstract <p>With every newly sequenced species we discover hundreds of novel protein-coding genes. Many of these "orphan" genes have been experimentally proven to have dramatic functions in development, sexual dimorphism, pathogen resistance, and social traits like symbiosis. Whereas researchers once viewed genes as the product of continuous variation acting on ancient material, we now know that novel genes may arise de novo from non-genic sequence. Thus, evolutionary experimentation is not limited to tweaking existing genes or their regulatory patterns. Any orphan genes that arose in the distant past should appear today as lineage-specific genes (or gene families). The search for genes by their relative time of origin is called "phylostratigraphy". However, phylostratigraphy has proven to be a challenging task, with different methodologies often yielding contradictory conclusions. Standard phylostratigraphy infers the age of a gene by finding the most distant species that has an inferred homolog. This approach is highly sensitive to annotation quality and cannot easily distinguish between rapidly evolving genes and genes of de novo origin.</p> <p>This dissertation contributes a suite of tools for more accurately determining the phylostratigraphic age of genes and the level of support for the classification. First, we developed phylostratr to automate standard phylostratigraphy. Second, we developed a program, synder, to infer syntenic homologs of query features using a synteny map. Third, we developed fagin, a package that builds on synder to search query genes against related species for traces of genic or non-genic orthology. The pipeline can distinguish orphans with high-confidence data support from orphans identified due to poor assembly or missing data. We traced many orphans to their non-genic cousins, identifying the non-genic footprint from which they arose. We linked others to putative genes in related species from which they diverged beyond recognition. Knowing the approximate location of each gene across species and the amount of data support provides a launching point for future orphan studies.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/16963/
dc.identifier.articleid 7970
dc.identifier.contextkey 14820784
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/16963
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/31146
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/16963/Arendsee_iastate_0097E_17388.pdf|||Fri Jan 14 21:08:36 UTC 2022
dc.subject.disciplines Bioinformatics
dc.subject.keywords Bioinformatics
dc.subject.keywords Comparative Genomics
dc.subject.keywords Phylostratigraphy
dc.title Algorithms for synteny-based phylostratigraphy and gene origin classification
dc.type dissertation
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication 9e603b30-6443-4b8e-aff5-57de4a7e4cb2
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy
File
Original bundle
Name: Arendsee_iastate_0097E_17388.pdf
Size: 10.15 MB
Format: Adobe Portable Document Format