Plant genome informatics: evaluation and analysis of genomic DNA features involved in the transcriptional processing of protein coding genes
Randy C. Shoemaker
As biological data collection methods have become more cost effective and less time consuming, the need for computational tools to store, manage, and analyze these data has given rise to a broad field of research. Bioinformatics, while firmly rooted in the technology of information management, is now a mainstream component of most scientific investigations. Because most bioinformatics effort has been directed at vertebrate species, researchers in the plant sciences have often been left with less than satisfactory tools. The research presented in this dissertation aims to improve the bioinformatic tools available for plant genomics and to develop a better understanding of distinctive aspects of plant biological processes, such as the transcriptional processing of protein-coding genes.

In the course of this study, I developed an extensible infrastructure for integrating biological data sources and applying them to hypothesis-driven research. xGDB databases for eleven plant species have been made publicly available to facilitate progress in plant genome informatics. I also designed and implemented a system to assess the reliability of gene structure annotations on a per-gene basis. Using this system, I generated the dataset needed to develop a plant-specific probabilistic model of RNA polymerase II transcription start sites.

Predicting transcription start sites and promoter regions in plant genomic DNA proved considerably more challenging than the corresponding task in vertebrate sequences. Probabilistic models based solely on plant promoter sequences improved the outlook for promoter prediction in plant genomes. However, because plant promoters lack a pervasive signal comparable to the CpG islands that mark many vertebrate promoters, results are still less than ideal.

In conclusion, progress was made both in providing resources tailored to the plant research community and in investigating transcriptional processing in plants. Distinct regions that may be functionally significant in transcriptional regulation were discovered. In addition, a number of genes that use alternative transcription start sites and alternative cleavage/polyadenylation sites were revealed. The results of this study demonstrate that transcription in plants differs substantially from transcription in other organisms and warrants independent and thorough investigation.