Optimizing parallel sequence alignment sorting and epistasis detection, and parallel Fortran application resilience

dc.contributor.advisor Glenn R Luecke
dc.contributor.author Weeks, Nathan
dc.contributor.department Department of Computer Science
dc.date 2021-01-16T18:26:45.000
dc.date.accessioned 2021-02-25T21:39:53Z
dc.date.available 2021-02-25T21:39:53Z
dc.date.copyright Tue Dec 01 00:00:00 UTC 2020
dc.date.embargo 2023-01-07
dc.date.issued 2020-01-01
dc.description.abstract This dissertation comprises published or accepted papers encompassing two areas in High Performance Computing research: optimization and parallelization of bioinformatics applications, and fault tolerance in parallel Fortran applications.
The bioinformatics application optimization papers examine the computationally expensive problems of epistasis detection in quantitative-trait genome-wide association studies (GWAS) and sequence alignment sorting.
First, epiSNP, an application for identifying pairwise epistasis (genetic marker interactions), undergoes performance analysis and subsequent algorithmic and data structure optimizations, resulting in a ~12X speedup over the original (serial) application. Combined with distributed- and shared-memory techniques for dynamically load balancing pairwise operations across processes, a 38.43X speedup over the original parallel implementation (EPISNPmpi) is achieved on 126 nodes (each with 2 Intel Xeon Phi coprocessors) of the TACC Stampede supercomputer.
For sequence alignment sorting, optimizations to the popular open-source application SAMtools are described. These include more efficient data structures to reduce memory-management overhead, an improved external sorting implementation that reduces I/O, and the use of OpenMP tasks to better load balance compression, decompression, and sorting. The optimizations resulted in a 5.9X speedup for the benchmarked in-memory sort, and a 1.98X speedup for an external sort.
In the domain of High Performance Computing fault tolerance for parallel Fortran applications, the first paper surveys the landscape of HPC technologies and techniques for developing resilient Fortran applications that are parallelized using the Message Passing Interface (MPI). MPI fault tolerance extensions are categorized and analyzed for Fortran compatibility, and issues pertaining to the use of Fortran I/O and MPI I/O for checkpoint/restart are discussed.
The final paper proposes changes to the Fortran standard that make its recent facilities for handling failed images (processes) more useful to and usable by application programmers, and it introduces a prototype implementation that demonstrates the proposed semantics.
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/18425/
dc.identifier.articleid 9432
dc.identifier.contextkey 21104887
dc.identifier.doi https://doi.org/10.31274/etd-20210114-160
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/18425
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/94577
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/18425/Weeks_iastate_0097E_19203.pdf|||Fri Jan 14 21:41:54 UTC 2022
dc.subject.keywords Bioinformatics
dc.subject.keywords Fault Tolerance
dc.subject.keywords Fortran
dc.subject.keywords Message Passing Interface
dc.subject.keywords OpenMP
dc.title Optimizing parallel sequence alignment sorting and epistasis detection, and parallel Fortran application resilience
dc.type thesis en_US
dc.type.genre thesis en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
thesis.degree.discipline Computer Science
thesis.degree.level thesis
thesis.degree.name Doctor of Philosophy
File
Original bundle
Name: Weeks_iastate_0097E_19203.pdf
Size: 1011.77 KB
Format: Adobe Portable Document Format