Optimization of SAMtools sorting using OpenMP tasks

dc.contributor.author Weeks, Nathan
dc.contributor.author Luecke, Glenn
dc.contributor.department Computer Science
dc.contributor.department Mathematics
dc.date 2018-02-18T19:22:41.000
dc.date.accessioned 2020-06-30T01:54:43Z
dc.date.available 2020-06-30T01:54:43Z
dc.date.copyright Sun Jan 01 00:00:00 UTC 2017
dc.date.embargo 2018-04-25
dc.date.issued 2017-04-26
dc.description.abstract <p>SAMtools is a widely used genomics application for post-processing high-throughput sequence alignment data. Such sequence alignment data are commonly sorted to make downstream analysis more efficient. However, this sorting process itself can be computationally and I/O-intensive: high-throughput sequence alignment files in the de facto standard binary alignment/map (BAM) format can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper describes a case study on the performance analysis and optimization of SAMtools for sorting large BAM files. OpenMP task parallelism and memory optimization techniques resulted in a speedup of 5.9X versus the upstream SAMtools 1.3.1 for an internal (in-memory) sort of 24.6 GiB of compressed BAM data (102.6 GiB uncompressed) with 32 processor cores, while a 1.98X speedup was achieved for an external (out-of-core) sort of a 271.4 GiB BAM file.</p>
dc.description.comments <p>This is a manuscript of an article published as Weeks, Nathan T., and Glenn R. Luecke. "Optimization of SAMtools sorting using OpenMP tasks." <em>Cluster Computing</em> (2017): 1-12. The final publication is available at Springer via <a href="http://dx.doi.org/10.1007/s10586-017-0874-8" target="_blank">http://dx.doi.org/10.1007/s10586-017-0874-8</a>.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/cs_pubs/10/
dc.identifier.articleid 1009
dc.identifier.contextkey 10576673
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath cs_pubs/10
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/19864
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/cs_pubs/10/paper.pdf|||Fri Jan 14 18:09:03 UTC 2022
dc.source.uri 10.1007/s10586-017-0874-8
dc.subject.disciplines Computer Sciences
dc.subject.disciplines Mathematics
dc.subject.keywords Bioinformatics
dc.subject.keywords High-throughput sequencing
dc.subject.keywords OpenMP
dc.subject.keywords Sorting
dc.subject.keywords Burst buffer
dc.title Optimization of SAMtools sorting using OpenMP tasks
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication cecca677-cc29-4765-827a-3b844df2fe2b
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
relation.isOrgUnitOfPublication 82295b2b-0f85-4929-9659-075c93e82c48
Original bundle
449.54 KB
Adobe Portable Document Format