Optimization of SAMtools sorting using OpenMP tasks
dc.contributor.author | Weeks, Nathan | |
dc.contributor.author | Luecke, Glenn | |
dc.contributor.department | Department of Computer Science | |
dc.contributor.department | Mathematics | |
dc.date | 2018-02-18T19:22:41.000 | |
dc.date.accessioned | 2020-06-30T01:54:43Z | |
dc.date.available | 2020-06-30T01:54:43Z | |
dc.date.copyright | Sun Jan 01 00:00:00 UTC 2017 | |
dc.date.embargo | 2018-04-25 | |
dc.date.issued | 2017-04-26 | |
dc.description.abstract | SAMtools is a widely-used genomics application for post-processing high-throughput sequence alignment data. Such sequence alignment data are commonly sorted to make downstream analysis more efficient. However, this sorting process itself can be computationally- and I/O-intensive: high-throughput sequence alignment files in the de facto standard binary alignment/map (BAM) format can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper describes a case study on the performance analysis and optimization of SAMtools for sorting large BAM files. OpenMP task parallelism and memory optimization techniques resulted in a speedup of 5.9X versus the upstream SAMtools 1.3.1 for an internal (in-memory) sort of 24.6 GiB of compressed BAM data (102.6 GiB uncompressed) with 32 processor cores, while a 1.98X speedup was achieved for an external (out-of-core) sort of a 271.4 GiB BAM file. | |
dc.description.comments | This is a manuscript of an article published as Weeks, Nathan T., and Glenn R. Luecke. "Optimization of SAMtools sorting using OpenMP tasks." Cluster Computing (2017): 1-12. The final publication is available at Springer via http://dx.doi.org/10.1007/s10586-017-0874-8. | |
dc.format.mimetype | application/pdf | |
dc.identifier | archive/lib.dr.iastate.edu/cs_pubs/10/ | |
dc.identifier.articleid | 1009 | |
dc.identifier.contextkey | 10576673 | |
dc.identifier.s3bucket | isulib-bepress-aws-west | |
dc.identifier.submissionpath | cs_pubs/10 | |
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/19864 | |
dc.language.iso | en | |
dc.source.bitstream | archive/lib.dr.iastate.edu/cs_pubs/10/paper.pdf|||Fri Jan 14 18:09:03 UTC 2022 | |
dc.source.uri | 10.1007/s10586-017-0874-8 | |
dc.subject.disciplines | Computer Sciences | |
dc.subject.disciplines | Mathematics | |
dc.subject.keywords | Bioinformatics | |
dc.subject.keywords | High-throughput sequencing | |
dc.subject.keywords | OpenMP | |
dc.subject.keywords | Sorting | |
dc.subject.keywords | Burst buffer | |
dc.title | Optimization of SAMtools sorting using OpenMP tasks | |
dc.type | article | |
dc.type.genre | article | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | cecca677-cc29-4765-827a-3b844df2fe2b | |
relation.isOrgUnitOfPublication | f7be4eb9-d1d0-4081-859b-b15cee251456 | |
relation.isOrgUnitOfPublication | 82295b2b-0f85-4929-9659-075c93e82c48 |
File
Original bundle
- Name: paper.pdf
- Size: 449.54 KB
- Format: Adobe Portable Document Format