NCBI-BLAST programs optimization on XSEDE resources for sustainable aquaculture

Date
2015-01-01
Authors
Severin, Andrew
Seetharam, Arun
Gomez, Antonio
Purcell, Catherine
Hyde, John
Blood, Philip
Department
Genome Informatics Facility
Abstract

The development of genomic resources for non-model organisms is becoming commonplace as the cost of sequencing continues to decrease. The Genome Informatics Facility, in collaboration with the Southwest Fisheries Science Center (SWFSC), NOAA, is creating these resources for sustainable aquaculture in Seriola lalandi. Gene prediction and annotation, common steps in pipelines that generate genomic resources, are computationally intensive and time-consuming. In our steps to create genomic resources for Seriola lalandi, we found BLAST to be one of our most rate-limiting steps. Therefore, we took advantage of our XSEDE Extended Collaborative Support Services (ECSS) to reduce the amount of time required to process our transcriptome data by 300 percent. In this paper, we describe an optimized method for running the BLAST tool on the Stampede cluster, which works with any existing dataset or database without modification. At modest core counts, our results are similar to those of the MPI-enabled BLAST algorithm (mpiBLAST), while retaining the much-needed flexibility of output formats that the latest versions of BLAST provide. Reducing this time-consuming bottleneck in BLAST will be broadly applicable to the annotation of large sequencing datasets for any organism.

Comments

This is a proceedings paper from the XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure (2015), doi:10.1145/2792745.2792749. Posted with permission.
