NCBI-BLAST programs optimization on XSEDE resources for sustainable aquaculture
The development of genomic resources of non-model organisms is now becoming commonplace as the cost of sequencing continues to decrease. The Genome Informatics Facility in collaboration with the Southwest Fisheries Science Center (SWFSC), NOAA is creating these resources for sustainable aquaculture in Seriola lalandi. Gene prediction and annotation are common steps in the pipeline to generate genomic resources, which are computationally intense and time consuming. In our steps to create genomic resources for Seriola lalandi, we found BLAST to be one of our most rate limiting steps. Therefore, we took advantage of our XSEDE Extended Collaborative Support Services (ECSS) to reduce the amount of time required to process our transcriptome data by 300 percent. In this paper, we describe an optimized method for the BLAST tool on the Stampede cluster, which works with any existing datasets or database, without any modification. At modest core counts, our results are similar to the MPI-enabled BLAST algorithm (mpiBLAST), but also allow the much needed and improved flexibility of output formats that the latest versions of BLAST provide. Reducing this time-consuming bottleneck in BLAST will be broadly applicable to the annotation of large sequencing datasets for any organism.
This is a proceeding from the XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure (2015), doi:10.1145/2792745.2792749. Posted with permission.