PLAST original benchmark

After a full year of software development, from June 2011 to March 2012, spent turning the PLAST research prototype into a professional release, we conducted the following benchmark.

PLAST was compared with BLAST and SSearch to evaluate the speedup and the quality of the results produced by the new algorithm. Because SSearch has long running times, we used reduced data sets for this test.

The PLASTp benchmark compared the first 2,327 proteins of the black cottonwood (Populus trichocarpa) proteome against the first 2.9 million sequences of the NCBI RefSeq databank. All computations were conducted on an Apple MacPro computer.
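Taking "the first N sequences" of a Fasta file can be sketched as follows. This is only an illustration, not the script used for the benchmark: the function names read_fasta and first_records are hypothetical, and the tiny in-memory example stands in for the real Fasta files.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from Fasta-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_records(lines: Iterable[str], count: int) -> List[Tuple[str, str]]:
    """Return the first `count` records, reading no further than needed."""
    return list(islice(read_fasta(lines), count))


# Hypothetical toy input; the benchmark used 2,327 query proteins.
example = [">p1 first", "MKV", "LLA", ">p2", "MST", ">p3", "MAA"]
subset = first_records(example, 2)
print(subset)
```

With a real file, the same function can be fed an open file handle, for example one returned by gzip.open(path, "rt") for a .gz archive.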

Software

PLAST: release 2.2.0
BLAST: release 2.2.26+ from NCBI
SSearch: release 36 from University of Virginia

Datasets

Data sets retrieved on April 25th, 2012:

  1. Query databank: Populus trichocarpa, Fasta file Ptrichocarpa_156_peptide.fa.gz from ftp://ftp.jgi-psf.org/pub/JGI_data/phytozome/v7.0/Ptrichocarpa/annotation/.
  2. Subject databank: NCBI RefSeq pre-formatted databank, volume 00. The file refseq_protein.00.tar.gz from ftp://ftp.ncbi.nih.gov/blast/db/ was processed with the blastdbcmd tool to extract the Fasta file.
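The extraction step for the subject databank might look like the sketch below. It assumes that the archive has already been unpacked, that the resulting database is named refseq_protein.00, and that blastdbcmd from the NCBI BLAST+ package is on the PATH. The output file name and the helper names are hypothetical, and the exact command used for the benchmark is not given in the text.

```python
import subprocess
from typing import List


def blastdbcmd_command(db_name: str, fasta_path: str) -> List[str]:
    """Build a blastdbcmd call that dumps every database entry as Fasta."""
    return [
        "blastdbcmd",
        "-db", db_name,      # database name, without file extension
        "-entry", "all",     # export all sequences
        "-out", fasta_path,  # destination Fasta file
    ]


def extract_fasta(db_name: str, fasta_path: str) -> None:
    """Run the extraction; raises CalledProcessError if blastdbcmd fails."""
    subprocess.run(blastdbcmd_command(db_name, fasta_path), check=True)


# Assumed names, for illustration only.
command = blastdbcmd_command("refseq_protein.00", "refseq_protein.00.fa")
print(" ".join(command))
```

Calling extract_fasta with the same two arguments would perform the actual extraction on a machine where BLAST+ is installed.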

Computer

All tests were conducted on an Apple MacPro computer running OS X Lion (10.7.3) with two 2.66 GHz 6-core Intel Xeon “Westmere” processors, 32 GB of RAM and a 1 TB HDD.

Results

              Accuracy vs.    Running time (s), by number of cores
              SSearch (%)          1         4         8        12
BLASTp           74.4         37,762    10,253     6,229     4,891
PLASTp           74.7          1,302       394       262       214
Speedup (*)                      29x       96x      144x      176x
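The speedup row can be reproduced from the running times in the table. The short sketch below is only a check of the arithmetic: it divides the single-core BLASTp time by each PLASTp time, and also computes the same-core ratios for comparison. The variable names are illustrative.

```python
# Running times in seconds, copied from the results table.
blast_times_s = {1: 37762, 4: 10253, 8: 6229, 12: 4891}
plast_times_s = {1: 1302, 4: 394, 8: 262, 12: 214}

# Speedup as reported in the table: PLASTp on N cores
# relative to BLASTp on a single core.
blast_single_core_s = blast_times_s[1]
speedups = {
    cores: round(blast_single_core_s / seconds)
    for cores, seconds in plast_times_s.items()
}
print(speedups)  # {1: 29, 4: 96, 8: 144, 12: 176}

# For comparison: ratio of BLASTp to PLASTp at the same core count.
same_core_ratios = {
    cores: blast_times_s[cores] / plast_times_s[cores]
    for cores in plast_times_s
}
print(same_core_ratios)
```

At equal core counts the ratio stays between roughly 23 and 29, so the larger figures in the speedup row reflect both the faster algorithm and the additional cores.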

Figure: PLAST speedup over BLAST

Comments

  • All three programs were run with an increasing number of cores, a BLOSUM62 matrix and an E-value threshold of 1e-3. Results were written to tabular files so that the output of BLAST, PLAST and SSearch could be compared.
  • Accuracy was evaluated by computing the fraction (%) of sequence alignments produced by each algorithm that are also found by a reference algorithm, SSearch. Results from BLAST and PLAST were compared with those of SSearch as follows: for each query sequence, we checked that the hit sequence IDs and the alignment locations were equal.
  • It is worth noting that PLAST is faster than BLAST even on a single computing core.
  • (*) Speedup of PLAST, on the given number of cores, over BLAST running on a single computing core.
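The accuracy measure described above can be sketched as follows. This is a minimal illustration, not the script used for the benchmark. It assumes that all result files use the 12-column BLAST tabular layout (query ID, subject ID, identity, length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score) and that two alignments match only when IDs and coordinates are strictly equal. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (query ID, hit ID, query start, query end, hit start, hit end)
Alignment = Tuple[str, str, int, int, int, int]


def parse_tabular(lines: Iterable[str]) -> Set[Alignment]:
    """Collect alignment keys from tab-separated result lines."""
    alignments: Set[Alignment] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        alignments.add((
            fields[0], fields[1],
            int(fields[6]), int(fields[7]),
            int(fields[8]), int(fields[9]),
        ))
    return alignments


def accuracy_percent(tool: Set[Alignment], reference: Set[Alignment]) -> float:
    """Percentage of the tool's alignments also found by the reference."""
    if not tool:
        return 0.0
    return 100.0 * len(tool & reference) / len(tool)


# Hypothetical toy data: 3 of the 4 tool alignments appear in the reference.
reference_lines = [
    "q1\ts1\t90.0\t50\t5\t0\t1\t50\t10\t59\t1e-20\t100",
    "q1\ts2\t80.0\t40\t8\t0\t5\t44\t1\t40\t1e-10\t70",
    "q2\ts3\t85.0\t30\t4\t0\t2\t31\t7\t36\t1e-08\t60",
]
tool_lines = reference_lines + [
    "q2\ts4\t70.0\t30\t9\t0\t2\t31\t3\t32\t1e-04\t40",
]
score = accuracy_percent(parse_tabular(tool_lines), parse_tabular(reference_lines))
print(score)  # 75.0
```

Strict equality of coordinates is the simplest reading of "checked equality"; a comparison that tolerates small boundary differences would give different percentages.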

More benchmarks are available.
