PLAST command-line arguments

Mandatory arguments

Each PLAST job requires at least the following arguments:

Argument Description
-p comparison method. One of: plastn, plastp, plastx, tplastn or tplastx
-i the query file provided as a Fasta formatted sequence file
-d the reference databank. Either a Fasta file or a BLAST databank
-o the results file

As mentioned in the table, PLAST can work directly with Fasta files. For the reference databank, you can also provide a databank name if you use KDMS (Korilog Databank Manager System), which is provided with KlastRunner. To find a databank name, open the KDMS graphical frontend and look at the column called “Name” in the right panel.
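
A minimal invocation might look like the sketch below. The file names are hypothetical placeholders, and the plast executable name is assumed to be the one used in the command lines of this page. The script only assembles and prints the command, so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical file names: replace them with your own data.
QUERY="query.fa"       # Fasta file holding the query sequences
BANK="bank.fa"         # Fasta file (or BLAST databank) used as reference
OUT="results.txt"      # file that will receive the results

# The four mandatory arguments: method, query, reference databank, results file.
PLAST_CMD="plast -p plastp -i $QUERY -d $BANK -o $OUT"

# Print the command for review; once checked, run it with: eval "$PLAST_CMD"
echo "$PLAST_CMD"
```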

Controlling results size

You can control how many hits are reported in a result file by using these arguments:

Argument Description
-e E-value threshold. Default value is 10.
-max-hit-per-query maximum number of hits aligned to a query. Default value is 10.
Requires the additional argument: -force-query-order 1000
-max-hsp-per-hit maximum number of HSPs reported for a hit. Default value is 1.
Requires the additional argument: -force-query-order 1000

If you want to get all possible hits/HSPs, simply pass 0 (zero) to -max-hit-per-query and -max-hsp-per-hit.
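
As an illustration, the sketch below groups the size-limiting arguments in a small helper. The helper name limit_args and the file names are hypothetical; the flags themselves are those listed in the table above. The script only prints the resulting command.

```shell
#!/bin/sh
# limit_args MAX_HITS MAX_HSPS: print the arguments that cap the result size.
# Both limits require -force-query-order 1000; 0 means "report everything".
limit_args() {
    echo "-max-hit-per-query $1 -max-hsp-per-hit $2 -force-query-order 1000"
}

# Keep at most 5 hits per query and 1 HSP per hit, with an E-value threshold of 0.001.
echo "plast -p plastp -i query.fa -d bank.fa -o results.txt -e 0.001 $(limit_args 5 1)"
```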

Sorting results by query IDs

Being a bank-to-bank sequence comparison tool, PLAST does not care about query ordering when producing results: it reports query/hit matches in no particular order. If you prefer PLAST to produce results as BLAST does, i.e. with hits sorted by query IDs, simply add the following argument to your PLAST command:

plast ... -force-query-order 1000 ...

Controlling speed

PLAST speed is of course controlled by the number of available cores:

Argument Description
-a Number of cores. Default is the maximum number of cores available on the computer running PLAST.

PLAST provides additional parameters to fine-tune the speed/quality ratio, as explained in the following sections.

Optimizing quality/speed ratio using PLAST specific arguments

PLAST’s default configuration has been set up to provide an optimal ratio between speed and quality, producing results with quality similar to BLAST. Even in this configuration, you get substantial speedup factors.

Depending on your needs, you can increase speed further with little loss of quality in your results.

PLAST specific arguments for optimizing search jobs are:

Argument Description
-seeds-use-ratio Ratio of seeds to be used (see comment, below). [1..100], default is 100. Decrease the value to speed up the algorithm with little loss of quality.
-s Ungapped threshold score that triggers a small gapped extension (see comment, below). [25..127], default is 38 for protein-based comparisons and 55 for nucleic-based comparisons. Increase the value to speed up the algorithm with little loss of quality.
-max-database-size Maximum allowed size (in bytes) for a database. If a database is larger, it is segmented (see comment, below).

Fine-tuning seed-ratio, threshold score and max-database-size may considerably accelerate the KLAST comparison engine, with little loss of quality in the results. Carefully read the following sections.
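
The ranges given in the table can be checked before a job is launched. The sketch below is one possible way to do it; the helper name tuning_args, the chosen values and the file names are hypothetical, and the script only prints the command.

```shell
#!/bin/sh
# tuning_args SEED_RATIO THRESHOLD: print the tuning arguments after checking
# the documented ranges ([1..100] for -seeds-use-ratio, [25..127] for -s).
tuning_args() {
    ratio=$1
    score=$2
    if [ "$ratio" -lt 1 ] || [ "$ratio" -gt 100 ]; then
        echo "seed ratio must be in [1..100]" >&2
        return 1
    fi
    if [ "$score" -lt 25 ] || [ "$score" -gt 127 ]; then
        echo "threshold score must be in [25..127]" >&2
        return 1
    fi
    echo "-seeds-use-ratio $ratio -s $score"
}

# Hypothetical protein search: half of the seeds, threshold raised from 38 to 45.
ARGS=$(tuning_args 50 45) && echo "plast -p plastp -i query.fa -d bank.fa -o results.txt $ARGS"
```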

Optimizing PLAST: sample recipes

In order to tune PLAST correctly, we always invite our users to try the software with sample data sets. When you need to compare a large set of sequences, always start by comparing a small subset of your data. This way, you can check the parameters, the results and the speed of the software.

As an example, if you have to compare 300,000 sequences against NCBI nt, start your work by comparing 300 query sequences against NCBI nt using default PLASTn parameters. Then, fine tune it (see below the use of seed-ratio, max-database-size and threshold score) and check the results. As soon as your parameters are fine, go ahead with 3,000 and/or 10,000 query sequences, and check results and speed. If everything is fine, then run the full comparison.
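
One way to extract such a subset is sketched below. It assumes a standard Fasta layout in which every record starts with a header line beginning with ">"; the function name first_n_fasta and the file names are hypothetical.

```shell
#!/bin/sh
# first_n_fasta N FILE: print the first N records of a Fasta file.
# A record is a header line starting with ">" followed by its sequence lines.
first_n_fasta() {
    awk -v n="$1" '
        /^>/ { count++ }        # a new record starts at each header line
        count > n { exit }      # stop as soon as record N+1 begins
        { print }
    ' "$2"
}

# Example (hypothetical file names): keep 300 queries for a first trial run.
# first_n_fasta 300 all_queries.fa > sample_300.fa
```

The same function can then be reused with 3,000 or 10,000 to build the intermediate test sets.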

Optimizing PLAST at runtime: using seed-ratio

When using PLAST for protein-based sequence comparisons, the algorithm can be sped up using the seed-ratio parameter. The PLASTp algorithm relies on a finite table of seeds; there are about 6,200 seeds for the BLOSUM50 and BLOSUM62 matrices, whatever the input sequence databanks (for more information, see Reference [1]). During the comparison, PLAST orders seeds by number of occurrences and processes first the seeds that produce the highest number of hits. It is therefore possible to ask PLAST to use either the entire set of seeds or only a subset. This fine-tuning is achieved using the seed-ratio parameter (“-seeds-use-ratio” argument), ranging from 1% to 100%. The higher the seed-ratio, the higher the sensitivity; the lower the seed-ratio, the higher the speed, with little loss in quality, as illustrated in this example:

Figure: Using seed ratio to speed up PLAST. Reducing the number of seeds used during a comparison still provides high-quality results while dramatically reducing search time.

Optimizing PLAST at runtime: using threshold score

A second way to fine-tune PLAST, i.e. to speed it up, consists in using the threshold score parameter. During a comparison, PLAST runs faster as the threshold score increases, because fewer ungapped matches reach the threshold and are passed on for further processing. By default, the threshold score (“-s” argument) is set to 38 for protein comparisons and 55 for nucleotide comparisons, which favors sensitivity. However, if you have to compare sequences that are very similar to those of the reference databank, you can increase this value: PLAST will provide results of similar quality, but with additional speedup.

As an example, consider a nucleotide comparison against the Silva SSU databank, run with different “-s” values.

Figure: Using threshold score to speed up PLAST.

When using PLAST from the command line, set the threshold score with the “-s” argument.

Optimizing PLAST at runtime: using max-database-size

Another way to fine-tune PLAST, i.e. to speed it up, consists in using the max-database-size parameter. It is the amount of RAM (in bytes) to reserve in order to load a databank in memory. If PLAST has to handle large banks during a comparison, it automatically segments them into pieces that fit in RAM. For this purpose, PLAST relies on the max-database-size parameter; when you set it, make sure that the amount of RAM available on your computer is at least “max-database-size x 8 x 2” (a databank requires at least “8 x max-database-size”, and there are two banks). For instance, on a computer with 32 Gb of RAM, raising max-database-size from 20M (the default value) to a larger value can provide additional speedup.

We advise you to set max-database-size to a value large enough to load the full databank in memory. For instance, if your query sequence file is 12 Mb, set max-database-size to about 15 Mb. If a bank is too big for the chosen value, PLAST segments it and processes each piece in turn.

When using PLAST from the command line, set this parameter with the “-max-database-size” argument.
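
The memory rule of thumb can be turned into a quick check. The sketch below assumes the “max-database-size x 8 x 2” reading of the rule (at least 8 x max-database-size per bank, two banks in memory); the function name required_ram_bytes is hypothetical.

```shell
#!/bin/sh
# required_ram_bytes MAX_DB_SIZE: RAM needed under the "x 8 x 2" rule of thumb
# (at least 8 x max-database-size per bank, and two banks held in memory).
required_ram_bytes() {
    echo $(( $1 * 8 * 2 ))
}

# With a max-database-size of 20000000 bytes (20M):
required_ram_bytes 20000000    # prints 320000000, i.e. about 320 MB
```

Comparing this figure with the RAM actually available on the machine gives an upper bound for the value passed to “-max-database-size”.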

PLAST arguments for controlling the scoring matrix, gap costs and match/mismatch costs are listed in the following table.

Argument Description
-M scoring matrix
-G gap opening cost
-E gap extension cost
-r reward for a nucleotide match (plastn only)
-q penalty for a nucleotide mismatch (plastn only)

Comments

Arguments -r and -q (match/mismatch costs) are only available with the plastn method.

Arguments -G and -E (gap costs) are available with all comparison methods.

Notice: in the following tables, (*) denotes the default values used by PLAST when you do not set these arguments.

Valid scoring matrix/gap costs are the following:

Gap open Gap extend
11 2
11 1 (*)
9 2
8 2
7 2
6 2
12 1
10 1
9 1

BLOSUM62

Gap open Gap extend
13 3
13 2 (*)
12 3
11 3
10 3
15 2
14 2
19 1
18 1
17 1
16 1

Nucleic-based PLAST search (plastn only)

Valid match/mismatch costs (arguments -r and -q) are the following:

Match cost Mismatch cost
1 -1
1 -2
1 -3
1 -4
2 -3 (*)
4 -5

Given match/mismatch costs of 1,-1, the gap costs are the following:

Gap open Gap extend
3 2 (*)
2 2
1 2
0 2
4 1
3 1
2 1

Given match/mismatch costs of 1,-2, the gap costs are the following:

Gap open Gap extend
5 2
2 2
1 2
0 2
3 1
2 1
1 1

Given match/mismatch costs of 1,-3, the gap costs are the following:

Gap open Gap extend
5 2
2 2
1 2
0 2
2 1
1 1

Given match/mismatch costs of 1,-4, the gap costs are the following:

Gap open Gap extend
5 2
1 2
0 2
2 1
1 1

Given match/mismatch costs of 2,-3, the gap costs are the following:

Gap open Gap extend
4 4
2 4
0 4
3 3
6 2
5 2
4 2
2 2

Given match/mismatch costs of 4,-5, the gap costs are the following:

Gap open Gap extend
12 8
6 5
5 5
4 5
3 5

Monitoring a job

PLAST lets you monitor job execution. To do so, add the following argument to your PLAST command:

plast ... -bargraph

plastp [1/1] 100% 16960 [00:00:08 - 00:00:00 - 00:00:08] mem=298.7Mo (max=298.7Mo tot=0.3Go) seeds [5082:5082] [====================] 100%

This line displays, from left to right:

Running comparison.
Partition of the reference databank being processed. If the reference databank does not fit in RAM, PLAST segments it and compares each partition in turn.
Progress of the execution.
Number of hits found so far.
Three execution times: elapsed, remaining and total.
Number of seeds processed, in the form [current seeds:total seeds].
A text-based progress bar.