fLPS: fast discovery of compositional biases for the protein universe

fLPS is a program that rapidly annotates compositionally biased regions in biological sequences. The algorithm picks out 'Low Probability Subsequences' (LPSs) through a process of probability minimization, and is described in detail in the paper cited below. Using fLPS, two types of compositionally biased region can be annotated automatically:

(i) Single-residue: these biases occur when many residues of a single type appear in a short span. For example, PPPPPEPPPPAPPPEPPPIP is a protein stretch biased for P (proline). It is given the signature {P}.

(ii) Multiple-residue: these biases occur when two or more residue types predominate in a short span. For example, HQHQHQQQQHHQQQQHHHHAHHHQQHHQIQQQQ is a protein region biased for H and Q (histidine and glutamine). It is given the signature {QH}.
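The probability-minimization idea behind these signatures can be illustrated with a short Python sketch. It assumes a binomial model in which every amino acid has the same background frequency of 0.05 (1/20), so the chance of seeing at least k residues from a chosen set in a region of n residues is a binomial tail probability. The names bias_signature and binomial_tail are hypothetical, and this is a simplified illustration, not fLPS's own code. Under the uniform-background assumption, taking residues in order of decreasing count finds the best set of each size, and the set with the smallest probability becomes the signature.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    residues from a chosen set in n residues, given combined frequency p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bias_signature(region, background=0.05):
    """Return (signature, p_value) for the residue set whose combined count
    in `region` is least probable by chance.

    Assumption: every residue type has the same background frequency, so
    for each set size the most frequent residues give the smallest tail
    probability, and adding residues in order of decreasing count is enough.
    """
    counts = Counter(region)
    n = len(region)
    chosen = []
    best_set, best_p = [], 1.0
    k = 0
    for residue, count in counts.most_common():
        chosen.append(residue)
        k += count
        # Combined background frequency of all residue types chosen so far.
        p = min(1.0, background * len(chosen))
        p_value = binomial_tail(k, n, p)
        if p_value < best_p:
            best_set, best_p = list(chosen), p_value
    return "{" + "".join(best_set) + "}", best_p
```

Applied to the two example stretches, this sketch yields {P} for PPPPPEPPPPAPPPEPPPIP and {QH} for HQHQHQQQQHHQQQQHHHHAHHHQQHHQIQQQQ. Adding E to {P}, or A to {QH}, makes the combined count less surprising, so the search stops growing the set at that point.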

The program runs without the user having to specify particular sets of residues to analyze. Source code and fLPS executables compiled for Mac OS X and Linux, along with explanatory READMEs, are available HERE. A file containing example input and output files is available HERE.

In benchmarks on an Intel Xeon E5 3.5 GHz processor, annotating both multiple- and single-residue biases takes fLPS less than a second for the yeast proteome (5,872 proteins) and ~1 hour for the whole TrEMBL database (August 2016 version, >65,000,000 proteins).

fLPS has nine command-line options for defining parameters, formatting output, and so on. Full details of these options, and of how to use the program, are given in the README.


Annotation of low-complexity regions:

To label low-complexity regions, the following run parameters are suitable:

 ./fLPS -t1e-5 -m5 -M25 INPUT.FASTA > OUTPUT.FASTA
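To give a feel for what these parameters control, here is a simplified Python sketch of a single-residue window scan. It assumes that -t sets the P-value threshold and that -m and -M set the shortest and longest windows considered (check the README for the exact definitions), and it uses a uniform background frequency of 0.05 for every residue. The names scan_single_residue and binomial_tail are hypothetical. Unlike fLPS, the sketch looks at one residue type only, ignores multiple-residue biases, and reports just the single best window rather than every biased region.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_single_residue(seq, residue, p=0.05, threshold=1e-5, min_len=5, max_len=25):
    """Return (start, end, p_value) for the lowest-probability window of
    min_len..max_len residues enriched in `residue`, or None if no window
    falls below `threshold`. Coordinates are 0-based and end-exclusive.

    Defaults mirror the values in the command above (threshold 1e-5,
    windows of 5 to 25 residues), under the assumed flag meanings.
    """
    best = None
    for start in range(len(seq) - min_len + 1):
        last_end = min(start + max_len, len(seq))
        for end in range(start + min_len, last_end + 1):
            k = seq.count(residue, start, end)
            p_value = binomial_tail(k, end - start, p)
            if p_value < threshold and (best is None or p_value < best[2]):
                best = (start, end, p_value)
    return best
```

For example, scan_single_residue("AAAAPPPPPPPPAAAA", "P") returns the window (4, 12) covering the eight consecutive prolines, with a P-value of 0.05**8 (about 3.9e-11). A sequence that contains each amino acid once, such as ACDEFGHIKLMNPQRSTVWY, has no window below the threshold, so the sketch returns None. The exhaustive scan is quadratic in the window range and is meant only for illustration; fLPS itself is far faster.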

Citation and contact

If you use fLPS, please cite:

P. M. Harrison. (2017). "fLPS: fast discovery of compositional biases for the protein universe", BMC Bioinformatics 18(1): 476, doi: 10.1186/s12859-017-1906-3.