fLPS 2.0: rapid annotation of compositional biases in biological sequences

fLPS 2.0 is a program that rapidly annotates compositionally biased regions in biological sequences (both protein and DNA). The algorithm picks out 'Low Probability Subsequences' (LPSs) through a process of probability minimization. fLPS is explained in detail in the citations below. Using fLPS, two types of compositionally biased region can be annotated automatically:

(i) single-residue: These occur when there are many residues of a single type in a short span. For example, PPPPPEPPPPAPPPEPPPIP is a protein stretch biased for P (proline). This is given a signature {P}.

(ii) multiple-residue: These occur when two or more residue types predominate in a short span. For example, HQHQHQQQQHHQQQQHHHHAHHHQQHHQIQQQQ is a protein region biased for H and Q (histidine and glutamine). This is given a signature {QH}.
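To make the two signature types concrete, the sketch below counts the biased residues in the two example stretches above and computes a binomial upper-tail probability for each count. It is an illustration only, not the fLPS implementation: the helper names and the uniform background frequency of 0.05 per amino acid are assumptions, and the probability calculation that fLPS actually performs is described in the citations below.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bias_summary(seq, residues, background=0.05):
    """Count the residues of a signature in seq and score the count.

    background is an ASSUMED uniform per-residue frequency (1/20);
    it is not taken from fLPS.
    Returns (count of signature residues, sequence length, tail probability).
    """
    counts = Counter(seq)
    k = sum(counts[r] for r in set(residues))
    p = background * len(set(residues))
    return k, len(seq), binomial_tail(k, len(seq), p)


single = "PPPPPEPPPPAPPPEPPPIP"               # example for signature {P}
multi = "HQHQHQQQQHHQQQQHHHHAHHHQQHHQIQQQQ"   # example for signature {QH}

for seq, signature in ((single, "P"), (multi, "QH")):
    k, n, prob = bias_summary(seq, signature)
    print(f"{{{signature}}}: {k} of {n} residues, tail probability {prob:.3g}")
```

The lower the probability of observing so many residues of the signature types by chance, the stronger the bias; this is the intuition behind the term 'Low Probability Subsequence'.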

The program runs automatically to find biases for all possible residue types. Source code and fLPS2 executables compiled for the Mac OS X and Linux operating systems, along with explanatory READMEs, are available HERE. A file containing example input and output is available HERE.

In benchmarks on an Intel Xeon E5 3.5GHz processor, annotating both single- and multiple-residue biases with the fLPS algorithm can take less than a second for the yeast proteome (5,872 proteins), and ~1 hour for the whole TrEMBL database (August 2016 version, >65,000,000 proteins).

fLPS2 has several command-line options to define parameters, format output, etc. Full details of these options and how to use the program can be found in the README.


Annotation of low-complexity regions:

To label low-complexity regions, the following run parameters are suitable:

 ./fLPS2 -t1e-5 -m5 -M25 INPUT.FASTA > OUTPUT.FASTA
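For batch use, the same command can be driven from a script. The following is a minimal sketch, assuming the fLPS2 executable sits in the current directory as in the command above; the helper names build_flps_command and run_flps are hypothetical, and only the flags shown in the example (-t, -m, -M) are used. The README remains the reference for what each option means.

```python
import subprocess
from pathlib import Path


def build_flps_command(input_fasta, t="1e-5", m=5, M=25, exe="./fLPS2"):
    """Assemble the argument list for the example command.

    Hypothetical helper: the defaults simply mirror the values in the
    command shown above; consult the README for the meaning of each flag.
    """
    return [exe, f"-t{t}", f"-m{m}", f"-M{M}", str(input_fasta)]


def run_flps(input_fasta, output_path, **params):
    """Run fLPS2 and write its standard output to output_path.

    This reproduces the shell '>' redirect; check=True raises
    CalledProcessError if the program exits with a non-zero status.
    """
    cmd = build_flps_command(input_fasta, **params)
    with Path(output_path).open("w") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return cmd
```

Calling run_flps("INPUT.FASTA", "OUTPUT.FASTA") would then be equivalent to the command line above.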


Citation and contact

If you use fLPS2, please cite the following:

P. M. Harrison. (2017). "fLPS: fast discovery of compositional biases for the protein universe", BMC Bioinformatics 18(1): 476, doi: 10.1186/s12859-017-1906-3.

P. M. Harrison. (2021). "fLPS 2.0: rapid annotation of compositional biases in biological sequences", PeerJ, submitted.