fLPS 2.0: rapid annotation of compositional biases in biological sequences
fLPS 2.0 is a program that rapidly annotates compositionally-biased regions in
biological sequences (both protein and DNA). The algorithm picks out 'Low
Probability Subsequences' (LPSs) through a process of probability minimization.
fLPS is explained in detail in the citations below. Using fLPS,
compositionally-biased regions of two types can be automatically annotated:
(i) single-residue:
These occur when there are many residues of a single type in a short span. For
example, PPPPPEPPPPAPPPEPPPIP is a protein stretch biased for P (proline). This
is given a signature {P}.
(ii) multiple-residue:
These occur when two or more residue types predominate in a short span. For
example, HQHQHQQQQHHQQQQHHHHAHHHQQHHQIQQQQ is a protein region biased for H and
Q (histidine and glutamine). This is given a signature {QH}.
The program runs automatically to find biases from all possible residue types.
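To make the two bias types concrete, the following is a minimal Python sketch
that counts residues in the two example stretches above and scores each residue
type with a binomial tail probability. It is not the fLPS implementation: the
uniform background frequency of 1/20 per amino acid, the function names, and
the 0.25 fraction cut-off used to build a signature are all assumptions chosen
for illustration.

```python
from collections import Counter
from math import comb

# Assumption: a uniform background of 1/20 per amino acid type.
# fLPS itself may use different background frequencies.
BACKGROUND = 1.0 / 20


def binomial_tail(k: int, n: int, p: float) -> float:
    """Probability of observing at least k hits in n trials with hit rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def residue_bias(seq: str, background: float = BACKGROUND) -> dict:
    """Map each residue type in seq to (count, binomial tail probability)."""
    n = len(seq)
    return {
        res: (k, binomial_tail(k, n, background))
        for res, k in Counter(seq).items()
    }


def toy_signature(seq: str, min_fraction: float = 0.25) -> str:
    """Illustrative signature: residue types at or above min_fraction,
    most frequent first.

    The cut-off is a hypothetical stand-in, not how fLPS selects residues.
    """
    counts = Counter(seq)
    kept = [r for r, k in counts.most_common() if k / len(seq) >= min_fraction]
    return "{" + "".join(kept) + "}"


if __name__ == "__main__":
    for seq in ("PPPPPEPPPPAPPPEPPPIP", "HQHQHQQQQHHQQQQHHHHAHHHQQHHQIQQQQ"):
        print(toy_signature(seq), residue_bias(seq))
```

Under these assumptions, the proline stretch (16 P in 20 residues) yields {P}
and the histidine/glutamine region (17 Q and 14 H in 33 residues) yields {QH},
matching the signatures quoted above. The lower the tail probability, the less
likely the observed count is under the assumed background.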
Source code and fLPS2 executables compiled for the Mac OS X and Linux operating
systems, along with explanatory READMEs, are available HERE. A file containing
example input and output is available HERE.
In benchmarks on an Intel Xeon E5 3.5 GHz processor, the fLPS algorithm can
annotate both multiple- and single-residue biases in less than a second for the
yeast proteome (5,872 proteins), and in ~1 hour for the whole TrEMBL database
(August 2016 version, >65,000,000 proteins).
fLPS2 has several command-line options to define parameters, format output, etc.
Full details of these options and how to use the program can be found in the README.
Annotation of low-complexity regions:
To label low-complexity regions, the following run parameters are suitable:
./fLPS2 -t1e-5 -m5 -M25 INPUT.FASTA > OUTPUT.FASTA
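For scripted use, the command above can be wrapped in a few lines of Python.
This is only a sketch: the helper names are hypothetical, the flag values are
copied from the command above, and the executable is assumed to sit in the
current directory. See the README for what each option means.

```python
import subprocess
from pathlib import Path


def build_flps2_command(fasta, t="1e-5", m=5, M=25, exe="./fLPS2"):
    """Return the argument list for the low-complexity run shown above."""
    return [exe, f"-t{t}", f"-m{m}", f"-M{M}", str(fasta)]


def run_flps2(fasta, out_path, **options):
    """Run fLPS2 and write its standard output to out_path (hypothetical helper)."""
    cmd = build_flps2_command(fasta, **options)
    with Path(out_path).open("w") as out:
        # check=True raises CalledProcessError if fLPS2 exits with an error.
        subprocess.run(cmd, stdout=out, check=True)
```

Calling run_flps2("INPUT.FASTA", "OUTPUT.FASTA") would then reproduce the shell
redirection in the command line above.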
Citations:
P. M. Harrison. (2017). "fLPS: fast discovery of compositional biases for the
protein universe", BMC Bioinformatics, 18(1): 476, doi: 10.1186/s12859-017-1906-3.
P. M. Harrison. (2021). "fLPS 2.0: rapid annotation of compositional biases in
biological sequences", PeerJ, submitted.
For further inquiries and comments about fLPS, please contact Paul Harrison.
Copyright 2017, 2021 Paul M. Harrison.