GISMO

Gibbs Sampler for Multi-alignment Optimization

GISMO is a Markov chain Monte Carlo sampler for protein multiple sequence alignment. Features central to its performance are: (i) A “top-down” strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the sequences, and then realigns closely related subgroups in tandem. (ii) Inferred position-specific gap penalties that favor the placement of insertions between the conserved blocks making up the proteins’ structural core. (iii) A Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. By contrast, methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system will align even random sequences.
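
For readers unfamiliar with the sum-of-the-pairs system mentioned above, the following is a minimal Python sketch of how such a score is computed. It is not GISMO code and does not reflect GISMO's Bayesian measure. The names pair_score and sum_of_pairs and the MATCH, MISMATCH and GAP values are assumptions made for illustration only; real programs typically use a substitution matrix and affine gap penalties.

```python
from itertools import combinations

# Hypothetical scores chosen only for illustration; real programs use a
# substitution matrix (e.g. BLOSUM62) and affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of aligned characters ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pair_score over every pair of rows and every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    return sum(
        pair_score(a, b)
        for row1, row2 in combinations(alignment, 2)
        for a, b in zip(row1, row2)
    )


if __name__ == "__main__":
    print(sum_of_pairs(["ACD-", "ACDE", "A-DE"]))
```

Note that the function returns a score for any set of equal-length rows, related or not. Nothing in the score itself asks whether the alignment is statistically justified, which is the limitation the paragraph above points to.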

Version 3 is parallelized using OpenMP.

References:

Neuwald, A.F. A fast, parallelized version of the GISMO multiple sequence alignment program. In preparation.

Neuwald, A.F. and Altschul, S.F. 2016. Bayesian Top-down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties. PLoS Comput. Biol. 12(5): e1004936.

Neuwald, A.F. and Liu, J.S. 2004. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model. BMC Bioinformatics 5: 157.

Funding:

National Institutes of Health, National Institute of General Medical Sciences grants R01GM078541 and R01GM125878.

National Institutes of Health, National Library of Medicine grant LM06747.