MAPGAPS: Multiply-Aligned Profiles for Global Alignment of Protein Sequences
MAPGAPS identifies and accurately aligns up to a million or more protein sequences. It takes as input a database of FASTA-formatted protein sequences and, as the query, a hierarchical multiple sequence alignment (hiMSA), such as those available from the NCBI (see below). MAPGAPS-generated multiple sequence alignments (MSAs) are used as input by BPPS, SPARC and DARC.
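As an illustration of the database input format only (this is not part of MAPGAPS), the following Python sketch reads FASTA-formatted protein records and rejects obviously malformed input. The function name read_fasta and the example sequences are hypothetical and were chosen for this sketch.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is the header.
            header = line[1:]
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence data may span several lines; collect the pieces.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical two-record database used only to exercise the parser.
example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another hypothetical protein
MADEEKLPPG
""".splitlines()

sequences = read_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq))
```

A check like this can catch a missing header line or duplicated identifiers before a large database is passed to an alignment run.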
References:
Neuwald, A.F., C.J. Lanczycki, T.K. Hodges & A. Marchler-Bauer. 2020. Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments. Database. In press.
Neuwald, A.F., L. Aravind & S.F. Altschul. 2018. Inferring joint sequence-structural determinants of protein functional specificity. eLife. doi: 10.7554/eLife.29880.001
Neuwald, A.F. 2009. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences. Bioinformatics 25: 1869-1875.
Funding: National Institutes of Health, National Institute of General Medical Sciences grants R01GM078541 and R01GM125878.