MAPGAPS: Multiply-Aligned Profiles for Global Alignment of Protein Sequences
MAPGAPS identifies and accurately aligns up to a million or more protein sequences. It takes as input a database of FASTA-formatted protein sequences and, as the query, a hierarchical multiple sequence alignment (hiMSA), such as those available from the NCBI (see below). MAPGAPS-generated multiple sequence alignments (MSAs) are used as input by BPPS, SPARC and DARC.
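Because the sequence database must be in FASTA format, it can help to check a file before using it as input. The following is a minimal Python sketch of a FASTA reader, not part of MAPGAPS itself; the function name read_fasta and the example records are hypothetical and serve only as illustration.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping.

    Hypothetical helper for illustration; it is not a MAPGAPS function.
    If two records share a header, the later one overwrites the earlier.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example records, used only to exercise the parser.
example = """>seq1 hypothetical kinase
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical GTPase
MTEYKLVVVG
""".splitlines()

records = read_fasta(example)
```

For a real database of a million or more sequences, a streaming reader that yields one record at a time would use less memory than building a dictionary; the dictionary form is kept here for brevity.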
Neuwald, A.F., C.J. Lanczycki, T.K. Hodges & A. Marchler-Bauer. 2020. Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments. Database. In press.
Neuwald, A.F., L. Aravind & S.F. Altschul. 2018. Inferring Joint Sequence-Structural Determinants of Protein Functional Specificity. eLife doi: 10.7554/eLife.29880.001
Neuwald, A.F. 2009. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences. Bioinformatics 25: 1869-1875.
Funding: National Institutes of Health, National Institute of General Medical Sciences grants R01GM078541 and R01GM125878.