Converts NCBI Conserved Domain hiMSAs into MAPGAPS input format. The hiMSAs may be obtained from the NCBI CDD link below.
NCBI CDD hiMSAs
Click the link on the right to download NCBI Conserved Domain hiMSAs as a compressed tarball.
Labels NCBI nr protein sequences with their kingdom and phylum for use with GISMO, BPPS, and DARC. C++ source code and a simple example are included in the tarball.
NCBI taxonomy dump FTP site
The taxdump.tar.gz and accession2taxid/prot.accession2taxid.gz files are required by the AddPhylum program.
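To illustrate how these two downloads fit together, here is a minimal Python sketch, not the AddPhylum implementation. It assumes the standard NCBI layouts: nodes.dmp and names.dmp (both inside taxdump.tar.gz) use "tab, pipe, tab" field separators, and prot.accession2taxid is a tab-separated table with a header row. The function names, the toy accession TOY00001.1, and the abbreviated lineage are hypothetical. The rank label for the kingdom level varies between taxonomy releases, so the rank is passed as a parameter.

```python
def parse_nodes(lines):
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp lines."""
    nodes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t|\t")
        rank = fields[2].rstrip("\t|")
        nodes[int(fields[0])] = (int(fields[1]), rank)
    return nodes


def parse_names(lines):
    """Map tax_id -> scientific name from names.dmp lines."""
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t|\t")
        name_class = fields[3].rstrip("\t|")
        if name_class == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def parse_accession2taxid(lines):
    """Map accession.version -> tax_id, skipping the header row."""
    mapping = {}
    for i, line in enumerate(lines):
        if i == 0:
            continue  # header: accession, accession.version, taxid, gi
        cols = line.rstrip("\n").split("\t")
        mapping[cols[1]] = int(cols[2])
    return mapping


def ancestor_with_rank(tax_id, rank, nodes):
    """Walk up the tree until a node of the requested rank is found."""
    seen = set()
    while tax_id in nodes and tax_id not in seen:
        seen.add(tax_id)  # the root is its own parent, so this ends the walk
        parent_id, node_rank = nodes[tax_id]
        if node_rank == rank:
            return tax_id
        tax_id = parent_id
    return None


def label(accession, acc2taxid, nodes, names, rank):
    """Return the name of the ancestor at `rank`, or None if unknown."""
    tax_id = acc2taxid.get(accession)
    if tax_id is None:
        return None
    ancestor = ancestor_with_rank(tax_id, rank, nodes)
    return names.get(ancestor) if ancestor is not None else None


# Toy data with an abbreviated lineage (hypothetical, for illustration only).
NODES = ("1\t|\t1\t|\tno rank\t|\n2\t|\t1\t|\tsuperkingdom\t|\n"
         "1224\t|\t2\t|\tphylum\t|\n562\t|\t1224\t|\tspecies\t|\n")
NAMES = ("2\t|\tBacteria\t|\t\t|\tscientific name\t|\n"
         "1224\t|\tPseudomonadota\t|\t\t|\tscientific name\t|\n"
         "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n")
ACC = "accession\taccession.version\ttaxid\tgi\nTOY00001\tTOY00001.1\t562\t0\n"

nodes = parse_nodes(NODES.splitlines())
names = parse_names(NAMES.splitlines())
acc2taxid = parse_accession2taxid(ACC.splitlines())
phylum = label("TOY00001.1", acc2taxid, nodes, names, "phylum")
kingdom = label("TOY00001.1", acc2taxid, nodes, names, "superkingdom")
```

The real prot.accession2taxid file is very large, so holding the whole mapping in a dictionary as above is only practical for small subsets. A streaming lookup restricted to the accessions of interest would be a more realistic design.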
NCBI non-redundant (nr) protein sequence FTP site
Download the FASTA-formatted nr.gz and pdbaa.gz files at this site for use as input to MAPGAPS and other programs.
Converts alignments from cma format to mFASTA, from mFASTA to cma, and from cma to rich text format (rtf). The rtf files are suitable for publication.
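For readers who want to inspect the downloaded files before passing them to a program, here is a minimal Python sketch that streams records from a gzip-compressed FASTA file without decompressing it to disk. The function names are hypothetical and the sample records are invented for illustration. It is not part of MAPGAPS.

```python
import gzip
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def read_gzipped_fasta(path):
    """Stream records from a gzip-compressed FASTA file such as nr.gz."""
    with gzip.open(path, "rt") as handle:
        yield from read_fasta(handle)


# Small in-memory demonstration with invented records.
sample = io.StringIO(">seq1 toy protein\nMKV\nLLA\n>seq2\nGGS\n")
records = list(read_fasta(sample))
```

Because both functions are generators, a file as large as nr.gz can be scanned one record at a time, for example with `for header, seq in read_gzipped_fasta("nr.gz"):`.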
Merges concatenated cma files into one file and removes sequence fragments and redundant sequences. Click the link to the right to obtain the executable (beta version).