Auxiliary programs

CDD2MGS

Converts NCBI Conserved Domain hiMSAs into MAPGAPS input format.  The hiMSAs may be obtained from the NCBI CDD link below.

NCBI CDD hiMSAs

Click the link on the right to download NCBI Conserved Domain hiMSAs as a compressed tarball.

AddPhylum

Adds kingdom and phylum taxonomic labels to NCBI nr protein sequences for use with GISMO, BPPS, and DARC.  C++ source code and a simple example are included in the tarball.

NCBI taxonomy dump ftp site

The taxdump.tar.gz and accession2taxid/prot.accession2taxid.gz files are required by the AddPhylum program.
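To show how these two downloads fit together, here is a minimal Python sketch that looks up a protein accession in prot.accession2taxid and then walks up the taxonomy tree in nodes.dmp to collect kingdom and phylum names from names.dmp.  It is not the AddPhylum implementation (which is C++), only an illustration of the file layout.  The accession EXAMPLE_0001.1, the shortened lineage, and the function names are hypothetical, and the rank labels are passed as a parameter because the rank names used in the taxonomy dump may differ between releases.

```python
import io

def parse_dmp(handle):
    """Yield field lists from a taxdump .dmp file (fields separated by tab, pipe, tab)."""
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing field terminator
        yield line.split("\t|\t")

def load_nodes(handle):
    """Return (parent, rank) dictionaries keyed by taxid from nodes.dmp."""
    parent, rank = {}, {}
    for fields in parse_dmp(handle):
        taxid = int(fields[0])
        parent[taxid] = int(fields[1])
        rank[taxid] = fields[2]
    return parent, rank

def load_names(handle):
    """Return taxid -> scientific name from names.dmp, ignoring synonyms."""
    names = {}
    for fields in parse_dmp(handle):
        if fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names

def load_accession2taxid(handle):
    """Return accession.version -> taxid from prot.accession2taxid."""
    next(handle)  # skip the header line
    acc2tax = {}
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        acc2tax[cols[1]] = int(cols[2])
    return acc2tax

def lineage_labels(taxid, parent, rank, names, wanted=("kingdom", "phylum")):
    """Walk from taxid toward the root, collecting names at the wanted ranks."""
    labels, seen = {}, set()
    while taxid in parent and taxid not in seen:  # the root is its own parent
        seen.add(taxid)
        if rank[taxid] in wanted:
            labels[rank[taxid]] = names.get(taxid, "unknown")
        taxid = parent[taxid]
    return labels

def dmp(rows):
    """Build toy .dmp text for the demonstration below."""
    return "".join("\t|\t".join(r) + "\t|\n" for r in rows)

# Toy data: a heavily shortened lineage, for illustration only.
nodes_txt = dmp([("1", "1", "no rank"), ("2759", "1", "superkingdom"),
                 ("33208", "2759", "kingdom"), ("7711", "33208", "phylum"),
                 ("9606", "7711", "species")])
names_txt = dmp([("1", "root", "", "scientific name"),
                 ("2759", "Eukaryota", "", "scientific name"),
                 ("33208", "Metazoa", "", "scientific name"),
                 ("7711", "Chordata", "", "scientific name"),
                 ("9606", "Homo sapiens", "", "scientific name"),
                 ("9606", "human", "", "genbank common name")])
acc_txt = ("accession\taccession.version\ttaxid\tgi\n"
           "EXAMPLE_0001\tEXAMPLE_0001.1\t9606\t0\n")  # hypothetical accession

parent, rank = load_nodes(io.StringIO(nodes_txt))
names = load_names(io.StringIO(names_txt))
acc2tax = load_accession2taxid(io.StringIO(acc_txt))
labels = lineage_labels(acc2tax["EXAMPLE_0001.1"], parent, rank, names)
print(labels)
```

With the real files, the io.StringIO objects would be replaced by handles opened on nodes.dmp and names.dmp (extracted from taxdump.tar.gz) and on the decompressed prot.accession2taxid file.  The full accession table is very large, so a practical script would keep only the accessions it needs rather than loading every row into a dictionary.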

NCBI non-redundant (nr) protein sequence ftp site

Download the FASTA-formatted nr.gz and pdbaa.gz files from this site for use as input to MAPGAPS and other programs.

ConvertMSA 

Converts alignments from cma format to mFASTA, from mFASTA to cma format, and from cma format to rich text format (rtf).  The rtf files are suitable for publication.

PurgeMSA

Merges concatenated cma files into one file and removes sequence fragments and redundant sequences. Click the link to the right to obtain the executable (beta version).

GetPDB

Retrieves PDB coordinate files based on the FASTA defline identifiers in an input file and then creates PDB files with modeled hydrogens (e.g., 1abc_H.pdb), as required by DARC, SPARC, and our other programs.  GetPDB requires Perl and the reduce program by Michael Word.
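
The first step described above, collecting PDB identifiers from FASTA deflines, can be sketched in Python as follows.  This is not GetPDB itself (which is written in Perl); the two defline layouts handled here (">pdb|1ABC|A ..." and ">1ABC_A ...") are assumptions about how pdbaa-style files are labeled, and the function names are hypothetical.  Only the 1abc_H.pdb naming pattern comes from the description above.

```python
import re

# Assumed defline layouts; adjust the patterns to match the actual input file.
PATTERNS = (
    re.compile(r"pdb\|([0-9][A-Za-z0-9]{3})\|(\w*)"),   # >pdb|1ABC|A description
    re.compile(r"^>?([0-9][A-Za-z0-9]{3})_(\w+)"),      # >1ABC_A description
)

def pdb_ids_from_fasta(lines):
    """Return unique lower-case PDB identifiers in order of first appearance."""
    ids = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a defline
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                pdb_id = match.group(1).lower()
                if pdb_id not in ids:
                    ids.append(pdb_id)
                break
    return ids

def hydrogen_model_name(pdb_id):
    """Name of the hydrogen-added coordinate file, following the 1abc_H.pdb pattern."""
    return f"{pdb_id}_H.pdb"

# Hypothetical input; identifiers and sequences are placeholders.
example = [
    ">pdb|1ABC|A hypothetical protein",
    "MKVLAAGIVG",
    ">2XYZ_B another hypothetical entry",
    "GSHMTEYKLV",
    ">pdb|1ABC|B second chain of the same entry",
    "MKVLAAGIVG",
    ">sp|P12345|SOME_PROT defline without a PDB identifier",
    "MSTNPKPQRK",
]

ids = pdb_ids_from_fasta(example)
outputs = [hydrogen_model_name(i) for i in ids]
print(ids, outputs)
```

Each identifier would then be used to fetch the corresponding coordinate file, which is passed to reduce to add hydrogens and written under the corresponding _H.pdb name.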