=========== Search for & alignment of related nr sequences using MAPGAPS ===========

Download the protein nr FASTA file from the NCBI.
Download the pdbaa FASTA file from the NCBI.
Download the taxdump files from the NCBI.
Use the addtaxon program to add taxonomic information to nr (this creates nrtx).
Use 'fasplit nrtx 250000 < nrtx' to split nrtx into bite-size files
   (nrtx.1, nrtx.2, ...) for analysis.
Put the nrtx.* and pdbaa files into a directory and set the environment variable
   $FASTADIR to that directory (in your .cshrc file, or the bash equivalent):

    setenv FASTADIR /usr/local/projects/aneuwald/molbio/fasta/
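For bash users, the block below is a sketch of the equivalent setting plus an optional sanity check. The function name check_fastadir is hypothetical (it is not part of MAPGAPS), and it assumes the split files are named nrtx.<number> as produced by fasplit above. The path is the author's example and should be replaced with your own.

```bash
# bash equivalent of the csh 'setenv' line above (e.g., in ~/.bashrc);
# replace the author's example path with your own FASTA directory.
export FASTADIR=/usr/local/projects/aneuwald/molbio/fasta/

# Hypothetical sanity check (not part of MAPGAPS): confirm that $FASTADIR
# holds pdbaa and at least one split file named nrtx.<number>.
check_fastadir() {
  local re='^nrtx\.[0-9]+$' f n=0
  if [ ! -d "$FASTADIR" ]; then
    echo "FASTADIR is not a directory: $FASTADIR" >&2
    return 1
  fi
  if [ ! -e "$FASTADIR/pdbaa" ]; then
    echo "pdbaa not found in $FASTADIR" >&2
    return 1
  fi
  for f in "$FASTADIR"/nrtx.*; do
    # count only nrtx.<number>, not outputs such as nrtx.1_map.seq
    if [[ ${f##*/} =~ $re ]]; then
      n=$((n + 1))
    fi
  done
  if [ "$n" -eq 0 ]; then
    echo "no nrtx.<number> files found in $FASTADIR" >&2
    return 1
  fi
  echo "$n nrtx split files found"
}
```

Running check_fastadir once after setting the variable should report how many split files the later mapgaps steps will have to be run on.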

Download a curated CDD MSA (in mFasta format) from the NCBI website (e.g., cd00315)
and save this file as cd00315.fa.

Run 'mapgaps cd00315 nrtx.1' 
    'mapgaps cd00315 nrtx.2' 
	:	:	:
  ... to create the nrtx.1_map.seq, nrtx.2_map.seq, ... output files.
Concatenate these into a single file:
	cat nrtx.*_map.seq > All
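Typing one mapgaps command per split file gets tedious, so the runs can be scripted. The loop below is a minimal bash sketch, assuming mapgaps is on $PATH and accepts the bare split-file name (nrtx.1, nrtx.2, ...) exactly as in the commands above. The function name run_mapgaps_all and the MAPGAPS override are hypothetical conveniences, not part of MAPGAPS.

```bash
# Hypothetical helper (not part of MAPGAPS): run mapgaps with one query
# against every split file nrtx.<number> found in $FASTADIR.
# Assumes mapgaps is on $PATH; set MAPGAPS=echo for a dry run that only
# prints the commands.
run_mapgaps_all() {
  local query=$1 cmd=${MAPGAPS:-mapgaps} re='^nrtx\.[0-9]+$' f name
  for f in "$FASTADIR"/nrtx.*; do
    name=${f##*/}
    # skip outputs such as nrtx.1_map.seq; keep only nrtx.<number>
    [[ $name =~ $re ]] || continue
    $cmd "$query" "$name" || return 1
  done
}

# Example: first pass with the CDD query, then merge the per-file results.
# run_mapgaps_all cd00315 && cat nrtx.*_map.seq > All
```

The same function would also cover the later second pass with the improved query, i.e. run_mapgaps_all All_map. The loop stops at the first failing run so that an incomplete set of outputs is not concatenated silently.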

Delete the nrtx output files, which are no longer needed:
     \rm -f nrtx.*.seq nrtx.*.cma nrtx.*.tpl

Run 'mapgaps cd00315_X All' to create the files:
    All_map.cma
    All_map.tpl
    All_map.seq

Run 'mapgaps All_map' to create new (improved) query files.

Then run 'mapgaps All_map nrtx.1' 
         'mapgaps All_map nrtx.2' 
	   :	   :	   :

Concatenate the output alignment (*_A.mma) files:
    cat nrtx.*_A.mma > Main.mma
You may also want to run mapgaps on the pdbaa sequences (e.g., 'mapgaps All_map pdbaa') and add those alignments as well:
    cat pdbaa_A.mma nrtx.*_A.mma > Main.mma
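Because the glob nrtx.*_A.mma silently skips any split file whose mapgaps run failed, it may be worth checking the outputs before building Main.mma. This is a bash sketch under the assumption that the *_A.mma files are written to the current directory; check_mma_outputs is a hypothetical name, not a MAPGAPS tool.

```bash
# Hypothetical check (not part of MAPGAPS): before concatenating, confirm
# that every split file nrtx.<number> in $FASTADIR produced a non-empty
# nrtx.<number>_A.mma in the current directory (assumed output location).
check_mma_outputs() {
  local re='^nrtx\.[0-9]+$' f name missing=0
  for f in "$FASTADIR"/nrtx.*; do
    name=${f##*/}
    [[ $name =~ $re ]] || continue
    if [ ! -s "${name}_A.mma" ]; then
      echo "missing or empty: ${name}_A.mma" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example: only build Main.mma when no split file is missing.
# check_mma_outputs && cat pdbaa_A.mma nrtx.*_A.mma > Main.mma
```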

Merge the alignments in Main.mma with the PurgeMSA program (also available at this website), removing fragments (<75% matches) and redundant sequences (>98% identical).

The resulting file can serve as input to the BPPS program.
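To make the two PurgeMSA thresholds concrete, the bash sketch below shows one common way to measure them on aligned rows of equal length. The function names pct_aligned and pct_identity and the choice of '-' and '.' as gap characters are assumptions for illustration only; this is not how PurgeMSA is implemented, and it does not read .mma files.

```bash
# Illustrative helpers only; PurgeMSA's exact definitions may differ.
# Both take aligned rows of equal length; '-' and '.' are treated as gaps.

# Percent (integer, truncated) of alignment columns occupied by a residue;
# a row scoring below 75 would count as a fragment under the rule above.
pct_aligned() {
  local s=$1 i n=0
  [ "${#s}" -gt 0 ] || { echo 0; return; }
  for ((i = 0; i < ${#s}; i++)); do
    case ${s:i:1} in
      -|.) ;;
      *) n=$((n + 1)) ;;
    esac
  done
  echo $((100 * n / ${#s}))
}

# Percent identity over the columns where both rows have a residue;
# a pair scoring above 98 would be treated as redundant.
pct_identity() {
  local a=$1 b=$2 i x y same=0 both=0
  for ((i = 0; i < ${#a}; i++)); do
    x=${a:i:1} y=${b:i:1}
    case "$x$y" in *-*|*.*) continue ;; esac
    both=$((both + 1))
    if [ "$x" = "$y" ]; then same=$((same + 1)); fi
  done
  [ "$both" -gt 0 ] || { echo 0; return; }
  echo $((100 * same / both))
}
```

For example, the row AC-D occupies 3 of 4 columns (75), and ACDE versus ACDF is 75% identical. Other conventions (e.g., dividing by the shorter sequence length) give different numbers.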