

* DBS seq set:
   - choose sequences whose lengths fall within the center of the length distribution +/- 2 SD as the first input.
     - but this might eliminate subfamilies associated with large proteins.
* LAPIS:
   - optimize parameters (need strategies for how to do this):
     - size of the GISMO seq set
     - purge level of the GISMO seq set
     - distribution of the number of GISMO blocks and of block lengths.
* MAPGAPS:
   - implement an HMM version of PSI-BLAST, or else use the jackhmmer C code.
   - use jackhmmer to align sequences to profiles in MAPGAPS (instead of PSI-BLAST).
   - use profile-to-profile comparisons to globally align the profiles.
     - HHpred?
     - use MSA profiles with GISMO?
* Domain footprint:
   - use the cd-hit algorithm to define domain footprints?
   - use cd-hit words to initialize GISMO's block based mode.
   - use amino-acid similarities to match up cd-hit words.
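For the DBS seq set idea, a minimal sketch of the mean +/- 2 SD selection, assuming the distribution in question is sequence length and that sequences are held in a simple name-to-sequence dict. The function name `filter_by_length` is hypothetical. It returns the dropped sequences instead of discarding them, so that long outliers (possible subfamilies of large proteins) can still be reviewed.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def filter_by_length(
    seqs: Dict[str, str], n_sd: float = 2.0
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split sequences into (kept, dropped) using mean +/- n_sd * SD of lengths.

    Hypothetical helper: assumes the 'distribution' is sequence length.
    """
    lengths = [len(s) for s in seqs.values()]
    if len(lengths) < 2:
        # The sample SD is undefined for fewer than two sequences; keep everything.
        return dict(seqs), {}
    mu, sd = mean(lengths), stdev(lengths)
    lo, hi = mu - n_sd * sd, mu + n_sd * sd
    kept = {name: s for name, s in seqs.items() if lo <= len(s) <= hi}
    # Keep the outliers around so large-protein subfamilies are not silently lost.
    dropped = {name: s for name, s in seqs.items() if name not in kept}
    return kept, dropped
```

With nine 100-residue sequences and one 1000-residue sequence, the long one falls above the upper cutoff and ends up in `dropped`.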
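For the MAPGAPS idea of globally aligning profiles, a rough sketch of Needleman-Wunsch over profile columns. The column score log(sum_a p(a) q(a) / f(a)) is similar in spirit to the column score used in HMM-HMM comparison (as in HHsearch/HHpred), but this is a simplified stand-in: it assumes each profile is a list of 20-element amino-acid frequency vectors, a uniform background f(a) = 1/20, a linear gap penalty, and no HMM transition probabilities. The names `column_score` and `align_profiles` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Column = Sequence[float]  # assumed: 20 amino-acid frequencies summing to ~1


def column_score(p: Column, q: Column, bg: Sequence[float]) -> float:
    """Log-odds similarity of two profile columns against a background."""
    s = sum(pa * qa / ba for pa, qa, ba in zip(p, q, bg))
    return math.log(max(s, 1e-6))  # floor avoids log(0) for disjoint columns


def align_profiles(
    P: List[Column], Q: List[Column], gap: float = 1.0,
    bg: Optional[Sequence[float]] = None,
) -> Tuple[float, List[Tuple[Optional[int], Optional[int]]]]:
    """Global (Needleman-Wunsch) alignment of two profiles with a linear gap."""
    bg = bg or [1.0 / 20] * 20  # assumed uniform background
    n, m = len(P), len(Q)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + column_score(P[i - 1], Q[j - 1], bg),
                F[i - 1][j] - gap,  # column of P against a gap
                F[i][j - 1] - gap,  # column of Q against a gap
            )
    # Trace back from the bottom-right cell; None marks a gap.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(
            P[i - 1], Q[j - 1], bg
        ):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return F[n][m], pairs
```

A profile-HMM tool such as HHpred would also score insertion and deletion transitions; the linear gap here is only meant to show the shape of a profile-to-profile global alignment.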
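For the domain-footprint idea, a sketch of collecting short words (k-mers) shared across sequences, in the spirit of cd-hit's short-word filter, as possible seeds for GISMO's block-based mode. To "use amino-acid similarities to match up cd-hit words", it first maps residues onto a reduced alphabet so that similar amino acids match. The grouping in `AA_GROUPS`, the word length, the `min_frac` threshold, and the function names are all illustrative assumptions, not cd-hit or GISMO settings.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical reduced alphabet: residues in the same group count as a match.
AA_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
REDUCE = {aa: grp[0] for grp in AA_GROUPS for aa in grp}


def reduce_seq(seq: str) -> str:
    # Residues outside the groups (e.g. X) map to themselves.
    return "".join(REDUCE.get(aa, aa) for aa in seq.upper())


def shared_words(
    seqs: Dict[str, str], k: int = 5, min_frac: float = 0.8
) -> Dict[str, Dict[str, List[int]]]:
    """Return {word: {seq_name: [start positions]}} for reduced-alphabet
    k-mers that occur in at least min_frac of the sequences."""
    hits: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for name, seq in seqs.items():
        r = reduce_seq(seq)
        for i in range(len(r) - k + 1):
            hits[r[i:i + k]][name].append(i)
    need = min_frac * len(seqs)
    return {w: dict(occ) for w, occ in hits.items() if len(occ) >= need}
```

The per-sequence start positions of each shared word give candidate block anchors; block widths and the number of blocks would still have to come from GISMO's own parameters.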



