The focus of my research is to understand protein mechanisms at the atomic level. The standard approach toward achieving this goal is to determine high-quality crystal structures of functionally important conformational states of a protein in order to identify the dynamic changes associated with its underlying mechanisms. My research focuses on an alternative and complementary approach that uses Bayesian statistical inference to predict aspects of protein mechanisms based on limited structural data augmented by vast numbers of protein sequences. In doing this, we are essentially following the example of Mendel and the geneticists who followed him: just as Mendel obtained insight into unobserved genetic mechanisms through statistical inferences based on observed patterns of inherited traits, we seek to obtain insight into protein mechanisms through statistical inferences based on patterns of conserved residues in protein sequences, the cell's own language for encoding those mechanisms.
Sequence patterns that have been conserved for a billion years or more reflect strong selective pressures maintaining mechanistic similarities. Divergent patterns that are conserved in descendant proteins maintaining a particular divergent function likewise reflect mechanistic differences. Thus, non-random patterns of sequence conservation and divergence correspond to conservation and divergence of underlying mechanisms, which we define very broadly to include all atomic properties required for a protein's function. As a result, Bayesian inference of the evolutionary constraints imposed on functionally divergent proteins can reveal key components of the molecular machinery and thereby suggest likely mechanisms to test experimentally. We are currently applying this approach to P-loop ATPases and GTPases, to protein kinases and to other, functionally associated proteins. Our efforts have been greatly enhanced by the abundant sequence data provided by the genome projects, which, in this way, are opening up entirely new approaches to understanding biological mechanisms. We seek to apply the functional and mechanistic information gleaned from our research to other genome analysis efforts.
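To make the idea of a divergent, conserved pattern concrete, the following is a minimal sketch and not the procedure implemented in our programs. It compares one alignment column in a divergent subfamily (the foreground) against the same column in related proteins (the background), and asks whether the two are better explained by separate residue distributions or by a single shared one. The function names, the symmetric Dirichlet prior and its parameter `alpha` are assumptions made for illustration only.

```python
from collections import Counter
from math import lgamma

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_counts(column):
    """Count the 20 standard residues in one alignment column (gaps are ignored)."""
    tally = Counter(column)
    return [tally.get(a, 0) for a in AMINO_ACIDS]


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of the observed residues under a symmetric
    Dirichlet(alpha) prior on the residue frequencies (hypothetical prior)."""
    n = sum(counts)
    k = len(counts)
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))


def log_bayes_factor(foreground, background, alpha=1.0):
    """Log Bayes factor for 'foreground and background have different residue
    distributions' versus 'both share one distribution' at a single column."""
    fg = column_counts(foreground)
    bg = column_counts(background)
    pooled = [f + b for f, b in zip(fg, bg)]
    return (log_marginal(fg, alpha) + log_marginal(bg, alpha)
            - log_marginal(pooled, alpha))


if __name__ == "__main__":
    # A column that is lysine in the subfamily but aspartate elsewhere.
    print(log_bayes_factor("KKKKKKKK", "DDDDDDDD"))
    # A column that is lysine in both groups.
    print(log_bayes_factor("KKKK", "KKKK"))
```

A positive value favors divergence at that position and a negative value favors a shared constraint. A realistic analysis would also need to weight sequences for redundancy and account for alignment uncertainty, which this sketch does not attempt.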
Figure legend: Bayesian analysis of Ran GTPase functional divergence. Divergent residues (which were categorized through statistical analysis of selective constraints, as shown in the alignment) are proposed to play specific roles in Ran’s C-terminal, basic patch, and nucleotide exchange mechanisms.
Our ability to accurately characterize evolutionary constraints strongly depends on the quality of our sequence alignments. Aligning vast numbers of distantly related sequences presents unique algorithmic and statistical challenges because such proteins often only share a minimal structural core with sizable insertions occurring between, and even within, core elements. To address this problem we develop and apply statistical procedures that can accurately align regions of sequence homology while ignoring non-homologous regions and that can obtain measures of alignment uncertainty. These procedures can take advantage of large numbers of available sequences to detect very subtle, yet clearly statistically significant similarities.
Many proteins of interest to us contain subtle structural repeats. For example, DNA clamp loader ATPases couple ATP hydrolysis to the loading onto DNA of a protein clamp, which itself contains structural repeats. Characterizing the functional constraints imposed on these clamps requires that we detect and align these repeats. Hence another focus of our research is to develop and apply statistically based procedures for this purpose.
Neuwald, A.F. 2007. Gα-Gβγ dissociation may be due to retraction of a buried lysine and disruption of an aromatic cluster by a GTP-sensing Arg-Trp pair. Protein Science 16(11): 2570-2577.
Neuwald, A.F. 2007. The CHAIN program: forging evolutionary links to underlying mechanisms. Trends in Biochemical Sciences 32: 487-493. Review article announcing the availability of the CHAIN program and illustrating how it works.
Kannan, N., N. Haste, S. S. Taylor and A.F. Neuwald. 2007. The hallmark of AGC kinase functional divergence is its C-terminal tail, a cis-acting regulatory module. Proc. Natl. Acad. Sci., USA 104(4):1272-1277.
Neuwald, A.F. 2006. Hypothesis: bacterial clamp loader AAA+ ATPase activation through DNA-dependent repositioning of the catalytic base and of a trans-acting catalytic threonine. Nucleic Acids Research 34(18): 5280-5290.
Neuwald, A.F. 2006. Bayesian shadows of molecular mechanisms cast in the light of evolution. Trends in Biochemical Sciences 31(7): 374-382. Reviews the statistical and scientific basis for CHAIN analysis using as an example eukaryotic DNA clamp loader ATPases.
Kannan, N. and A.F. Neuwald. 2005. Did protein kinase regulatory mechanisms evolve through elaboration of a simple structural component? Journal of Molecular Biology 351: 956-972.
Neuwald, A.F. and J.S. Liu. 2004. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model. BMC Bioinformatics 5: 157 (16 pages).
Ono, T., Losada, A., Hirano, M., Myers, M.P., Neuwald, A.F. and T. Hirano. 2003. Differential contributions of condensin I and condensin II to mitotic chromosome architecture in vertebrate cells. Cell 115: 109-121.
Neuwald, A.F., N. Kannan, A. Poleksic, N. Hata, and J.S. Liu. 2003. Ran’s C-terminal, basic patch and nucleotide exchange mechanisms in light of a canonical structure for Rab, Rho, Ras and Ran GTPases. Genome Research 13(4): 673-692.
Neuwald, A.F. and T. Hirano. 2000. HEAT repeats associated with condensins, cohesins, and other complexes involved in chromosome-related functions. Genome Research 10(10): 1445-1452. This analysis led to the discovery of a new condensin component (see Ono, et al. 2003 above).
Neuwald, A.F. and A. Poleksic. 2000. PSI-BLAST searches using hidden Markov models of structural repeats: Prediction of an unusual sliding DNA clamp and of β-propellers in UV-damaged DNA binding protein. Nucleic Acids Research 28(18): 3570-3580.
Liu, J. S., A.F. Neuwald and C. E. Lawrence. 1999. Markovian structures in biological sequence alignments. Journal of the American Statistical Association 94: 1-15. This publication received the year 2000 Mitchell prize for the best Bayesian application paper.
Neuwald, A.F., L. Aravind, J. L. Spouge and E. V. Koonin. 1999. AAA+: a class of chaperone-like ATPases associated with the assembly, operation and disassembly of protein complexes. Genome Research 9:27-43. Cited over 730 times.
Neuwald, A.F. 1997. Barth syndrome may be due to an acyltransferase deficiency. Current Biology 7: R465-R466. This prediction led to clinical confirmation (see Annals of Neurology 2002 May; 51(5):634-637) and to potential treatments for Barth syndrome, an inherited cardiomyopathic disorder.
Neuwald, A.F. and D. Landsman. 1997. GCN5-related histone N-acetyltransferases belong to a diverse superfamily that includes the yeast SPT10 protein. Trends in Biochemical Sciences 22: 154-155.
Neuwald, A.F., J.S. Liu and C.E. Lawrence. 1995. Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Science 4: 1618-1632. The computer program described in this publication and other programs derived from it (such as AlignACE) have been widely used for analysis of DNA sequence motifs.
ASSET. Implementation of a statistically based branch and bound algorithm for detecting conserved motifs in unaligned protein sequences. (Source code available at ftp.ncbi.nih.gov).
Gibbs site sampler. Implementation of a Gibbs sampling procedure for collinear motif detection and alignment. (Source code available at ftp.ncbi.nih.gov).
Gibbs motif sampler. Implementation of a Gibbs sampling procedure for repeat detection and alignment. (Source code available at ftp.ncbi.nih.gov).
PROBE. Implementation of a Bayesian Markov chain Monte Carlo (MCMC) procedure for block based detection and alignment of protein sequences. (Source code available at ftp.ncbi.nih.gov).
GISMO, GARMA, GAMBIT. Implementation of MCMC sampling procedures for gapped, block based alignment of protein sequences. (Executables available at ftp.cshl.org).
The CHAIN program. Implementation of Bayesian methods for characterization of protein functional divergence in atomic detail. (Executables available at www.chain.umaryland.edu).
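For readers unfamiliar with Gibbs sampling for motif detection, the following is a minimal textbook-style sketch of a site sampler that places one ungapped motif site in each sequence. It is not the source code distributed at the addresses above. The function name, parameters and pseudocount scheme are assumptions chosen for illustration, and the sketch assumes that every input sequence contains only the 20 standard residues and is at least as long as the motif width.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gibbs_site_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Sample one motif start position per sequence (illustrative sketch)."""
    rng = random.Random(seed)

    # Background residue frequencies estimated from all sequences.
    total = Counter("".join(seqs))
    n_total = sum(total[a] for a in AMINO_ACIDS)
    background = {a: (total[a] + pseudocount) / (n_total + 20 * pseudocount)
                  for a in AMINO_ACIDS}

    # Start from a random site in every sequence.
    sites = [rng.randrange(len(s) - width + 1) for s in seqs]

    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except the held-out one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, sites)) if j != i]
            denom = len(others) + 20 * pseudocount
            model = []
            for col in range(width):
                counts = Counter(site[col] for site in others)
                model.append({a: (counts[a] + pseudocount) / denom
                              for a in AMINO_ACIDS})

            # Weight each candidate start by its motif/background likelihood ratio.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for col in range(width):
                    a = seq[start + col]
                    w *= model[col][a] / background[a]
                weights.append(w)

            # Draw the new site for the held-out sequence in proportion to the weights.
            sites[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return sites


if __name__ == "__main__":
    # Toy sequences sharing the planted segment "WKHDC" (made-up data).
    toy = ["AAGWKHDCLLS", "TWKHDCPPGAV", "MNQRSWKHDCE", "GGWKHDCTTAY"]
    print(gibbs_site_sampler(toy, width=5))
```

Because the sampler is stochastic, different seeds can return different sites, and in practice several independent runs are compared. Repeat detection, gapped alignment and the statistical significance tests used by the programs listed above are beyond this sketch.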
My research has been funded by the National Institutes of Health, National Library of Medicine (Grant LM06747) and by the National Institutes of Health, Division of General Medicine (Grant GM078541). Bayesian statistical procedures are being developed in collaboration with Drs. Jun S. Liu (Harvard), Wally Gilks (University of Leeds) and Kanti Mardia (University of Leeds).