Informatics Resource Center - Projects

Project          Principal Investigator
Analysis Engine  Michelle Gwinn-Giglio PhD
CloVR            Owen White PhD
Gemina           Lynn Schriml PhD
GSCID            Claire M. Fraser PhD
HMP DACC         Owen White PhD

Analysis Engine

As sequencing costs continue to drop, it has become relatively easy to acquire the genome sequence of a prokaryotic organism. However, there are still few options for systematic, complete annotation of a whole genome using a robust annotation pipeline. Funded by the National Institute of General Medical Sciences, the IGS Analysis Engine provides comprehensive annotation services, along with all associated underlying data and tools for manual curation, completely free of charge.


CloVR

The Cloud Virtual Resource (CloVR) takes advantage of two technologies, virtual machines and cloud computing, to provide a new community resource for sequence analysis, suitable for large-scale sequencing projects. Bioinformatics sequence analysis still represents a major bottleneck for the application of genomics tools by the larger research community. CloVR aims to enable any researcher with a sequencing machine and an Internet connection to perform complex and computationally demanding sequence analysis and to join the genomic revolution. To achieve this, CloVR uses cloud computing platforms that offer on-demand, scalable computing services over the Internet. The CloVR software is available as an open source virtual machine that bundles pre-installed and pre-configured bioinformatics tools into automated pipelines. With the CloVR virtual machine, users can run supported analyses on their desktop or laptop computers or use cloud platforms to perform CPU-intensive tasks. CloVR will support common genomics applications for viral, prokaryotic, metagenomic and eukaryotic DNA and RNA sequencing projects. The first version of CloVR is intended to be released by the end of 2009.



DIAG

The Data Intensive Academic Grid (DIAG) is an NSF-funded shared computational cloud designed to meet the analytical needs of the bioinformatics community. DIAG includes a computational infrastructure, a high-performance storage network, and optimized data sets generated by mining public sequence repositories. DIAG includes 1500 cores for high-throughput computational analysis and 160 cores, connected via a low-latency network, for high-performance computing. Complementing this computational capacity, over 400 terabytes (TB) of shared high-performance parallel storage and 400 TB of local storage are available.

DIAG's cloud infrastructure is built using Nimbus, an open source framework that transforms a traditional computational cluster into a cloud. The DIAG cloud can be accessed using the popular Amazon EC2 API. The shared storage can be accessed through an S3-compatible interface.

The bioinformatics community can access DIAG as a platform as a service (PaaS) using Ergatis, a web-based pipeline creation and management tool, or as an infrastructure as a service (IaaS) using bioinformatics-oriented virtual machines (VMs) such as CloVR or other custom EC2-compatible Linux VMs. DIAG is also accessible as a traditional computational grid for interactive shell and batch processing, or as an Open Science Grid (OSG) compute element.

DIAG currently supports a number of bioinformatics pipelines and tools such as the IGS Annotation Engine, Virome, CloVR, Galaxy, ISGA, BioLinux, Trinity, and Maker.

DIAG currently has over 100 registered users who conduct large-scale genomics, transcriptomics, and metagenomics data analysis. DIAG is a free resource available to the academic community.



Gemina

Gemina is a web-based system designed to identify infectious pathogens and their representative genomic sequences through selection of associated epidemiology metadata. Gemina supports the development of DNA signature-based assays for the detection of pathogens or sets of pathogens through the Insignia Signature Pipeline at the University of Maryland.



GSCID

The Genomic Sequencing Center for Infectious Disease (GSCID) will provide researchers with rapid and cost-efficient production of high-quality genome sequences of NIAID Category A-C priority pathogens, related organisms, clinical isolates, closely related species, invertebrate vectors of infectious diseases, and microorganisms responsible for emerging and re-emerging infectious diseases.



HMP DACC

The Human Microbiome Project (HMP) is an international research initiative that will lay the foundation for future studies of human-associated microbial communities in health and disease. The HMP applies the techniques of metagenomics - the study of complex microbial communities using sequencing - to the analysis of microbial community structure and function. The HMP has outlined an ambitious set of goals that will ultimately answer such questions as whether or not humans share a core microbiome and how our microbial communities change over time in response to aging, disease, medications, lifestyle and other interventions.