Helping to unravel the mysteries of biological systems

Pipeline Services

Annotation Engine
As sequencing costs continue to drop, it has become relatively easy to acquire the genome sequence of prokaryotic organisms. However, there are still few options for systematic, complete annotation of a whole genome using a robust annotation pipeline. Funded by the National Institute of General Medical Sciences, the IGS Annotation Engine provides comprehensive annotation services, along with all associated underlying data and tools for manual curation, completely free of charge.

The Annotation Engine provides the output of our prokaryotic annotation pipeline, including gene finding; similarity, motif, and domain searching; and automated functional annotation. The data produced by the pipeline are available to users in two ways: via FTP as a MySQL Chado database, or online through the Manatee curation tool with a password-protected account.

A second focus of the Annotation Engine is to provide educational resources. The Annotation Engine is a valuable tool for the classroom setting. It gives students an opportunity to do hands-on, real-world annotation while gaining an in-depth understanding of the elements of the annotation process. The Annotation Engine has been used in two different university settings so far, and we continue to welcome new collaborations in this area. In addition, we offer the free IGS Genomics Workshop, a 3-day course held here at IGS on the University of Maryland campus.


GEMINA
Gemina (Genomic Metadata for Infectious Agents) is an open-source, web-based, pathogen-centric tool designed to provide targeted DNA signature selection for the NIAID category A-C viral and bacterial pathogens. The Gemina system identifies a representative genomic sequence for each pathogen, which is then used by the Insignia DNA Signature pipeline.

The Gemina system describes the Who [Host], What [Disease, Symptom], When [Date], Where [Location] and How [Pathogen, Environmental Source, Reservoir, Transmission Method] of infectious pathogens.

The Gemina system provides an integrated investigative and geospatial surveillance system connecting pathogens, pathogen products, and disease. It is anchored on the taxonomic IDs of the pathogen and host, and links, for the first time, unique genomic representations of each pathogen with ontology-regularized metadata for the associated epidemiological information. Gemina offers a straightforward text-based query interface; a Java-based ontology tree viewer for deeper exploration of the ontologies; geospatial surveillance functionality for viewing the progression of pathogens across space and time; and a selection tool for DNA signatures. Together these provide a set of resources for pathogen surveillance, metadata investigation, and DNA diagnostics of the NIAID category A-C bacterial and viral pathogens. The Gemina web interface provides access to data extracted from PubMed articles for these pathogens through a set of controlled metadata vocabularies for Toxins, Reservoirs, Environmental Sources (EnvO), Geographic Locations (Gaz), Diseases, Anatomy, Transmission Methods, and Symptoms. This strategy allows users to build a multi-term query using one or more metadata types representing the Gemina chain-of-infection data model.
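
To make the idea of a multi-term query over the chain-of-infection model more concrete, here is a minimal sketch in Python. It is not Gemina's implementation or API: the `InfectionRecord` class, its field names, the `multi_term_query` helper, and the placeholder records are all hypothetical, and the sketch assumes that multiple terms are combined with a logical AND.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class InfectionRecord:
    """Hypothetical chain-of-infection record; field names are illustrative only."""
    pathogen: str
    host: str
    disease: str
    symptom: Optional[str] = None
    location: Optional[str] = None
    transmission_method: Optional[str] = None
    reservoir: Optional[str] = None
    environmental_source: Optional[str] = None


def multi_term_query(records, **terms):
    """Return the records that match every supplied metadata term (assumed AND)."""
    known = {f.name for f in fields(InfectionRecord)}
    unknown = set(terms) - known
    if unknown:
        raise ValueError(f"unknown metadata type(s): {sorted(unknown)}")
    return [
        record
        for record in records
        if all(getattr(record, name) == value for name, value in terms.items())
    ]


# Placeholder data, invented purely for illustration (not taken from Gemina).
RECORDS = [
    InfectionRecord("pathogen-1", "human", "disease-X",
                    location="region-1", transmission_method="airborne"),
    InfectionRecord("pathogen-1", "bird", "disease-X",
                    location="region-2", transmission_method="airborne"),
    InfectionRecord("pathogen-2", "human", "disease-Y",
                    location="region-1", transmission_method="waterborne"),
]

# Two metadata types combined in one query: Host and Transmission Method.
hits = multi_term_query(RECORDS, host="human", transmission_method="airborne")
print([record.pathogen for record in hits])
```

Each keyword argument stands in for one metadata type, so adding a term narrows the result set. A real system would match against ontology terms (and possibly their descendants) rather than plain strings.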

The Gemina system enables users to explore the diversity of outbreak data reported in the literature for each NIAID category A-C pathogen, regularized through a set of mature, community-adopted ontologies. Users can identify the breadth of hosts and diseases known for these pathogens, see where in the world these pathogens have been reported, and link to the Insignia Signature Detection tool to identify unique regions within their genomes. As of the 06/23/2009 release, the Gemina database contained 367 bacterial strains, 21 toxin strains, and 10,991 viral strains, including Influenza A, B, and C subtypes and strains such as the 2009 Swine Flu H1N1 strains.