Bioinformatics - Tools
IGS has developed a number of tools for bioinformatics analyses that are available to the community as compiled binaries or as source code. Some of these include:
A program for analysis of protein functional divergence and prediction of molecular mechanisms.
Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.
Ergatis is a web-based utility used to create, run, and monitor reusable computational analysis pipelines, utilizing the Workflow engine. It contains pre-built components for common bioinformatics analysis tasks. Ergatis is under active development at IGS and is in use at several sequencing centers, including the J. Craig Venter Institute (JCVI) and the Broad Institute.
IDEA (Interactive Display for Evolutionary Analyses)
provides a graphical interface for PAML (Phylogenetic Analysis by Maximum
Likelihood), a suite of programs for conducting molecular evolution analyses on
nucleotide and amino-acid data. IDEA allows you to run either of the PAML
programs, codeml or baseml, on one or more datasets simultaneously to obtain
maximum likelihood estimates of numbers of substitutions per branch and per
site and to compare multiple models of molecular evolution.
IDEA runs on Linux, Solaris and Mac OS X operating systems; it is designed to execute processes in parallel on a multiprocessor machine and can run on a computational grid with support for SGE or Condor. IDEA is available free of charge from SourceForge.
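As an illustration of what comparing models of molecular evolution can involve, the sketch below applies a likelihood ratio test to the log-likelihoods of two nested models, which is one common way to compare such fits. It is not taken from IDEA or PAML: the function names and the log-likelihood values are hypothetical, and the chi-square approximation is assumed to be appropriate for the models being compared.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        # P(Z^2 > x) for a standard normal Z
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # Chi-square with 2 degrees of freedom is exponential with mean 2
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the test statistic 2 * (lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a simpler (null) and a richer (alternative)
# nested model that differ by two free parameters.
stat, p_value = likelihood_ratio_test(lnl_null=-1234.56, lnl_alt=-1230.12, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that the richer model fits the data significantly better than the simpler one. For other degrees of freedom, a statistics library would be needed in place of the closed forms used here.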
Manatee is a web-based tool used to perform manual functional annotation. It has been specifically designed to optimize the ability of curators to evaluate all available sequence-based and experimental data to assign the best possible annotation to a given gene product. Manatee allows users to view, modify, and store annotation through interactions with an underlying relational database where all of the information is stored. Manatee supports the storage of multiple types of functional annotation including protein names, gene symbols, EC numbers, Gene Ontology terms, and associated supporting evidence. In addition, Manatee provides summary views of statistics and information from the genome as a whole.
PhyloTrac is a software package for exploration and analysis of phylogenetic diversity from PhyloChip data. PhyloTrac is capable of displaying data from multiple PhyloChip experiments in a variety of styles, including heatmap, time series/parallel coordinates, probe intensity display, phylogenetic tree, and textual spreadsheets. All views are fully synchronized and dynamic, so that selection and filtering in one view are instantaneously reflected in the other views.
Sybil is a web-based tool for visualizing and mining comparative genomic data. Powered by a Chado relational database, Sybil provides a rich set of interfaces for browsing and analyzing data. The tool has been implemented for a variety of organisms, both prokaryotes and eukaryotes. Sybil allows users to search for genes or gene clusters of interest and visualize their genomic context. The various displays provide multiple types of genomic comparisons for in-depth data mining, data interrogation from multiple angles, and generation of publication-ready figures. Sybil also gives users the ability to identify core and accessory genes from all or a subset of the available genomes. Most recently, a Sybil site, Strepneumo, has been released to the public for comparison of complete Streptococcus pneumoniae genomes. Strepneumo promises to be an important tool in accelerating vaccine discovery in developing nations.
Sybil is implemented in Perl and built on a tiered architecture that includes an API for retrieving data from Chado. The software also includes utilities for rendering publication-quality images in SVG and PDF formats. Sybil is open source and freely available, with documentation and demo databases available for download.
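The distinction between core and accessory genes can be sketched with plain set operations. The example below is illustrative only and does not use Sybil's API or the Chado schema: the function name, strain names, and cluster identifiers are all hypothetical, and it assumes each genome has already been reduced to a set of gene-cluster identifiers.

```python
def split_core_accessory(genomes, subset=None):
    """Split gene clusters into core (in every genome) and accessory (in some).

    genomes: dict mapping genome name -> iterable of gene-cluster identifiers.
    subset:  optional list of genome names to restrict the comparison to.
    """
    names = list(genomes) if subset is None else list(subset)
    if not names:
        return set(), set()
    gene_sets = [set(genomes[name]) for name in names]
    core = set.intersection(*gene_sets)   # present in all selected genomes
    pan = set.union(*gene_sets)           # present in at least one
    return core, pan - core


# Hypothetical presence data for three strains.
genomes = {
    "strain_A": {"cluster_1", "cluster_2", "cluster_3"},
    "strain_B": {"cluster_1", "cluster_2", "cluster_4"},
    "strain_C": {"cluster_1", "cluster_2"},
}

core, accessory = split_core_accessory(genomes)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

Passing `subset=["strain_A", "strain_C"]` restricts the comparison to those two genomes, mirroring the idea of working with all or only some of the available genomes.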
Workflow is a Java-based, XML-driven workflow engine suite that can be used to build, execute, and monitor complex process pipelines. This tool serves as the execution engine for Ergatis. Workflow is under active development at IGS.
CloVR is a desktop application that integrates
state-of-the-art genomic tools in a robust, user-friendly, and fully automated
software package with optional support for cloud computing platforms. CloVR
currently bundles push-button pipelines for microbial genomics, including
16S rRNA sequence analysis,
and metagenomic sequencing projects.
Additional pipelines for comparative genome analysis,
prokaryotic and eukaryotic RNA-sequencing and viral genomics are planned.
CloVR is distributed as a portable virtual machine that is launched on a desktop or laptop under VMware or VirtualBox. Optionally, CloVR automatically manages additional resources on the cloud to perform large-scale sequence analysis. CloVR supports multiple cloud providers, including Amazon EC2 and academic clouds for research.
The Phylomark tool takes a whole-genome alignment and identifies the minimum number of smaller regions that carry a significant phylogenetic signal and can recapitulate the whole-genome phylogeny. The tool is intended for screening large culture collections, so that whole-genome sequencing efforts can be focused on isolates that reveal new branches of the tree or fill out regions that are poorly represented in the whole-genome phylogeny.
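To make the idea of a region "recapitulating" the whole-genome phylogeny concrete, the sketch below scores candidate regions by how closely their tree topology matches a reference tree. It is a simplified illustration, not Phylomark's implementation: the text above does not say how trees are compared, so the Robinson-Foulds distance used here is an assumption, and the function names, region names, and splits are hypothetical. Each tree is represented as a set of non-trivial splits, with every split stored as the side that does not contain taxon A.

```python
def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    return len(splits_a ^ splits_b)


def best_region(reference_splits, region_splits):
    """Return the region whose tree is closest to the reference topology."""
    return min(
        region_splits,
        key=lambda name: rf_distance(reference_splits, region_splits[name]),
    )


# Hypothetical whole-genome tree on taxa A..E: ((B,C),(D,E),A)
reference = {frozenset("BC"), frozenset("DE")}

# Hypothetical trees inferred from three smaller alignment regions.
regions = {
    "region_1": {frozenset("BC"), frozenset("DE")},   # same topology
    "region_2": {frozenset("BD"), frozenset("CE")},   # no shared splits
    "region_3": {frozenset("BC"), frozenset("BCD")},  # one shared split
}

for name, splits in regions.items():
    print(name, rf_distance(reference, splits))
print("closest region:", best_region(reference, regions))
```

This picks a single best region; finding a minimal set of regions that together reproduce the reference tree would need a search over combinations, which is left out of this sketch.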