Services - Analysis Services


Using teams of experienced computational biologists, software engineers, bioinformatics analysts, and biologists, we provide a range of genomic and metagenomic data assembly and analysis services using state-of-the-art software pipelines and the IGS computational infrastructure. We have deployed a hierarchical data storage system and high-capacity compute grid to support large data analysis projects. Our analysis services can be used alone or in combination with our laboratory services to provide comprehensive data generation and analysis for your project. Each analysis project is performed in close consultation with the researcher to ensure that the analysis addresses the scientific question(s) at hand. Services are provided on a fee-for-service basis and each project is customized to the needs of the investigator.

For more information or to discuss your research needs, please request a consultation.

Assembly & Annotation

De novo assembly: Using a range of sequence assembly tools tuned to the appropriate genome size and characteristics, we produce an optimized assembly and deliver contigs, scaffolds, and metrics in a variety of standard formats.
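The exact metrics depend on the assembler and the project. One commonly reported contig statistic is N50: the contig length L such that contigs of length L or longer contain at least half of all assembled bases. The sketch below assumes contig lengths are already available as a list of integers. The function names are hypothetical and are not part of our pipeline.

```python
def n50(contig_lengths):
    """Return the length L such that contigs >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_summary(contig_lengths):
    """Hypothetical summary record combining a few basic assembly metrics."""
    return {
        "num_contigs": len(contig_lengths),
        "total_length": sum(contig_lengths),
        "longest": max(contig_lengths, default=0),
        "n50": n50(contig_lengths),
    }


print(assembly_summary([100, 200, 300, 400, 500]))
```

For contigs of 100 to 500 bp (1,500 bp total), the two longest contigs together reach 900 bp, which exceeds half of the total. The N50 is therefore 400.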

Genome Annotation: We provide annotation of both prokaryotic and eukaryotic genome sequences. This includes gene finding, searches of predicted proteins against sequence-based resources (e.g., UniRef100 and Pfam), and automated assignment of protein functions based on an evidence hierarchy. Visualization tools such as Manatee and WebApollo can be used to browse and curate the results.
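An evidence hierarchy lets a higher-confidence source override a weaker one when they disagree. The sketch below illustrates the idea only. The evidence tiers and their ordering are hypothetical placeholders, not the actual sources or priorities used in our annotation pipeline.

```python
# Hypothetical evidence tiers, highest priority first. A real pipeline defines
# its own sources and ordering; these names are illustrative only.
EVIDENCE_RANK = ["curated_family", "uniref100_hit", "pfam_domain"]


def assign_annotation(evidence):
    """Choose a product name from the highest-ranked evidence type present.

    evidence: dict mapping evidence type -> proposed product name.
    Returns (product_name, source). Falls back to 'hypothetical protein'
    when no ranked evidence is available.
    """
    for source in EVIDENCE_RANK:
        if source in evidence:
            return evidence[source], source
    return "hypothetical protein", None


print(assign_annotation({
    "pfam_domain": "ABC transporter domain-containing protein",
    "uniref100_hit": "ABC transporter ATP-binding protein",
}))
```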

Comparative Genomics

Comparative Assembly: Given a reference sequence, sequence reads are aligned to the reference and assembled against it. The resulting contigs, scaffolds, and variant records are delivered.
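Reference-guided assembly builds a consensus from reads that have already been aligned to the reference. It reports positions where the consensus differs from the reference as variant records. The sketch below is a simplified majority-vote consensus. It assumes gapless alignments given as hypothetical `(start, read)` tuples and ignores indels, base qualities, and mapping qualities, all of which production tools handle.

```python
from collections import Counter


def reference_consensus(reference, alignments, min_depth=2):
    """Majority-vote consensus over gapless read alignments to a reference.

    alignments: list of (start, read_sequence) with 0-based reference starts.
    Positions covered by fewer than min_depth reads are emitted as 'N'.
    Returns (consensus_string, [(position, ref_base, alt_base), ...]).
    """
    pileup = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    consensus, variants = [], []
    for pos, counts in enumerate(pileup):
        if sum(counts.values()) < min_depth:
            consensus.append("N")
            continue
        base = counts.most_common(1)[0][0]
        consensus.append(base)
        if base != reference[pos]:
            variants.append((pos, reference[pos], base))
    return "".join(consensus), variants


print(reference_consensus("ACGTACGT", [(0, "ACGTTCGT"), (0, "ACGTTCGT"), (2, "GTTC")]))
```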

Prokaryotic Comparative Genomics: We offer two pipelines for comparing prokaryotic genomes. The protein cluster-based pipeline uses Jaccard-filtered bidirectional best BLAST matches to produce ortholog clusters (Crabtree et al., PMID:18314579); it has been used successfully to compare 100 or more genomes at a time. The DNA alignment-based pipeline employs the Mugsy whole-genome alignment algorithm (Angiuoli et al., PMID:21148543). Mugsy is a reference-independent aligner, and protein ortholog groups are built from its whole-genome multiple alignments and synteny, which helps distinguish paralogs from orthologs. This method is optimized for comparing closely related organisms. For both pipelines, the web-based visualization tool Sybil is used to search and view ortholog clusters, genomic context, synteny, and more.
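The two ideas named in the protein cluster-based pipeline can be illustrated together. Bidirectional (reciprocal) best hits pair proteins that each find the other as their top match. A Jaccard coefficient measures how much two proteins' BLAST hit sets overlap. The sketch below is a simplified two-genome illustration and not the published Crabtree et al. algorithm. It keeps reciprocal best hits whose hit sets overlap above a hypothetical `min_jaccard` threshold and merges the surviving links into clusters.

```python
def reciprocal_best_hits(best_hit):
    """best_hit: dict protein -> its best-scoring match in the other genome."""
    return {tuple(sorted((a, b))) for a, b in best_hit.items()
            if best_hit.get(b) == a}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ortholog_clusters(best_hit, hit_sets, min_jaccard=0.6):
    """Link reciprocal best hits whose BLAST hit sets overlap enough, then
    return connected components (via union-find) as sorted clusters.
    Proteins with no passing link are not reported. min_jaccard is illustrative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in reciprocal_best_hits(best_hit):
        if jaccard(hit_sets[a], hit_sets[b]) >= min_jaccard:
            parent[find(a)] = find(b)

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(c) for c in clusters.values())
```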

Variant Analysis

Using analysis pipelines developed specifically for variant detection, we align sequence data to available reference sequences. SNPs, indels (insertions and deletions), and structural variants are detected, quality-filtered, and annotated (coding, non-coding, synonymous, non-synonymous, etc.). To identify novel or rare variants, we compare them against an in-house database of known variants that includes the latest dbSNP and 1000 Genomes data, as well as known variants from multiple organisms and other publicly available data sets. Data visualization tools are available to browse the results. Comparative analysis of variant calls is also available.
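Whether a coding substitution is synonymous or non-synonymous depends on whether it changes the encoded amino acid. The sketch below classifies a single-nucleotide substitution using the standard genetic code. It assumes a forward-strand coding sequence that starts in frame at position 0. Real annotation tools also handle strand, transcripts, splice sites, and indels. The function name and effect labels are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snv(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of an in-frame CDS."""
    cds = cds.upper()
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt.upper() + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop_gained"
    elif ref_aa == "*":
        effect = "stop_lost"
    else:
        effect = "missense"  # non-synonymous amino acid change
    return effect, f"{ref_codon}>{alt_codon}", f"{ref_aa}>{alt_aa}"


print(classify_coding_snv("ATGGCTTAA", 5, "C"))  # GCT>GCC, Ala>Ala
```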

Transcriptome Analysis

Transcriptome (RNA-Seq) data can be analyzed to determine gene- or isoform-level expression profiles, sequence variation, and differential expression between multiple conditions and/or time points. This pipeline includes alignment of reads to a reference genome, expression analysis, differential expression analysis, isoform analysis, and differential isoform analysis. We can also perform de novo transcriptome assembly. Results are delivered as spreadsheets containing statistics, differentially expressed genes, isoforms, and differentially expressed isoforms, as well as PDF plots and figures such as heat maps and principal component analysis (PCA) plots. Visualization tools such as the Integrative Genomics Viewer (IGV) can be used to browse the alignments.
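Differential expression starts by normalizing raw read counts so samples with different sequencing depths can be compared. The sketch below shows counts-per-million (CPM) normalization and a log2 fold change between two groups. It is an illustration only. It performs no statistical test, and dedicated differential expression packages use more sophisticated normalization and dispersion models. The function names and the pseudocount value are assumptions.

```python
import math
from statistics import mean


def cpm(counts):
    """Counts-per-million normalization for one sample ({gene: raw count})."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(group_a, group_b, pseudocount=1.0):
    """log2(mean CPM in group_b / mean CPM in group_a) for each gene.

    group_a, group_b: lists of per-sample count dicts sharing the same genes.
    The pseudocount avoids division by zero for genes with no reads.
    """
    norm_a = [cpm(s) for s in group_a]
    norm_b = [cpm(s) for s in group_b]
    return {
        gene: math.log2((mean(s[gene] for s in norm_b) + pseudocount)
                        / (mean(s[gene] for s in norm_a) + pseudocount))
        for gene in norm_a[0]
    }


print(log2_fold_changes([{"g1": 100, "g2": 900}], [{"g1": 400, "g2": 600}]))
```

Here g1 rises from 10% to 40% of reads, a log2 fold change of about 2.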

Epigenome Analysis

We analyze ChIP-Seq, BS-Seq, ATAC-Seq, PacBio base modification, and other types of epigenomic data. Data from ChIP-Seq experiments are aligned to a reference genome and analyzed for peak enrichment to identify DNA-protein binding sites. Differential peak analysis between experiments can identify binding sites specific to certain conditions or proteins. For BS-Seq and other methylation-based experiments, DNA methylation patterns are detected by aligning sequence data derived from bisulfite-treated DNA to both a reference genome and an in silico bisulfite-converted version of that reference. This dual alignment enables more accurate identification of methylated sites and their boundaries. PacBio's SMRT Sequencing method enables direct detection of methylation and other DNA modifications by measuring kinetic variation as the polymerase incorporates nucleotides. Detailed base modification motif reports are generated.
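Bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine, while methylated cytosines are protected. An in silico converted reference therefore replaces C with T on the top strand. It replaces G with A in top-strand coordinates to represent C-to-T conversion on the bottom strand. The sketch below shows that conversion and a naive per-site methylation call. It assumes a read already aligned gaplessly to the top strand. The function names are hypothetical, and real aligners handle both strands, mismatches, and context (CpG, CHG, CHH).

```python
def bisulfite_convert(seq):
    """Return (C->T top-strand, G->A bottom-strand) in silico converted copies."""
    seq = seq.upper()
    return seq.replace("C", "T"), seq.replace("G", "A")


def call_methylation(ref, read):
    """Naive top-strand calls for a gapless alignment of equal-length sequences.

    At each reference C: read C -> protected (methylated, True);
    read T -> converted (unmethylated, False); anything else is skipped.
    """
    calls = {}
    for i, (r, b) in enumerate(zip(ref.upper(), read.upper())):
        if r == "C":
            if b == "C":
                calls[i] = True
            elif b == "T":
                calls[i] = False
    return calls


print(bisulfite_convert("ACGT"))
print(call_methylation("ACGCA", "ACGTA"))
```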

Microbiome Profiling

We process amplicon sequence data from 16S, 18S, ITS, and custom amplicon projects using an in-house informatics workflow built on QIIME and DADA2 components. This is offered primarily as a package in combination with our microbiome library preparation and sequencing services, but it can also be performed on pre-existing data sets. The analysis pipeline produces QC reports, taxonomic assignments, and additional analyses (heat maps, diversity measures) according to project needs. Analysis and QC of sequencing control samples (extraction and PCR positive/negative controls) are also included where applicable. Sequence data are pre-formatted for simple upload to the NCBI Sequence Read Archive (SRA). Additional statistical analysis, such as modeling, is available for projects that require more in-depth analysis. We offer analysis of data generated on Illumina and PacBio platforms.
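Diversity measures summarize how many taxa are present in a sample and how evenly reads are distributed among them. The sketch below computes three common alpha-diversity measures from per-taxon read counts: observed richness, the Shannon index, and the Gini-Simpson index. It uses the natural logarithm for Shannon; other tools may use a different base, so values are only comparable within one convention. It is an illustration, not the exact QIIME or DADA2 computation used in our workflow.

```python
import math


def alpha_diversity(counts):
    """Alpha diversity for one sample given read counts per taxon (or ASV)."""
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    proportions = [c / total for c in observed]
    shannon = -sum(p * math.log(p) for p in proportions)  # natural log
    simpson = 1 - sum(p * p for p in proportions)  # Gini-Simpson index
    return {"observed": len(observed), "shannon": shannon, "simpson": simpson}


print(alpha_diversity([10, 10, 0]))  # two equally abundant taxa
```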

Microarray Analysis

We analyze expression data from multiple platforms (Affymetrix arrays, Affymetrix ST arrays, and NanoString) to study both protein-coding and non-coding (miRNA, lncRNA, regulatory) gene expression profiles. Differentially expressed genes are identified and reported in spreadsheets, heat maps, and other outputs.
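When thousands of genes are tested for differential expression at once, the raw p-values are typically adjusted for multiple testing. The Benjamini-Hochberg procedure is one widely used way to control the false discovery rate. The sketch below assumes per-gene p-values have already been computed by a statistical test. It is shown as a common example, not necessarily the correction applied in every project.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```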

Pathway & Network Analysis

Given a set of variant positions, genes, or other loci associated with a particular phenotype, we use software developed specifically for gene pathway and network analysis, including Ingenuity Pathway Analysis (IPA), to find associations with functional profiles, tissue- or disease-specific biomarkers, and other genes in the same pathways and networks. We complement this with the publicly available network and visualization tools DAVID, Cytoscape, and Reactome to improve the accuracy and sensitivity of network and biomarker identification.
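A common building block of pathway analysis is an over-representation test. It asks whether a gene list contains more members of a pathway than expected by chance. The sketch below computes a one-sided hypergeometric p-value. It is a generic illustration with hypothetical parameter names. It does not reproduce IPA's proprietary scoring or DAVID's modified Fisher exact test.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric.

    population: genes in the background set
    annotated:  background genes belonging to the pathway
    selected:   genes in the phenotype-associated list
    overlap:    selected genes that belong to the pathway
    """
    denom = comb(population, selected)
    upper = min(annotated, selected)
    return sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    ) / denom


# e.g., 20,000 background genes, a 200-gene pathway, 100 selected genes, 10 in the pathway
print(enrichment_p_value(20000, 200, 100, 10))
```

With about 1 pathway gene expected by chance in this example (100 x 200 / 20,000), observing 10 yields a very small p-value. In practice, the p-values across many pathways would then be corrected for multiple testing.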

Cloud Based Pipelines

We are actively porting our analysis pipelines to operate in a cloud environment. Currently we have tools in the cloud to analyze whole metagenome sequence data and to perform prokaryotic genome annotation. Soon, our transcriptome analysis tools will also be cloud-enabled.

Customized Analysis

Our computational infrastructure and expertise enable cutting-edge custom analysis for a broad range of omics applications. We will customize an analysis plan for each project in close consultation with each investigator. Please request a consultation to discuss custom analysis projects.