GRC Posters Presented at AGBT 2014

At AGBT 2014 we are presenting two posters that highlight work we’ve done over the past year.

The first poster provides an overview of how changes to our PacBio pipeline have increased our sequencing yields and read lengths, resulting in finished, high-quality microbial genomes, assembled using only PacBio data.

The second poster demonstrates how next-generation sequencing can be used to investigate host and pathogen associations in cases of pulmonary non-tuberculous mycobacterial (PNTM) infections.

For more information on our full range of sequencing and analysis services, visit our Laboratory Services and Analysis Services pages. Please contact us if you have any questions.

Increasing PacBio RS II SubRead Lengths

Although the latest SMRTcell has been designed to shift the loading bias toward longer reads, smaller fragments still load preferentially, and this often limits the potential of long-insert libraries (10-20 kb).

One solution is to remove small fragments from the libraries before sequencing. For this purpose we evaluated the Blue Pippin (Sage Science, Inc., Beverly, MA), an automated electrophoresis system that separates and collects DNA fragments based on their size.

To measure the increase in subread length, we prepared long-insert libraries in which fragments larger than 4 kb or 7 kb were isolated using the Blue Pippin with a 0.75% agarose gel cassette (BLF7510), and compared them to a library prepared without Blue Pippin size selection. As shown below, removing smaller library fragments prior to sequencing increases the average length of the library fragments loaded into ZMWs on the SMRTcell.

In addition to producing longer subreads, size selection increases the amount of data generated per ZMW. As the fragment length increases, the percentage of sequenced bases that are SMRTbell adapter decreases and the percentage that are library insert increases. The graph below shows the average number of passed-filter bases per active ZMW versus the average fragment length of each library. Using Blue Pippin size selection, we have achieved yields of >500 M passed-filter bases from individual SMRTcells.
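The relationship between fragment length and insert fraction can be illustrated with a very simple model. This is only a sketch: it assumes the polymerase alternates between one pass over the insert and one pass over a hairpin adapter, and the adapter length of 45 bases is an illustrative placeholder rather than a figure from the poster. The function name and parameters are our own.

```python
def insert_fraction(insert_len_bp, adapter_len_bp=45):
    """Fraction of sequenced bases that come from the library insert.

    Simplified model: every pass over the insert is followed by one pass
    over a SMRTbell adapter, so the ratio is insert / (insert + adapter).
    adapter_len_bp=45 is an assumed placeholder value.
    """
    if insert_len_bp <= 0 or adapter_len_bp < 0:
        raise ValueError("insert length must be > 0 and adapter length >= 0")
    return insert_len_bp / (insert_len_bp + adapter_len_bp)


if __name__ == "__main__":
    # Hypothetical fragment lengths, chosen only to show the trend.
    for length in (2000, 4000, 7000, 10000):
        print(f"{length:>6} bp insert: {insert_fraction(length):.4f} insert fraction")
```

Under this model the insert fraction rises monotonically with fragment length, which matches the direction of the effect described above. It does not attempt to reproduce the measured yields.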

Below are the sequencing and assembly results of four genomes sequenced from long-insert, Blue Pippin size-selected libraries. Using only PacBio long subread data, we were able to assemble complete microbial genomes for three of the four isolates. Even with only a single under-loaded and low-yield SMRTcell, the remaining isolate still resulted in a nearly complete genome assembly with 10 total contigs and >60% of the genome assembled in the largest contig.
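For readers who want to compute the same kind of contiguity summary for their own assemblies, here is a minimal sketch of the two statistics involved: the fraction of the assembly in the largest contig, and N50. The contig lengths in the example are hypothetical values chosen to resemble a 10-contig assembly; they are not the poster data.

```python
def largest_contig_fraction(contig_lengths):
    """Fraction of the total assembled bases held in the largest contig."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    return max(contig_lengths) / sum(contig_lengths)


def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    largest contig downward, first reaches half of the assembly size."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical 10-contig assembly totalling 5 Mbp (not real data).
    contigs = [3_100_000, 600_000, 400_000, 300_000, 200_000,
               150_000, 100_000, 80_000, 50_000, 20_000]
    print(f"contigs: {len(contigs)}")
    print(f"largest contig fraction: {largest_contig_fraction(contigs):.2f}")
    print(f"N50: {n50(contigs)}")
```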


PacBio RSII producing encouraging early results

Our PacBio throughput and read lengths have been improving steadily over the past year and may have just taken another big step forward. We upgraded our PacBio sequencer to the RSII in mid-May, and we are seeing significant increases in per-cell yield and improved read lengths with our longer libraries. The most notable change in the upgrade from RSI to RSII is the doubling of the number of sequencing reactions that can be observed simultaneously on the SMRTcell, which effectively doubles throughput as well. Let’s take a look at some examples:

In this comparison of an 8 kb Mycobacterium library that was run both before and after the upgrade, we see an almost 3x increase in total yield per SMRTcell, while read lengths remain about the same.

Below is a comparison of per-SMRTcell stats from multiple libraries across multiple organisms, including both 8 kb and 14 kb libraries from Mycobacterium sp., Plasmodium falciparum, Saccharomyces cerevisiae and Candida albicans. Driven by the longer libraries, we see both dramatically higher yield and longer read lengths. On one recent 8-SMRTcell run of a 14 kb library, we saw an average per-SMRTcell yield of 417 Mbp!
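To put the per-SMRTcell average in context, the arithmetic for a whole run can be sketched as below. The cell count and average yield are the figures quoted above. The genome size used for the coverage estimate is a hypothetical placeholder, since the post does not say which organism that run sequenced.

```python
def run_yield_mbp(n_cells, mean_yield_per_cell_mbp):
    """Total yield of a run, in Mbp, from the per-SMRTcell average."""
    return n_cells * mean_yield_per_cell_mbp


def fold_coverage(total_yield_mbp, genome_size_mbp):
    """Approximate raw fold coverage: total bases divided by genome size."""
    if genome_size_mbp <= 0:
        raise ValueError("genome size must be positive")
    return total_yield_mbp / genome_size_mbp


if __name__ == "__main__":
    total = run_yield_mbp(8, 417)  # 8 SMRTcells at 417 Mbp each
    print(f"total run yield: {total} Mbp")  # 3336 Mbp, roughly 3.3 Gbp

    assumed_genome_mbp = 12  # hypothetical genome size, not from the post
    print(f"approx. raw coverage: {fold_coverage(total, assumed_genome_mbp):.0f}x")
```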

Here is a read length plot comparing the runs from the table above:

Although we are early in our use and optimization of the new PacBio RSII, we are encouraged by the increase in both yield and read length. We expect continued improvement in our PacBio data, which should in turn improve downstream data analysis and genome assembly.