PacBio Pipeline Off to a Strong Start for 2014

It has been a busy January for our PacBio RS II instrument. We are excited to report a new record yield from a single SMRT cell: 896,457,524 passed-filter bases! We are not far off from hitting 1 Gb from a single cell.

 

Some more stats from this cell:

    Mean Read Length: 8391 bp
    P50 Subread Length: 6187 bp
    P90 Subread Length: 12314 bp
    P95 Subread Length: 14032 bp
    Maximum Subread Length: 24585 bp
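For readers who want to compute similar summary numbers from their own data, here is a minimal sketch. It assumes that "P50", "P90" and "P95" are percentiles of the subread length distribution counted by read, and it uses the nearest-rank method; the instrument software may define them differently. The function names and the toy lengths are hypothetical and are not taken from the run above.

```python
import math


def percentile_length(lengths, pct):
    """Nearest-rank percentile: the smallest length such that at least
    pct percent of the reads are that long or shorter."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(lengths)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(lengths):
    """Return mean, P50, P90, P95 and maximum length for a list of lengths (bp)."""
    return {
        "mean": sum(lengths) / len(lengths),
        "p50": percentile_length(lengths, 50),
        "p90": percentile_length(lengths, 90),
        "p95": percentile_length(lengths, 95),
        "max": max(lengths),
    }


# Hypothetical toy subread lengths in bp (NOT data from our record cell).
toy_lengths = [500, 1200, 3000, 4500, 6000, 7500, 9000, 12000, 15000, 20000]
stats = summarize(toy_lengths)
for name, value in stats.items():
    print(f"{name}: {value}")
```

In practice the list of lengths would come from the subread FASTA/FASTQ output rather than a hard-coded list.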

 

We have come a long way in the past year. Below is a comparison of the yields and mean read lengths of our top 20 SMRT cells from January 2013 and our top 20 SMRT cells so far in 2014:

The increases in both SMRT cell yields and read lengths are making PacBio an attractive option for sequencing and finishing microbial genomes. We are excited to see where 2014 will take us!

For more information on our full range of sequencing and analysis services, visit our Laboratory Services and Analysis Services pages. Please contact us if you have any questions.

Finishing Genomes with the PacBio RS II – Read our Core Lab Profile

The GRC, which offers services from sequencing library prep through genome assembly and downstream analysis, is generating complete bacterial genome sequences and methylation profiles using PacBio SMRT sequencing on the RS II. Advances in library prep, the sequencer, sequencing protocols, and data analysis software have all contributed to this capability.

To learn more about these breakthroughs and other emerging applications of SMRT sequencing, please read the PacBio Core Lab Profile showcasing the research performed at GRC and IGS here.

GRC and IGS offer not only cutting-edge sequencing, but a complete menu of services including assembly, annotation, and custom analyses. For more information about services offered, visit our Laboratory Services and Analysis Services pages. Please contact us if you have any questions.

 

Increasing PacBio RS II Subread Lengths

Although the latest SMRT cell has been designed to shift the loading bias towards longer read lengths, smaller fragments still load preferentially. When working with long-insert libraries (10-20 kb), this often limits the potential of the library.

A solution to this is to remove small fragments from the libraries. We have evaluated the Blue Pippin (Sage Science, Inc., Beverly MA), an automated electrophoresis system that separates and collects DNA fragments based upon their size, for this purpose.

In order to measure the increase in subread length, long insert libraries were prepared with fragments larger than 4 kb or 7 kb isolated using the Blue Pippin and a 0.75% Agarose Gel Cassette (BLF7510) and compared to a library without Blue Pippin size selection. As shown below, the removal of smaller library fragments prior to sequencing increases the average length of the library fragments loaded into ZMWs on the SMRTcell.

In addition to longer subreads, there is also a boost in the amount of data generated per ZMW. As the fragment length increases, the percentage of sequenced bases that come from the SMRTbell adapters decreases and the percentage that come from the library insert increases. The graph below shows the average number of passed-filter bases per active ZMW versus the average fragment length of each library. Using Blue Pippin size selection, we have achieved yields of >500 M passed-filter bases from individual SMRT cells.
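The adapter-versus-insert point can be illustrated with a little arithmetic. The sketch below assumes a simple model in which each pass of the polymerase reads one copy of the insert followed by one hairpin adapter, so the insert share per pass is insert / (insert + adapter). The default adapter length of 45 bases is a placeholder chosen for illustration, not a value from this post, and the function name is hypothetical.

```python
def insert_fraction(insert_bp, adapter_bp=45):
    """Fraction of sequenced bases that come from the insert, assuming each
    pass reads one insert strand plus one adapter.

    adapter_bp is a placeholder assumption, not a measured value.
    """
    if insert_bp <= 0:
        raise ValueError("insert_bp must be positive")
    if adapter_bp < 0:
        raise ValueError("adapter_bp must not be negative")
    return insert_bp / (insert_bp + adapter_bp)


# Compare a few hypothetical insert sizes (bp).
for size in (1000, 4000, 7000, 10000):
    print(f"{size:>6} bp insert: {insert_fraction(size):.2%} insert bases")
```

Under this model the insert share rises steadily with fragment length, which matches the direction of the effect described above; the size of the effect depends on the real adapter length and on how many passes each read makes.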

Below are the sequencing and assembly results for four genomes sequenced from long-insert, Blue Pippin size-selected libraries. Using only PacBio long subread data, we were able to assemble complete microbial genomes for three of the four isolates. Even with only a single under-loaded, low-yield SMRT cell, the remaining isolate still yielded a nearly complete genome assembly, with 10 total contigs and >60% of the genome assembled in the largest contig.


Complete microbial genomes using only PacBio data? Testing HGAP…

We’ve spent some time recently testing a new way to assemble PacBio data called HGAP, which stands for “hierarchical genome assembly process”.  Unlike previous approaches to assembling PacBio data, which relied on Illumina and/or PacBio CCS reads to error-correct the PacBio long reads, HGAP uses multiple alignments of all reads to perform the corrections, potentially eliminating the need for other libraries and data types.  The corrected reads are then assembled with an overlap-layout-consensus assembler (in this case Celera Assembler) to form contigs.  More details about HGAP can be found here: https://github.com/PacificBiosciences/DevNet/wiki/Hierarchical-Genome-Assembly-Process-%28HGAP%29
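To give a feel for how a multiple alignment can correct errors without a second data type, here is a toy sketch. It is not HGAP's actual algorithm: it assumes the reads have already been aligned to each other and padded with "-" so that every row has the same width, and it simply takes a majority vote in each column. The function name and the example reads are hypothetical.

```python
from collections import Counter


def column_consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over pre-aligned, equal-width reads.

    Columns where the gap symbol wins the vote are dropped from the output.
    Ties go to whichever symbol appears first in the column.
    """
    if not aligned_reads:
        raise ValueError("need at least one aligned read")
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same width")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Hypothetical aligned reads, each carrying at most one error.
toy_alignment = [
    "ACGTTAGC",
    "ACGT-AGC",  # deletion
    "ACCTTAGC",  # substitution
    "ACGTTAGC",
    "A-GTTAGC",  # deletion
]
corrected = column_consensus(toy_alignment)
print(corrected)
```

Because the errors fall in different places in different reads, each column's majority recovers the underlying base, which is the intuition behind correcting long reads with other long reads.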

We have evaluated HGAP on several of our projects and compared it to our assemblies of Illumina-corrected PacBio reads assembled with Celera Assembler.  So far, the results have been very encouraging and we have seen significant improvement in many cases.  The chart below shows several examples:

So the assemblies are more contiguous, but are the corrections good enough to generate an accurate consensus sequence? To verify the consensus accuracy of these HGAP assemblies for several Bordetella genomes, we aligned >240x coverage of 250 bp Illumina MiSeq data to the HGAP-generated contigs and looked for discrepancies and SNPs using GATK. We found no high-quality, passed-filter variants, which indicates that the HGAP assembly produces a highly accurate consensus sequence.  We continue to test and compare HGAP with other PacBio assembly methods but are encouraged by the initial results.
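The final counting step of a check like this can be sketched in a few lines. The example below assumes the variant caller's output is a standard tab-delimited VCF, where column 6 is QUAL and column 7 is FILTER, and it counts records that are marked PASS and meet a quality threshold. The threshold of 30, the function name, and the toy records are all hypothetical; this is not the exact filtering we applied.

```python
def count_passing_variants(vcf_lines, min_qual=30.0):
    """Count VCF records with FILTER == "PASS" and QUAL >= min_qual.

    min_qual is an illustrative threshold, not a recommended setting.
    """
    count = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        qual, filter_status = fields[5], fields[6]
        if filter_status != "PASS":
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        count += 1
    return count


# Hypothetical VCF content for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig1\t1042\t.\tA\tG\t12.5\tLowQual\t.",
    "contig1\t20310\t.\tC\tT\t58.0\tPASS\t.",
    "contig2\t77\t.\tG\tA\t9.1\tPASS\t.",
]
n_passing = count_passing_variants(toy_vcf)
print(n_passing)
```

A count of zero on real data would correspond to the result described above: no high-quality, passed-filter disagreements between the Illumina reads and the assembled consensus.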

The PacBio ‘Stage Start’ Feature

A new feature that was added with the recent PacBio upgrade is something called ‘Stage Start’. This allows for data collection to start earlier than it did previously. When this option is used, data collection begins immediately after the polymerase is activated, resulting in longer reads.

Below are the results from a quick test we performed. We sequenced two libraries with and without the ‘Stage Start’ feature turned on.

The libraries sequenced were about 8 kb in length, and were sequenced using the Magbead Standard Seq v1 protocol. One 90-minute movie was taken of each SMRT cell. Standard Polymerase Binding and Sequencing kits were used (not the newer ‘XL’ version of the kits).

PacBio Upgrade

The PacBio was recently upgraded to version 1.3.3. With this upgrade comes the ability to use the XL versions of the DNA/Polymerase Binding and DNA Sequencing kits. These new kits should result in a longer average read length (5000 bp) compared with the ~3000 bp average we get with the current C2 chemistry.

Using both new kits together does come at a cost. Data produced with the DNA Sequencing Kit XL 1.0 will be of lower quality than C2 data, so this kit is recommended only when the data will be error-corrected with shorter, more accurate reads.

For a boost in average read length without sacrificing quality of the reads, the DNA/Polymerase Binding Kit XL 1.0 can be used with the C2 sequencing chemistry rather than with the newer XL sequencing kit.

More details to follow…