Assembler Comparisons

Over the past few months, several members of the GRC bioinformatics team have been testing a variety of assemblers and analyzing the results.  The testing is intended to critically evaluate the output and performance of some of the more popular de novo assemblers.  Similar studies have been done before (such as GAGE: http://gage.cbcb.umd.edu/), but we aim to expand upon them by testing on different organisms and data types.  To that end, WGS data generated at IGS, from many samples across multiple species (such as E. coli, V. cholerae, S. aureus and M. massiliense), have been assembled at multiple coverage levels using assemblers such as Celera Assembler, MSR-CA, Velvet, SOAPdenovo and ABySS.  In addition, the samples have been sequenced on various NGS platforms, including Illumina HiSeq, Illumina MiSeq and PacBio.  These data types will be assembled both stand-alone and in different combinations to gauge the effects of hybrid assembly.  We hope to have assembly statistics compiled in the very near future.
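As an illustration of the kind of statistic such a comparison might report, here is a minimal Python sketch that computes contig count, total length, largest contig and N50 from a list of contig lengths.  This is an assumption about what could be compiled, not a description of the team's actual pipeline; the function names n50 and summarize are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking from the longest
    contig down, the running total first reaches half the assembly size.
    Returns 0 for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


def summarize(contig_lengths):
    """Collect a few basic assembly statistics into a dict (hypothetical helper)."""
    return {
        "contigs": len(contig_lengths),
        "total_bp": sum(contig_lengths),
        "largest": max(contig_lengths, default=0),
        "n50": n50(contig_lengths),
    }


if __name__ == "__main__":
    # Toy contig lengths, not real data.
    print(summarize([2, 2, 2, 3, 3, 4, 8, 8]))
```

Running the same summary on the output of each assembler, at each coverage level, would give one row per assembly that can be compared side by side.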

PacBio Upgrade

The PacBio was recently upgraded to version 1.3.3. With this upgrade comes the ability to use the XL versions of the DNA/Polymerase Binding and DNA Sequencing kits. These new kits should result in a longer average read length (5000 bp) than the ~3000 bp average we get with the current C2 chemistry.

Using both new kits together does come at a cost. Data produced with the DNA Sequencing Kit XL 1.0 will be of lower quality than data produced with C2, so that kit is recommended only when the reads will be error corrected with shorter, more accurate reads.

For a boost in average read length without sacrificing quality of the reads, the DNA/Polymerase Binding Kit XL 1.0 can be used with the C2 sequencing chemistry rather than with the newer XL sequencing kit.

More details to follow…