After almost 30 years of inertia in the field of sequencing, the emergence of a whole range of so-called "next-generation" sequencing technologies has revolutionized the way we approach genomic and genetic research. Sequencing all 3 gigabases of a human genome, once a costly undertaking that took 13 years of international effort, can now be done within a matter of days at 30x coverage or more, at a price that is affordable for a mid-sized lab. Among the different next-generation sequencing machines developed over the last 6 to 8 years, four instruments from three different companies have established themselves on the market for human whole-genome sequencing: Illumina's HiSeq2000, Life Technologies' SOLiD 4 and 5500xl SOLiD, and Complete Genomics' technology.
However, these next-generation sequencing platforms are still relatively new, and a comprehensive comparative assessment of their performance is lacking. To address this gap, the DNA of two tumor-normal pairs from medulloblastoma patients was sequenced individually to 30x coverage on each of the four instruments. The resulting data were analyzed with respect to coverage distribution and coverage biases across the genome, in particular GC bias. Regions without coverage and specific genomic regions were also assessed. SNP calls from the different sequencing machines were compared, and the benefits of combining read information from different instruments were evaluated. Additionally, somatic mutations were analyzed.
The most striking result is the poor coverage of GC-rich regions by SOLiD 4 and 5500xl SOLiD, which discourages their use in particular for methylation experiments and exome sequencing. In contrast, Complete Genomics seems the least affected by GC content and shows the most comprehensive coverage of many genomic regions, except for short repeats. HiSeq2000 exhibits the most even genome-wide coverage distribution and the least sample-to-sample variation, while consistently achieving the highest sensitivity in SNP calling. Combining read data from different technologies is shown to yield limited improvement in most cases and is advisable only for very specific applications. Finally, the comparison of somatic variation confirms that calling somatic alterations remains a major challenge, due in particular to low allele frequency. In summary, this comparative study illustrates the assets and drawbacks of each individual machine and can be used as a guide to find the most suitable platform for a specific experimental goal.
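The abstract does not say how GC bias was measured. As a rough illustration only, one common way to look at it is to group fixed-size genomic windows by GC fraction and compare the mean read depth in each group with the overall mean. The sketch below assumes this approach; the function names, the window representation (sequence plus mean depth), and the choice of ten bins are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Return the G/C fraction among unambiguous bases, or None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def coverage_by_gc(windows, n_bins=10):
    """Summarize read depth per GC bin (illustrative, not the thesis method).

    windows: iterable of (sequence, mean_depth) pairs, one per genomic window.
    Returns {bin_index: mean depth in that bin / overall mean depth}.
    Values well below 1.0 in the high-GC bins would suggest under-coverage
    of GC-rich regions.
    """
    per_bin = defaultdict(list)
    depths = []
    for seq, depth in windows:
        frac = gc_fraction(seq)
        if frac is None:  # window consists only of N or other ambiguous bases
            continue
        idx = min(int(frac * n_bins), n_bins - 1)  # GC fraction 1.0 goes in the last bin
        per_bin[idx].append(depth)
        depths.append(depth)
    if not depths:
        return {}
    overall = sum(depths) / len(depths)
    if overall == 0:
        return {}
    return {i: (sum(d) / len(d)) / overall for i, d in sorted(per_bin.items())}


if __name__ == "__main__":
    # Toy windows: AT-only, balanced, and GC-only sequences with made-up depths.
    toy = [("ATATATATAT", 30), ("ATGCATGCAT", 20), ("GCGCGCGCGC", 10)]
    print(coverage_by_gc(toy))
```

In practice the windows and depths would come from aligned reads and a reference genome; the toy values here are invented purely to show the shape of the calculation.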
|Supervisor:||Eils, Prof. Dr. Roland|
|Date of thesis defense:||6 November 2013|
|Date Deposited:||08 Nov 2013 09:06|
|Faculties / Institutes:||The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences|
|Subjects:||004 Data processing Computer science; 500 Natural sciences and mathematics; 570 Life sciences|
|Controlled Keywords:||Bioinformatics, Cancer research, Genomics, Genome project|
|Uncontrolled Keywords:||Next-generation sequencing|