Next Generation Sequencing (NGS)
Deoxyribonucleic acid, commonly known as DNA, contains the blueprints of life. Within its structures are the codes required for the assembly of proteins and non-coding RNA – these molecular machineries affect all the biological systems that create and maintain life. By understanding the sequence of DNA, researchers have been able to elucidate the structure and function of proteins as well as RNA and have gained an understanding of the underlying causes of disease. Next Generation Sequencing (NGS) is a powerful platform that has enabled the sequencing of thousands to millions of DNA molecules simultaneously. This powerful tool is revolutionizing fields such as personalized medicine, genetic diseases, and clinical diagnostics by offering a high throughput option with the capability to sequence multiple individuals at the same time.
Sanger Sequencing utilizes a high-fidelity DNA-dependent polymerase to generate a complementary copy of a single-stranded DNA template (1) (2) (3). In each reaction a single primer, complementary to the template, initiates a DNA synthesis reaction from its 3’ end. Deoxynucleotides, the monomers of DNA, are added one after the other in a template-dependent manner, forming phosphodiester bonds between the 3’ hydroxyl of the growing end of the primer and the 5’ triphosphate group of the incoming nucleotide (Figure 1) (1).
Each reaction also contains a mixture of four di-deoxynucleotides, one for each DNA base (i.e. A, G, T, and C). These di-deoxynucleotides resemble the DNA monomers enough to allow incorporation into the growing strand; however, they differ from natural deoxynucleotides in two ways: 1) they lack the 3’ hydroxyl group required for further DNA extension, so the chain terminates once one is incorporated into the DNA molecule, and 2) each di-deoxynucleotide carries a unique fluorescent dye, allowing automatic detection of the DNA sequence (3) (4) (5).
Many copies of different-length DNA fragments are generated in each reaction, terminated at all of the nucleotide positions of the template molecule by one of the di-deoxynucleotides (Figure 1). The reaction mixtures are loaded on the sequencing machine, either manually onto slab gels or automatically with capillaries, and are electrophoresed to separate the DNA molecules by size. The DNA sequence is read through the fluorescent emission of the di-deoxynucleotide as it flows through the gel (Figure 2) (5). Modern day Sanger Sequencing instruments use capillary based automated electrophoresis, which typically analyzes 8–96 sequencing reactions simultaneously.
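The read-out logic described above can be sketched in a few lines of Python. This is an idealized illustration rather than instrument software: it assumes that termination happens exactly once at every template position, and the names `sanger_fragments` and `read_gel` are hypothetical.

```python
import random

# Idealized sketch of Sanger read-out; all names are illustrative only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template):
    """Return one chain-terminated fragment per template position.

    The template is written 3'->5' so that the new strand grows left to
    right. The last base of each fragment stands for the dye-labelled
    di-deoxynucleotide that terminated it.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_gel(fragments):
    """Electrophoresis orders fragments by size; the terminal base of
    each fragment (identified by its dye) gives the next base call."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))

template = "TACGGTCA"              # hypothetical template, written 3'->5'
fragments = sanger_fragments(template)
random.shuffle(fragments)          # fragments are mixed in the reaction tube
sequence = read_gel(fragments)
print(sequence)
```

Sorting by length plays the role of electrophoresis here: the order in which fragments pass the detector, together with the dye on each terminal base, is all that is needed to recover the sequence.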
Next Generation Sequencing systems introduced in the past decade allow for massively parallel sequencing reactions. These systems are capable of analyzing millions or even billions of sequencing reactions at the same time. Although the various machines differ in their technical details, they all share some common features, which are outlined below (Figure 2):
1. Sample Preparation:
All Next Generation Sequencing platforms require a library obtained either by amplification or ligation with custom adapter sequences. These adapter sequences allow for library hybridization to the sequencing chips and provide a universal priming site for sequencing primers. Learn more about sample preparation from our Next Generation Sequencing - Experimental Design knowledge base.
2. Sequencing machines:
Each library fragment is amplified on a solid surface (either beads or a flat silicon derived surface) with covalently attached DNA linkers that hybridize the library adapters. This amplification creates clusters of DNA, each originating from a single library fragment; each cluster will act as an individual sequencing reaction.
The sequence of each cluster is optically read (either through the generation of light or fluorescent signal) from repeated cycles of nucleotide incorporation. Each machine has its own unique cycling condition; for example, the Illumina system uses repeated cycles of incorporation of reversibly fluorescent and terminated nucleotides followed by signal acquisition and removal of the fluorescent and terminator groups.
abm offers a wide range of sequencing services on the advanced Illumina® sequencing platforms (MiSeq and NextSeq 500).
3. Data output:
Each machine provides the raw data at the end of the sequencing run. This raw data is a collection of DNA sequences that were generated at each cluster. This data could be further analysed to provide more meaningful results.
abm delivers sequencing results in industry standard FASTQ format. For other bioinformatics analyses (e.g. BWA, GATK, Picard), please contact us at email@example.com for a quote.
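To give a sense of what the FASTQ raw data looks like, the sketch below parses a single record in Python. It assumes the common four-line record layout (identifier, sequence, `+` separator, quality string) and Phred+33 quality encoding; the function name `read_fastq` and the example read are made up for illustration.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()                     # '+' separator line, ignored
        qual = handle.readline().rstrip()
        # Assumed Phred+33 encoding: quality score = ASCII code - 33
        yield header[1:], seq, [ord(char) - 33 for char in qual]

# A made-up record standing in for a real sequencing file
example = io.StringIO("@read1\nGATTACA\n+\nIIIIH5#\n")
records = list(read_fastq(example))

for read_id, seq, quals in records:
    # A Phred score Q corresponds to an error probability of 10^(-Q/10)
    error_probs = [10 ** (-q / 10) for q in quals]
    print(read_id, seq, quals)
```

Each base therefore carries its own quality estimate, which downstream tools use when aligning reads and calling variants.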
The differences between the different Next Generation Sequencing platforms lie mainly in the technical details of the sequencing reaction. Below we describe these technical differences briefly. For a full explanation, please visit the manufacturers’ webpages at the links provided in each section.
Pyrosequencing
In pyrosequencing, the sequencing reaction is monitored through the release of the pyrophosphate during nucleotide incorporation. A single nucleotide is added to the sequencing chip which will lead to its incorporation in a template dependent manner. This incorporation will result in the release of pyrophosphate which is used in a series of chemical reactions resulting in the generation of light. Light emission is detected by a camera which records the appropriate sequence of the cluster. Any unincorporated bases are degraded by apyrase before the addition of the next nucleotide. This cycle continues until the sequencing reaction is complete (Table 1).
The main disadvantages of this method are high reagent cost and a high error rate over strings of 6 or more identical nucleotides.
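The flow-by-flow logic of pyrosequencing, and the reason runs of identical bases are hard to call, can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the light signal is taken to be exactly proportional to the number of bases incorporated, the flow order `TACG` is an assumption, and `flowgram` is a hypothetical helper.

```python
def flowgram(new_strand, flow_order="TACG", n_flows=8):
    """Simulate idealized light signals for the strand being synthesized.

    One nucleotide species is flowed at a time. Every matching base at
    the growing end is incorporated in the same flow, so the signal
    equals the length of the run of identical bases.
    """
    pos, signals = 0, []
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals

signals = flowgram("TTAGGGGGGC")
for base, intensity in signals:
    print(base, intensity)
```

In this example the six consecutive G bases are incorporated in a single flow, so the instrument has to tell a signal of 6 apart from 5 or 7. The relative difference between neighbouring signal levels shrinks as the run gets longer, which is where the error rate rises.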
Table 1 — Technical details for all available pyrosequencing based NGS machines.
|Instrument||GS Junior||GS Junior Plus||GS FLX+ System: GS FLX Titanium XL+ (up to 1,000bp)||GS FLX+ System: GS FLX Titanium XLR70 (up to 600bp)|
|Reads per Run||100,000 shotgun,||1,000,000 shotgun||1,000,000 shotgun,|
|Accuracy||99% at 400bp||99% at 700bp||99.997%||99.995%|
|Run Time||10 hr||18 hr||23 hr||10 hr|
For more information, please visit the Roche/454 Life Science website.
Sequencing by Synthesis
Sequencing by synthesis utilizes the step-by-step incorporation of reversibly fluorescent and terminated nucleotides for DNA sequencing and is used by the Illumina NGS platforms. The nucleotides used in this method have been modified in two ways: 1) each nucleotide is reversibly attached to a single fluorescent molecule with unique emission wavelengths, and 2) each nucleotide is also reversibly terminated ensuring that only a single nucleotide will be incorporated per cycle. All four nucleotides are added to the sequencing chip and after nucleotide incorporation the remaining DNA bases are washed away. The fluorescent signal is read at each cluster and recorded; both the fluorescent molecule and the terminator group are then cleaved and washed away. This process is repeated until the sequencing reaction is complete. This system is able to overcome the disadvantages of the pyrosequencing system by only incorporating a single nucleotide at a time (Table 2).
As the sequencing reaction proceeds, the error rate of the machine also increases. This is due to incomplete removal of the fluorescent signal which leads to higher background noise levels.
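The effect described above can be pictured with a deliberately simple toy model. Everything numeric here is an assumption made for illustration: each cycle contributes one unit of signal, a fixed 5% of the dye is left behind after each cleavage step, and the function name `sbs_cycles` is hypothetical.

```python
def sbs_cycles(cluster_seq, carryover=0.05):
    """Toy model of sequencing by synthesis.

    Exactly one reversibly terminated base is read per cycle. A fixed
    fraction of each cycle's dye is assumed to remain on the cluster and
    add to the background of all later cycles.
    """
    background = 0.0
    calls = []
    for cycle, base in enumerate(cluster_seq, start=1):
        signal = 1.0                          # one base per cycle
        purity = signal / (signal + background)
        calls.append((cycle, base, purity))
        background += carryover * signal      # incomplete dye removal (assumed)
    return calls

calls = sbs_cycles("AAAAC")
purities = [purity for _, _, purity in calls]
for cycle, base, purity in calls:
    print(cycle, base, round(purity, 3))
```

Two points are visible in the output: the run of four A bases is read as four separate cycles, so homopolymers are not a special case, and the fraction of true signal falls with every cycle, which is consistent with the rising error rate towards the end of a read.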
Table 2 — Technical details for all available sequencing by synthesis based NGS machines.
|Instrument||MiSeq||NextSeq 500||NextSeq 500||HiSeq 2500||HiSeq 2500||HiSeq 3000||HiSeq 4000|
|Run Mode||N/A||Mid-Output||High-Output||Rapid Run||High-Output||N/A||N/A|
|Flow Cells Per Run||1||1||1||1 or 2||1 or 2||1||1 or 2|
|Output Range||0.3-15 Gb||20-39 Gb||30-120 Gb||10-300 Gb||50-1000 Gb||125-750 Gb||125-1500 Gb|
|Run Time||5-55 hrs||15-26 hrs||12-30 hrs||7-60 hrs||<1-6 days||<1-3.5 days||<1-3.5 days|
|Reads per Flow Cell||25 million||130 million||400 million||300 million||2 billion||2.5 billion||2.5 billion|
|Maximum Read Length||2 x 300bp||2 x 150bp||2 x 150bp||2 x 250bp||2 x 125bp||2 x 150bp||2 x 150bp|
For more information, please visit the Illumina website.
Sequencing by Ligation
Sequencing by ligation is different from the other two methods since it does not utilize a DNA polymerase to incorporate nucleotides. Instead, it relies on short oligonucleotide probes that are ligated to one another. These oligonucleotides consist of 8 bases (from 3’ to 5’): two probe-specific bases (there are a total of 16 8-mer probes, which all differ at these two base positions) and six degenerate bases; one of four fluorescent dyes is attached at the 5’ end of the probe. The sequencing reaction commences by binding of the primer to the adapter sequence and then hybridization of the appropriate probe. This hybridization is guided by the two probe-specific bases and, upon annealing, the probe is ligated to the primer sequence by a DNA ligase. Unbound oligonucleotides are washed away, the signal is detected and recorded, the fluorescent group is cleaved off (along with the last 3 bases), and then the next cycle commences. After approximately 7 cycles of ligation the DNA strand is denatured and another sequencing primer, offset by one base from the previous primer, is used to repeat these steps; in total, 5 sequencing primers are used (Table 3).
This method leads to very short sequencing reads.
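The cycling scheme described above can be checked with a short calculation of which template positions are read by the two probe-specific bases. This is a simplified model, not the vendor's exact chemistry: it assumes exactly 7 ligation cycles per primer, that each new primer starts one base further into the adapter than the previous one, and that position 1 is the first template base after the adapter. All names are hypothetical.

```python
from collections import Counter

PROBE_STEP = 5    # 8-mer probe minus the 3 bases cleaved after detection
N_PRIMERS = 5     # sequencing primers, each offset by one base
N_CYCLES = 7      # ligation cycles per primer (approximate in practice)

def interrogated_positions():
    """Count how often each template position is read by the two
    probe-specific bases across all primers and ligation cycles."""
    hits = Counter()
    for offset in range(N_PRIMERS):
        for cycle in range(N_CYCLES):
            first = 1 - offset + PROBE_STEP * cycle
            for pos in (first, first + 1):
                if pos >= 1:              # positions below 1 lie in the adapter
                    hits[pos] += 1
    return hits

hits = interrogated_positions()
for pos in sorted(hits):
    print(pos, hits[pos])
```

Under these assumptions a single primer reads only two bases out of every five, and it is the four additional offset primers that fill the gaps, so that almost every position in the read ends up being interrogated twice. The read length is bounded by the number of ligation cycles, which helps explain the short reads.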
Table 3 — Technical details for all available sequencing by ligation based NGS machines:
|Instrument||Genetic Analyzer V2.0: 5500W System||Genetic Analyzer V2.0: 5500xl W System|
|1 x 50||80 Gb||160 Gb|
|1 x 75||120 Gb||240 Gb|
|2 x 50 MP||160 Gb||320 Gb|
|50 x 50 PE||160 Gb||320 Gb|
|Run Time||7 days||7 days|
For more information, please visit the Applied Biosystems website.
Ion Semiconductor Sequencing
Ion semiconductor sequencing utilizes the release of hydrogen ions during the sequencing reaction to detect the sequence of a cluster. Each cluster is located directly above a semiconductor transistor which is capable of detecting changes in the pH of the solution. During nucleotide incorporation, a single H+ is released into the solution and it is detected by the semiconductor. The sequencing reaction itself proceeds similarly to pyrosequencing but at a fraction of the cost (Table 4).
The main disadvantage of this method is a high error rate over homopolymeric stretches of nucleotides.
Table 4 — Technical details for all available ion semiconductor sequencing based NGS machines:
|Ion Proton System|
|Output||up to 10 Gb|
|Reads||60-80 million Reads|
|Read Length||up to 200bp|
|Run time||2-4 hrs|
For more information, please visit the Life Technologies website.
It is difficult to see the differences between the different NGS instruments based on the above data. In this section we attempt to simplify comparisons between instruments by showing how each system performs when given the task of sequencing the human (3,300,000,000 bases), mouse (2,800,000,000 bases), Arabidopsis thaliana (135,000,000 bases), or E. coli (4,639,221 bases) genome (Table 5). For the sequencing data to be usable, coverage of at least 30x is required; anything lower than this number is marked in red and anything higher is marked in green.
Table 5 — Coverage of genome per run
|Roche||GS Junior||GS Junior Plus||GS FLX+ System: GS FLX Titanium XL+||GS FLX+ System: GS FLX Titanium XLR70|
|Illumina||MiSeq||NextSeq 500||HiSeq 2500||HiSeq 3000||HiSeq 4000|
|Applied Biosystems||Genetic Analyzer V2.0: 5500W System||Genetic Analyzer V2.0: 5500xl W System|
|Ion Proton System|
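The coverage-per-run comparison can be reproduced approximately with a short calculation. The sketch below is a best-case estimate, not the published table: it uses the upper end of the output ranges quoted in Tables 2 to 4 for a few instruments, assumes 1 Gb equals 10^9 bases, and assumes every sequenced base maps evenly to the genome. The dictionary and function names are hypothetical.

```python
GENOME_SIZES = {
    "human": 3_300_000_000,
    "mouse": 2_800_000_000,
    "Arabidopsis thaliana": 135_000_000,
    "E. coli": 4_639_221,
}

# Maximum output per run in Gb, taken from the tables above
MAX_OUTPUT_GB = {
    "MiSeq": 15,
    "HiSeq 4000": 1500,
    "5500xl W System": 320,
    "Ion Proton System": 10,
}

REQUIRED_COVERAGE = 30

def coverage(output_gb, genome_size):
    """Average coverage = total bases sequenced / genome size."""
    return output_gb * 1e9 / genome_size

for instrument, output_gb in MAX_OUTPUT_GB.items():
    for genome, size in GENOME_SIZES.items():
        fold = coverage(output_gb, size)
        verdict = "sufficient" if fold >= REQUIRED_COVERAGE else "insufficient"
        print(f"{instrument:18s} {genome:22s} {fold:10.1f}x  {verdict}")
```

For example, a single MiSeq run at 15 Gb gives roughly 4.5x coverage of the human genome, well below the 30x threshold, while the same run covers the E. coli genome several thousand times over.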
Next Generation Sequencing is a young field, with the first machines marketed in 2005. However, in less than a decade NGS has become a cornerstone of molecular biology and genetics. As such, being familiar with its technical terms will help in better understanding the available literature and becoming a member of its ever expanding community. In this section the most common terms used in this field are explained:
NGS: Next Generation Sequencing, or NGS, is a sequencing method where millions of sequencing reactions are carried out in parallel, increasing the sequencing throughput.
Read: The output of an NGS sequencing reaction. A read is a single uninterrupted series of nucleotides representing the sequence of the template.
Read Length: The length of each sequencing read. This variable is always represented as an average read length since individual reads have varying lengths.
Coverage: The number of times a particular nucleotide is sequenced. Because sequencing reactions are error-prone, random errors can occur. Therefore, 30x coverage is typically required to ensure each nucleotide sequence is accurate.
Deep Sequencing: Sequencing where the coverage is greater than 30x. This is used when dealing with rare polymorphisms, where only a subset of the sample carries the mutation. This method increases the range, complexity, sensitivity, and accuracy of the result.
Paired-End Sequencing: Sequencing from both ends of a fragment while keeping track of the paired data. With this method the sequencing reaction will commence from one end of the fragment. Once completed, the fragment is denatured and a sequencing primer is hybridized to the reverse side adapter. The fragment is then sequenced again. Using this method will allow either further confirmation of the accuracy of the sequence or it could be used to increase the overall read length.
Mate-Pair Sequencing: A sample preparation step where large DNA fragments (~10kb) are circularized with an adapter sequence followed by degradation of the circular DNA. This method links DNA fragments that are separated from each other by a certain distance and it is used in applications such as de novo assembly, structural variant detection, and identification of complex genomic rearrangements.
Adapters: Unique sequences used to cap the ends of fragmented DNA. The adapter’s functions are as follows: 1) allow hybridization to solid surface; 2) provide priming location for both amplification and sequencing primers; and 3) provide barcoding for multiplexing different samples in the same run.
Library: A collection of DNA fragments with adapters ligated to each end. Library preparation is required before a sequencing run. Our next knowledge base will delve into the different sample and library preparation methods available.
Alignment: Mapping a sequence read to a known reference genome.
Reference Genome: A fully sequenced and mapped genome used for the mapping of sequence reads.
De Novo Assembly: Assembly of the sequence reads to generate a reference sequence.
Specificity: The percentage of sequences that map to the intended targets out of total bases per run.
Uniformity: The variability in sequence coverage across target regions. When performing whole genome sequencing or exome sequencing, it is expected that the result will be highly uniform (as there should be a 1:1 ratio in the starting material). However, RNA sequencing will not be uniform since differences in expression alter its starting material.
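The barcoding role of adapters mentioned in the glossary above can be illustrated with a small demultiplexing sketch. It is a simplification made for illustration: the barcode is assumed to sit at the start of each read and to match exactly, whereas real platforms may read the barcode separately and tolerate mismatches. The barcodes, sample names and the function `demultiplex` are all made up.

```python
from collections import defaultdict

def demultiplex(reads, barcodes, barcode_len=6):
    """Assign reads to samples by exact match of a leading barcode.

    Assumes an inline barcode of fixed length at the start of each read;
    reads with an unknown barcode are collected as 'undetermined'.
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical barcodes and reads from one pooled run
barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTGGG", "NNNNNNACGT"]

result = demultiplex(reads, barcodes)
for sample, inserts in result.items():
    print(sample, inserts)
```

Because every fragment carries the barcode of the library it came from, several samples can share one run and still be separated afterwards.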
1. Sequences, sequences, and sequences. Sanger, F. Annu Rev Biochem, 1988, Vol. 57, pp. 1-28.
2. Nucleotide sequence of bacteriophage phi X174 DNA. Sanger, F, Air, GM and Barrell, BG. Nature, 1977, Vol. 265, pp. 687-695.
3. DNA sequencing with chain-terminating inhibitors. Sanger, F, Nicklen, S and Coulson, AR. Proc Natl Acad Sci USA, 1977, Vol. 74, pp. 5463-5467.
4. Overview of DNA sequencing strategies. Shendure, JA, Porreca, GJ and Church, GM. Chapter 7. John Wiley & Sons, 2011.
5. Energy transfer primers: a new fluorescence labeling paradigm for DNA sequencing and analysis. Ju, J, Glazer, AN and Mathies, RA. Nat Med, 1996, Vol. 2, pp. 998-999.
6. 454 Sequencing. [Online] 2015. [Cited: 6 2, 2015.] http://www.454.com/.
7. illumina. [Online] 2015. [Cited: 6 2, 2015.] http://www.illumina.com/.
8. SOLiD. Applied Biosystems. [Online] 2015. [Cited: 6 2, 2015.] http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html.
9. Ion Torrent. Applied Biosystems. [Online] 2015. [Cited: 6 2, 2015.] http://www.lifetechnologies.com/ca/en/home/brands/ion-torrent.html.