DNA makes up the genome, or genetic material, of every organism, whether bacterium, virus, potato, or human. Each organism has its own DNA sequence, which is made up of four bases (A, T, C, and G). If you know the order of those bases, you have identified the organism’s unique DNA fingerprint, or pattern. Sequencing is the process of determining the order of bases, and whole genome sequencing is a laboratory procedure that determines the order of all the bases in an organism’s genome in a single process.
The Telomere-to-Telomere (T2T) Consortium claims to have sequenced the entire human genome end to end, including all of the segments missing from the famous 2001 reference human genome and the most recent 2013 draft. The new findings, posted as a preprint on bioRxiv (meaning they have yet to be peer-reviewed), claim to have uncovered the missing 8% of human DNA and, if confirmed, would be the first complete sequence of a human genome ever produced.
It has not been an easy road to get here. The entire human genome is 3.055 billion base pairs long – that’s 3.055 billion individual letters that must be identified, placed in the correct region while avoiding overlapping sections, and stitched together into one very long string.
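The idea of stitching overlapping fragments into one long string can be illustrated with a toy example. This is only a minimal sketch of greedy overlap merging on made-up reads, not the consortium's actual assembly method; the function names `overlap` and `stitch`, the example reads, and the `min_len` threshold are all assumptions chosen for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def stitch(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of reads with the largest overlap.

    Stops when no pair overlaps by at least `min_len` bases, so the
    result may contain several unconnected pieces (gaps remain).
    """
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Hypothetical reads that overlap by three bases each.
example_reads = ["ATGCGT", "CGTACC", "ACCGGA"]
print(stitch(example_reads))  # ['ATGCGTACCGGA']
```

Real genomes involve billions of bases, sequencing errors, and reads from both strands, so production assemblers are far more elaborate than this toy version.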
The human genome contains approximately 3 billion base pairs, which are packaged into the 23 pairs of chromosomes found in the nucleus of our cells. Each chromosome contains hundreds to thousands of genes, which carry the instructions for producing proteins. Each of the estimated 30,000 genes in the human genome produces three proteins on average.
When the Human Genome Sequencing Consortium published its first draft of the human genome in 2001, it paved the way for almost every aspect of human genetics available today. The most recent draft, released in 2013, has served as the reference ever since. But limited by the sequencing techniques of the time, these drafts left out the most complex regions of our DNA.
This is because these regions are highly repetitive and contain many duplicated segments. Putting them together in the right places is like completing a jigsaw puzzle in which all of the pieces are the same shape and have no image on the front. Long gaps and the underrepresentation of large, repeating sequences left about 8% of the genetic material out of the reference. To illuminate the darkest corners of the genome, scientists needed to develop more accurate methods of sequencing.
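The jigsaw problem can be made concrete with a toy example. The sketch below uses an invented miniature "genome" with a repeated block in the middle and counts how many places a read could fit; the sequences and the function name `placements` are hypothetical and chosen only to illustrate the point.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start position at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return positions


# Made-up sequence: unique flank, a repetitive block, another unique flank.
genome = "GATTC" + "ACAC" * 5 + "TTGCA"

short_read = "ACACAC"                  # lies entirely inside the repeat
long_read = "TC" + "ACAC" * 5 + "TT"   # spans the repeat into both flanks

print(len(placements(genome, short_read)))  # 8 possible positions
print(placements(genome, long_read))        # [3], a single position
```

A short read drawn from inside the repeat fits equally well in many places, while a read long enough to reach the unique sequence on both sides can be placed in only one.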
As a result, the researchers turned to two more modern sequencing techniques: PacBio HiFi long-read sequencing and Oxford Nanopore ultra-long-read sequencing. HiFi sequencing reads long sections of DNA while maintaining the accuracy normally reserved for short-read sequencing. Oxford Nanopore, meanwhile, is a technique in which a single strand of DNA is pushed through a tiny pore, and changes in the electrical current indicate which base is passing through. The two techniques complement each other, and combining them allowed the researchers to finally figure out what was lurking in the enigmatic 8%.
Within the missing portion of the genome, the T2T Consortium found 200 million new base pairs. These contain 2,226 genes in total, 115 of which are predicted to code for a protein. What are the functions of these protein-coding genes? The researchers aren’t sure yet; that will be determined by future research. For the time being, and assuming it holds up under peer review, this is one of the most significant updates to the human reference genome since it was first released nearly two decades ago.
It is important to note that, while this appears to be the entire human genome, the team estimates that approximately 0.3 percent of it may contain errors. It also does not contain a complete map of all human chromosomes. The cells used to create T2T-CHM13 (the name of the reference genome created in this study) only had one set of 23 chromosomes, rather than the two sets (46 in total) that most human cells have. Because of this, the Y chromosome was not included, and the researchers are now working to sequence and add what is missing.
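To put the 0.3 percent figure in scale, it helps to convert it into a number of base pairs. The short calculation below is a back-of-the-envelope sketch that uses only the genome length and percentage quoted in this article; the helper name `bases_in_fraction` is an assumption made for illustration.

```python
GENOME_LENGTH_BP = 3_055_000_000  # genome length quoted in the article


def bases_in_fraction(total_bp: int, parts_per_thousand: int) -> int:
    """Number of base pairs in a given fraction, using integer arithmetic."""
    return total_bp * parts_per_thousand // 1000


# 0.3 percent is 3 parts per thousand.
uncertain_bp = bases_in_fraction(GENOME_LENGTH_BP, 3)
print(f"{uncertain_bp:,} base pairs")  # 9,165,000 base pairs
```

In other words, even a fraction that sounds tiny corresponds to roughly nine million bases that may still need checking.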
The new genome uses new sequencing technology to fill in the gaps left by earlier drafts. However, it has several limitations, including the type of cell line the researchers used to expedite their work.