With a great flourish in June 2000, the human genome sequence was judged complete. The effort was undertaken by both a private company (Celera Genomics) and a government-led consortium. However, the human genome sequence was not complete. It was, essentially, a rough draft. The sequence announced in 2000 was honeycombed with long stretches of DNA that were indistinct or missing. Even as Celera dropped out of the effort, the public consortium (Human Genome Project) continued.
In 2003, the Human Genome Project was again declared complete (https://www.genome.gov/human-genome-project). Yet even the new, updated draft was missing 8% of the genome. The technology of the time simply could not read the regions that are hardest to sequence. The centromere, which is particularly dense in repeated segments of DNA, is quite difficult to read (Repeats). Furthermore, on 5 of the 23 chromosomes the centromere is eccentrically placed, leaving a long arm and a very short arm. These short arms are also very dense in DNA repeats and difficult to sequence.
The repeat-rich segments usually do not contain genes and, until recently, were generally neglected. Earlier efforts to map the human genome revealed that only about 1% of human DNA encodes proteins! Part of the problem is that conventional sequencing technology shreds the DNA into small segments. Because the repeating segments all look nearly identical, it is difficult to work out where each small piece belongs. Now there are two new “long-read” sequencing technologies (PacBio HiFi and Oxford Nanopore) that allow researchers to read much longer stretches of the genome, making it easier to sequence both the short arms and the centromeres. With this new technology, a new consortium called Telomere-to-Telomere (T2T) was formed to complete the genome. Finally, in May 2021, without pomp, the group announced the completed sequence in an online preprint (Nurk et al.).
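To see why short pieces are such a problem in repeat-rich DNA, here is a toy sketch in Python. It is only an illustration, not how real sequencers or assemblers work: the "genome" is a made-up string with an invented repeat unit, the reads are error-free, and the function names are hypothetical. It asks what fraction of reads of a given length could have come from more than one place.

```python
def count_occurrences(genome: str, read: str) -> int:
    """Count every position (overlaps included) where the read matches the genome."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def ambiguous_fraction(genome: str, read_length: int) -> float:
    """Fraction of all possible reads of this length that match more than one position."""
    reads = [
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    ]
    ambiguous = sum(1 for read in reads if count_occurrences(genome, read) > 1)
    return ambiguous / len(reads)


# Made-up toy genome: unique flanking sequence around a 10-copy tandem repeat.
left_flank = "ACGTTGCAAGCT"
right_flank = "TTGACCGGATAC"
repeat_unit = "GATTACA"  # invented repeat unit, for illustration only
genome = left_flank + repeat_unit * 10 + right_flank

short_read_ambiguity = ambiguous_fraction(genome, 8)   # shorter than the repeat array
long_read_ambiguity = ambiguous_fraction(genome, 80)   # longer than the repeat array

print(f"8-letter reads that fit more than one place:  {short_read_ambiguity:.0%}")
print(f"80-letter reads that fit more than one place: {long_read_ambiguity:.0%}")
```

In this toy case, most of the 8-letter reads fit in several places, while every 80-letter read is long enough to reach into the unique flanking sequence and so fits in exactly one place. Real centromeres contain repeat arrays that are vastly longer, which is why reads had to get much longer before these regions could be pieced together.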
The role of repeat segments in the centromere, the short arms of chromosomes, and other areas remains a mystery. The centromere is thought to be important in cell division. When the chromosomes separate, the spindle, a protein structure essential to proper chromosome separation, attaches to each centromere. The spindle pulls each DNA pair apart, ensuring that each cell gets an equal share. Examples of defective chromosome separation include Down, Turner, and Klinefelter syndromes. Scrutinizing the centromere sequences may yield clues to other chromosomal abnormalities. Unfortunately, chromosomal abnormalities are a telltale sign of aging. As an example, it is common for men to lose the Y chromosome in their white blood cells. Ugh!
The short arms of the 5 chromosomes with eccentrically placed centromeres are similarly mysterious. They are also rich in repeat segments. Many scientists believe that these DNA segments play some role in the translation of genes into proteins. But truthfully, no one actually knows.
One of the interesting aspects of this human genome is that the sequence comprises only a single set of 23 chromosomes, as opposed to 23 pairs. The researchers used cells from a particular type of tumor that develops from an abnormal egg and ends up with just 23 single chromosomes. A sequence covering all 23 pairs is called a diploid genome. The diploid human genome has not been completed.
As newer and newer technology comes on board, truly complete genome sequencing will become more straightforward and more rapid. “Soon, another complete human genome will not be news at all.”