Scientists have been mapping the human genome for a quarter century, yet millions of DNA letters have remained unresolved. A new study now reports the most complete reference to date, covering those difficult sections.
The international team sequenced 65 genomes from people of diverse ancestries and closed 92 percent of the gaps left by earlier projects, providing a resource that lets clinicians zoom all the way into regions long considered unreadable.
The project was led by Christine Beck of The Jackson Laboratory and the University of Connecticut Health Center (UConnHC).
Beck noted that the missing pieces often carry variants influencing digestion, immunity, and muscle control. Without them, risk models for many conditions have been blind to entire classes of DNA changes.
Clinicians felt that blindness whenever genetic tests ruled out mutations yet patients still developed disease. The new assemblies bring those stretches into view, letting variant calling software finally flag complex rearrangements that older methods skipped.
The most important areas that were finally decoded include a stretch of DNA related to spinal muscular atrophy, a serious genetic disease. Another key region is the major histocompatibility complex, a crowded section tied to over 100 different health conditions.
The team used a newer type of sequencing that reads much longer pieces of DNA than older methods. They combined two types of reads: one that’s very accurate and another that’s extra long, so they could capture big and tricky sections.
The experts stitched those reads together to build complete sets of DNA from each person, including both their mother’s and father’s versions.
This approach allowed the researchers to finish chromosomes from end to end in about 4 out of every 10 cases, which is a major improvement over earlier attempts. They also shared their method so that other scientists can now do the same without having to start from scratch.
The researchers uncovered nearly 2,000 complicated DNA changes that were too hard to find before. They also identified over 12,000 pieces of jumping DNA, which are bits that can move around and change how genes work.
On top of that, the team fully mapped more than 1,200 centromeres, which are the central parts of chromosomes that help them divide properly. Many of these turned out to have two possible “connection points” instead of one, something that may change how scientists understand genetic stability.
The researchers saw up to 30-fold differences in α-satellite repeat length at centromere cores. Once paired with clinical records, that variation could reveal links to fertility or cancer risk. Those comparisons were impossible before the gaps were filled.
The experts also mapped the notoriously variable amylase gene cluster, which influences how well people digest starch. Such detailed mapping lets anthropologists match genes with regional culinary traditions.
Earlier references were built mostly from European genomes, a limitation that has skewed risk scores and drug studies for decades. Clinicians in Africa, South America, and Asia have repeatedly reported mismatches between test results and patient outcomes.
Nearly 60 percent of the newly found insertions and 14 percent of deletions occur in fewer than one in 100 people, making them perfect markers for rare disease diagnosis.
Short-read pipelines that once flagged tens of thousands of candidate changes can now shrink the list to a few hundred, speeding answers for families.
This inclusive strategy follows the draft pangenome published in 2023, which wove 47 genomes into a graph-based reference. The new work extends that concept by adding depth as well as breadth, delivering a finished-quality sequence where the pangenome merely outlined possibilities.
“It’s only been in the last three years that technology finally got to the point where we can sequence complete genomes,” noted Charles Lee of The Jackson Laboratory for Genomic Medicine. He considers 65 complete genomes to be a starting point, not a finish line.
“There’s more and more realization that these sequences are not junk,” added Jan Korbel, interim head of EMBL Heidelberg, referring to the repetitive DNA now decoded. Korbel highlighted that the resource is open for anyone to explore.
Both scientists see the data as a launchpad for large health care projects, from newborn screening to predictive polygenic tools, that work equally well for every community. Those applications are already being piloted by regional health systems.
The consortium is already feeding its assemblies into graph-based tools so routine short-read data can benefit from the richer reference. Early tests push per-genome variant detection past 26,000 structural changes, roughly double earlier counts.
Sequencing costs are falling so quickly that fully phased, telomere-to-telomere genomes may soon be routine in diagnostic labs, ending the era in which physicians worked with partial maps and educated guesses. A single clinical genome that once cost millions now slips below $10,000 in some centers.
According to Beck, understanding health requires the full genetic blueprint, and this study finally hands clinicians most of the missing pages. The remaining gaps may close as long-read sequencing becomes more common in everyday medicine.
The study is published in the journal Nature.