Complex regions of the human genome remained uncharted, even after researchers sequenced the genome in its entirety. That is, until today.
Researchers decoded DNA segments involved in the development of diseases like diabetes and spinal muscular atrophy that had previously been considered too complicated to sequence. Their work, published in Nature on Wednesday, could expand the future of precision medicine.
“This is a landmark paper,” said Barbara Mellone, professor of molecular and cell biology at the University of Connecticut, who was not involved in the research. “It opens the door to potentially solving cases that have been inaccessible to diagnosis for a long time.”
The first truly complete human genome was sequenced in 2022. A year later, scientists unveiled the first human “pangenome,” an effort to represent the genetic variability of populations worldwide. Still, gaps remained, and Wednesday’s study has helped fill them. The research resolved 92% of the missing data in the human genome. And it mapped genomic variation across ancestries to a degree not reached before.
The international team of researchers, co-led by the Jackson Laboratory, used data from 65 human samples, which spanned five continental groups and 28 population groups. They started by sequencing the samples using a combination of two technologies. The first, Oxford Nanopore Technologies’ ultra-long sequencing tools, allowed the researchers to scaffold regions that are difficult to sequence due to their density. The second, Pacific Biosciences’ high-fidelity sequencing tool, allowed the researchers to achieve high base-level accuracy.
Christine Beck, a senior study author and geneticist at the University of Connecticut Health Center, said that this “one-two hit” is what allowed her team to overcome previous technological hurdles and resolve the missing genome regions.
The researchers then partitioned the individual sequences into haplotypes, groups of genes that are typically inherited together from a single parent. These were subsequently compiled into contiguous stretches to form haplotype-resolved assemblies, which separate and individually represent the haplotypes inherited from each parent. In the final step, researchers compared each haplotype with a reference genome to identify the structural variants that could lead to diseases and to gauge the degree of genetic variation across different populations, Beck said.
The group fully sequenced several of the most complex regions that have previously been associated with genetic diseases. One such region is the major histocompatibility complex, which encodes the machinery for antigen presentation, a crucial process in the body’s immune response. This part of the genome has been linked to conditions like cancer and type 2 diabetes, as well as to differences in individuals’ susceptibility to viruses, according to Beck.
The study resolved the sequences for the SMN1 and SMN2 genes, which are associated with spinal muscular atrophy and have previously been targets of therapies for the disease. The amylase gene cluster, which aids in the digestion of starchy foods, was also decoded.
And the researchers sequenced over 1,200 centromeres, which are specialized regions of the chromosome that are essential to cell division. They found that the alpha satellite array, which forms the foundation of human centromeres, can vary by up to 30-fold in length. Centromere variation can cause chromosomal abnormalities like trisomies, in which an individual has three copies of a chromosome, leading to conditions like Down syndrome, Edwards syndrome, and Patau syndrome, Beck said.
Discerning the sequences and their population variation, Beck added, is a step toward understanding the development of associated diseases. This has significant implications for precision medicine, according to Charleston Chiang, a medical population geneticist at Keck School of Medicine of USC, who was not involved in the paper.
“It’s ultimately rooted in being able to more clearly define a person’s risk,” Chiang said.
The vast majority of studies related to genetic disease diagnosis have focused on single nucleotide polymorphisms, a type of genetic variation that occurs when a single base pair is changed, according to Chiang. This means that risk assessment for genetic disorders has largely ignored structural variants across different population groups. But the new study, which lays the foundation for understanding these variants’ associations with diseases, could ultimately enable physicians to deliver much more tailored genetic diagnoses and, in turn, treatments.
The diversity of the study’s sampled individuals is also key to its significance, according to Mellone. The research revealed that African ancestry samples had the most structural variation, which supports the idea that this population harbors the deepest reservoir of human genetic diversity. This finding is essential to consider when thinking about reference genomes, which have traditionally been biased towards European ancestry, Mellone added.
Though the paper has a more diverse sample than previous studies, its sample size is a limitation, according to Chiang. An analysis of many more global populations is necessary to fully represent the human genetic world and the possible structural variations that could lead to diseases, he added.
Still, Chiang said the paper has important implications.
“It’s clearly the direction that our field, in terms of generating genetic variation data, is moving towards,” Chiang added. “The idea has been talked about for a while, and you’re seeing them, one by one, becoming realized.”