Over the past decade, genetic sequencing tools have been very successful at identifying variations in the human genome that are associated with traits such as height or complex conditions like Alzheimer’s disease.

Despite these successes, 90 percent of genetic variants found to be linked to disease are located in noncoding DNA, the parts of the genome that don’t encode proteins. Instead, these noncoding regions likely serve to regulate the expression of functional genes. These variants are often located in sections of DNA containing several genes, so it’s difficult for scientists to pin down the exact gene (or combination of genes) that is responsible for a given association.

Researchers have turned to additional statistical tools to try to narrow down the candidate genes, looking into levels of gene expression and the transcription of RNA. In 2016, a team from the University of Chicago highlighted the key role a process called RNA splicing plays in genetic variation and risk for disease. Nearly all genes undergo RNA splicing, where pieces of RNA are cut out and the remaining pieces are stitched back together to create different versions of mature mRNA transcripts. This significantly increases the number of proteins a single gene can generate and is thought to explain much of the complexity in higher-order organisms. The process is also error-prone: at least 15 percent of all human diseases are thought to be due to splicing errors.

Yang Li, PhD, an assistant professor of medicine and human genetics at UChicago who led the 2016 study, created a software tool called LeafCutter that can identify genetic variants that affect splicing. Li says he built LeafCutter to be fast and efficient enough to handle the vast amounts of data needed to analyze hundreds of genomes. Essentially, the tool allows researchers to drill into genetic variants linked to a disease and identify the specific genes involved.