Algorithm matches genetic variation to disease symptoms and could improve diagnosis of rare diseases

A faster and more accurate method of identifying which of an individual’s genes are associated with particular symptoms has been developed by an international team of researchers from the University of Birmingham and the University of Cambridge in the UK, and from Saudi Arabia.


Around 80% of rare diseases are thought to have a genetic component, but many patients currently experience long delays in diagnosis or never receive a diagnosis at all. Recent developments in our ability to obtain whole or partial genome sequences cheaply and efficiently now make it feasible for patients to benefit from this technology through cheaper, faster diagnosis of disease and the development of new therapies.

Many international projects, such as the UK's 100,000 Genomes Project, are currently seeking to capitalise on this by sequencing hundreds of thousands of individuals. However, a major challenge remains: how to associate changes in a patient’s DNA with their disease.

'The challenge for scientists is to identify which of the hundreds of thousands of genetic differences between a patient and an unaffected individual might be responsible for their disease,' says Dr Paul Schofield from the Department of Physiology, Development and Neuroscience at the University of Cambridge. 'Given the huge complexity of this problem, it has been described as "looking for needles in stacks of needles".'

Now, Dr Schofield and a team of researchers from the UK and Saudi Arabia have developed an algorithm, published in the journal PLOS Computational Biology, that can identify variants that modify the normal function of a gene associated with a particular disease.

A framework developed by the team, called PhenomeNET, matches a patient’s phenotype (symptoms) to a large database of gene-to-phenotype associations, including those from studies involving mice and zebrafish, in order to identify disease-causing genes.
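The matching idea can be illustrated with a minimal sketch. It is not the team's implementation: PhenomeNET compares phenotypes in a more sophisticated way than shown here, and this example substitutes plain set overlap (Jaccard similarity). The gene names, phenotype labels and function names are all invented for illustration.

```python
# Toy gene-to-phenotype associations (invented for illustration, not real data).
GENE_PHENOTYPES = {
    "GENE_A": {"goitre", "low thyroid hormone", "growth delay"},
    "GENE_B": {"hearing loss", "growth delay"},
    "GENE_C": {"cataract"},
}


def jaccard(a, b):
    """Return the set overlap |a & b| / |a | b|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_genes(patient_phenotypes, gene_phenotypes):
    """Score every gene against the patient's phenotypes, best match first."""
    scores = {
        gene: jaccard(patient_phenotypes, phenotypes)
        for gene, phenotypes in gene_phenotypes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical patient with two recorded symptoms.
patient = {"goitre", "low thyroid hormone"}
ranking = rank_genes(patient, GENE_PHENOTYPES)
```

In this toy example the gene whose known phenotypes overlap most with the patient's symptoms comes first in `ranking`.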

Creating the algorithm

Mice and zebrafish are commonly used when studying the biology underlying human diseases as they have a number of important genetic and biological similarities to us. For many years, data on the consequences of naturally-occurring and experimentally-induced genetic variants in these animal models have been collected, resulting in a huge ‘Big Data’ resource associating genetic makeup with phenotype. One example is the Mouse Genome Database, which contains more than 60,000 of these associations.

By combining PhenomeNET with methods that find harmful variants in a genomic sequence, the team developed the PhenomeNET Variant Predictor (PVP) system, an algorithm that prioritises these variants by their likelihood of involvement in human disease.
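As a rough sketch of what prioritising variants could look like, the example below multiplies a per-variant harmfulness score by a per-gene phenotype match score and sorts by the result. The article does not describe how PVP actually combines its evidence, so the product rule, the score ranges, the class and function names, and all the values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str    # hypothetical identifier
    gene: str          # gene the variant falls in
    harm_score: float  # assumed 0..1 output of a separate variant-effect predictor


def prioritise(variants, gene_match):
    """Order variants by harm_score * phenotype match of their gene (assumed rule)."""
    def combined(variant):
        return variant.harm_score * gene_match.get(variant.gene, 0.0)
    return sorted(variants, key=combined, reverse=True)


# Invented example inputs.
variants = [
    Variant("var1", "GENE_A", 0.9),
    Variant("var2", "GENE_B", 0.95),
    Variant("var3", "GENE_A", 0.2),
]
gene_match = {"GENE_A": 0.8, "GENE_B": 0.1}  # phenotype match per gene, 0..1

ranked = prioritise(variants, gene_match)
```

Here `var2` has the highest harmfulness score on its own, but it drops to last place because its gene matches the patient's symptoms poorly. This is the kind of reordering that combining the two sources of evidence is meant to produce.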

'We’ve shown that our algorithm works for simpler diseases and now the real test will be to determine whether a similar approach can be applied to complex diseases, such as diabetes, where multiple genes are involved,' says Professor George Gkoutos from the University of Birmingham.

Working with Dr Nadia Schoenmakers at the Wellcome Trust-MRC Institute of Metabolic Science in Cambridge, the team was able to show that the new algorithm can identify genetic changes in patients with congenital thyroid disease, and can reveal candidate genetic changes in ‘Mendelian’ diseases where only a single gene is involved. 

'Our algorithm makes use of clinical and experimental data that have been collected for years and uses them to identify the genetic variants underlying the conditions of patients with genetic disorders,' adds Professor Robert Hoehndorf from King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
