Research themes

Although encompassing a vast range of knowledge and experience, the expertise found in CCB can broadly be divided into the five areas outlined below. 

Next Generation Data Sciences

The size of biological databases has increased exponentially in recent years. This calls for Big Data concepts and techniques to be applied and translated efficiently into the bioinformatics and genomics realms, so that the data can be handled, stored and processed quickly and accurately. At CCB, we use a broad range of tools and techniques to implement big data analytics and to carry out data science projects end to end. The projects involve curating, managing and storing relational and non-relational databases. In the analytical pipeline, the data is fed into a wide range of state-of-the-art and novel predictive algorithms. Finally, results are communicated to a broad audience through publications, presentations and effective visualizations. To develop the pipelines, the Centre relies heavily on the prime facilities of the University of Birmingham. These include an in-house supercomputer, known as BEAR, and CaStLeS (Compute and Storage for Life Sciences), which provide high performance computing (HPC), high throughput computing (HTC) and storage of large-scale research data. Moreover, researchers have access to the largest IBM POWER9 AI cluster in the UK for highly parallelized and computationally intensive machine learning cloud computations. Altogether, the research and facilities enable the development of innovative approaches in health data science and the delivery of high-quality data-driven research.

Health Data Sciences

The explosion in health technologies means that our ability to acquire health data now far outstrips our ability to assimilate and interpret them. The challenge the biomedical community now faces is to develop frameworks that support the holistic integration and interpretation of these datasets across scales, modalities and levels of granularity.

To address this challenge, we are developing and applying novel health data-driven methods drawn from diverse scientific domains, ranging from biomedical and health sciences to data science, bioinformatics, clinical informatics, computer science, statistics and engineering. Our approaches enable the harmonisation, interoperability and integration of health data, from the molecular level to the whole-patient level, across different modalities and levels of granularity. The aim is to gain a better understanding of the pathophysiology and pathobiology of human disease and to support the development of novel diagnostics and precision medicine applications.

Precision Medicine

No two human beings are alike: we differ in our genetics, our reaction to disease, and our response to treatment. Precision Medicine focuses on the customization and tailoring of healthcare and medical treatment to the individual characteristics of each patient. Achieving this goal requires research that can capture how individuals and subpopulations differ in their susceptibility to disease, what differences can be observed in the prognosis of disease, and the underlying genetic and biological behaviours that characterize different responses to treatment. The field, also called personalized medicine, concentrates on combining omics technologies, big data analytics, and population health to assess impacts on individuals. It remains an ever-growing area of study that expands with the development of other fields and technologies, including biomarker identification and pharmacogenomics, as well as the continuous development of omics technologies.

Integrated Omics

The advent of high-throughput omics technologies poses new challenges in analysing, processing, and merging different sources of information into biologically and clinically meaningful contexts. Integrated omics focuses on the computational and informatics frameworks that facilitate the fusion of major omics data types used in everyday biology (including genomics, transcriptomics, proteomics, metabolomics, and phenomics). The term is used interchangeably with multi-omics, pan-omic, and trans-omic approaches. The overarching goal of integrated omics is to generate a systems biology overview of the biological question, providing unprecedented and transformative insights that are not observed when individual omics platforms are analysed in isolation. This is an exciting but immense challenge for computational biologists and bioinformaticians. Machine-learning approaches for the analysis of integrated omics datasets, and methods that can facilitate cross-talk between multi-omic layers, are used to process and understand the complexity of heterogeneous datasets. Part of the strategy includes developing methods that facilitate integrating data of different dimensions and biological contexts, methods for data normalization, and storage platforms that can handle data files of up to petabyte size. Ultimately, these efforts provide more comprehensive views of human health, disease, and basic biology.

Environmental Life Sciences

Environmental degradation and pollution pose grave risks to the world. Legacy and emerging contaminants affect air, soil and water in every part of the globe, disrupting vital ecosystem services that people depend on and profoundly affecting human health. Our multidisciplinary research combines life science, environmental science and data science. It aims to discover how hazardous outcomes depend on environmental conditions, in order to reduce or eliminate the exposure of whole communities to unknown or unidentified harmful substances. We use high-throughput DNA sequencing and mass spectrometry technologies to produce multi-omics data for sentinel and surrogate model species. Empowered by high-performance computing, we apply cutting-edge computational biology approaches, including interpretable machine learning, deep learning, and network modelling, to understand the functional pathways relevant to environmental perturbation and the corresponding phenotypic outcomes. We characterise the molecular "fingerprints" of hazard responses that are shared among species, including humans, by evolutionary descent. We establish causal links between human impacts and both ecosystem resilience and human health at the individual, population and biosystem levels, and thereby predict the adverse effects of new contaminants before they are introduced into the environment. Our research helps to improve knowledge of the systematic biological response to ecosystem disturbance and provides science-based interventions to safeguard and sustain the environment.