Test Evaluation Research Group (TERG)


Effective healthcare depends on using tests that do more good than harm. The Test Evaluation Research Group (TERG) is internationally recognised for producing and evaluating evidence on the performance of medical tests, and for innovation in test evaluation methods. We work across the spectrum of test evaluation, from biological variability, through diagnostic accuracy and into decision–making and patient health impact.

Theme Lead

Professor Jon Deeks


Professor of Biostatistics
Deputy Director

Institute of Applied Health Research




How good are tests to diagnose COVID-19?

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and the resulting COVID-19 pandemic present important diagnostic evaluation challenges. As part of a multi–institute international collaboration, we are creating and maintaining a suite of seven Cochrane ‘living’ systematic reviews of Diagnostic Test Accuracy. These reviews address this question for a range of tests and characteristics used to diagnose current infection, or to detect an antibody response to past infection.

Our series of Cochrane 'living reviews' of Diagnostic Test Accuracy in COVID-19

Key collaborators:

Cochrane Infectious Diseases Group (https://cidg.cochrane.org/background)

Cochrane Netherlands (https://netherlands.cochrane.org/over-ons)

Universities of Amsterdam, Utrecht, Leuven and Ottawa

WHO (https://www.who.int/)

FIND (https://www.finddx.org/covid-19/)

Published reviews:

  1. Deeks JJ, Dinnes J, Takwoingi Y, et al. Antibody tests for identification of current and past infection with SARS-CoV-2. Cochrane Database Syst Rev 2020;6:CD013652. doi: 10.1002/14651858.CD013652

Serology tests to detect the presence of antibodies to SARS-CoV-2 aim to identify previous SARS-CoV-2 infection, and may help to confirm the presence of current infection. We included 54 studies with 15,976 samples, of which 8,526 were from cases of SARS-CoV-2 infection, and provided data for 25 commercial laboratory-based tests or lateral flow assays and numerous in-house assays. The sensitivity of antibody tests is too low in the first week since symptom onset to have a primary role for the diagnosis of COVID-19, but they may still have a role complementing other testing in individuals presenting later, when RT-PCR tests are negative, or are not done. Antibody tests are likely to have a useful role for detecting previous SARS-CoV-2 infection if used 15 or more days after the onset of symptoms. The duration of antibody rises is currently unknown, and we found very little data beyond 35 days post-symptom onset. We are therefore uncertain about the utility of these tests for seroprevalence surveys for public health management purposes. Concerns about high risk of bias and applicability of results make it likely that the accuracy of tests when used in clinical care will be lower than reported in the included studies. 

  2. Struyf T, Deeks JJ, Dinnes J, et al. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19 disease. Cochrane Database of Systematic Reviews 2020;6. doi: 10.1002/14651858.CD013665

Symptoms such as fever or cough, and signs such as oxygen saturation or lung auscultation findings, are the first and most readily available diagnostic information that could be useful to either rule out COVID-19 disease, or select patients for further diagnostic testing. We identified 16 studies including 7706 participants, providing data on 27 signs and symptoms (systemic, respiratory, gastrointestinal and cardiovascular). The individual signs and symptoms appear to have very poor diagnostic properties, although this should be interpreted in the context of selection bias and heterogeneity between studies. Based on currently available data, neither the absence nor the presence of signs or symptoms is accurate enough to rule in or rule out disease.

  3. Dinnes J, Deeks JJ, Adriano A, et al. Rapid point-of-care antigen and molecular-based tests for the diagnosis of SARS-CoV-2 infection. Cochrane Database of Systematic Reviews 2020, Issue 8. Art. No.: CD013705. doi: 10.1002/14651858.CD013705.

Point-of-care antigen and molecular tests to detect current SARS-CoV-2 infection have the potential to allow earlier detection and isolation of confirmed cases compared to laboratory-based diagnostic methods, with the aim of reducing household and community transmission. We included 18 studies with 3,198 unique samples, of which 1,775 had confirmed SARS-CoV-2 infection, and provided data for 8 commercial tests (4 antigen and 4 molecular) and one in-house antigen test. These early stage evaluations of point-of-care tests were largely based on laboratory samples. The findings have limited applicability, and we cannot be sure whether tests will perform in the same way in clinical practice, and according to symptoms of COVID-19, duration of symptoms, or in asymptomatic people. Rapid tests have the potential to be used to inform triage of RT-PCR use, allowing earlier detection of those testing positive, but the evidence currently is not strong enough to recommend their use in clinical practice.

Submitted reviews:

  1. Salameh J-P, Leeflang MMG, Hooft L, et al. Thoracic imaging tests for the diagnosis of COVID-19. Cochrane Database of Systematic Reviews. In press

This review examined the evidence for chest imaging (computed tomography (CT), X-ray and ultrasound) in the evaluation of people suspected to have COVID-19. A total of 84 studies have been included: 71 ‘cases-only’ studies with 6331 participants diagnosed with COVID-19 at the time of recruitment, and 13 studies allowing both sensitivity and specificity to be estimated (10 studies with 1399 participants suspected of COVID-19 and three case-control studies with 549 cases and controls). The significant uncertainty resulting from poor study quality and heterogeneity of included studies limited our ability to draw confident conclusions from the results. The review findings suggest that chest CT is sensitive but not specific for the diagnosis of COVID-19 in suspected patients. This low specificity could also be the result of the poor sensitivity of the reference standard (RT-PCR), as CT could potentially be more sensitive than RT-PCR in some cases. Because of limited data, accuracy estimates of chest X-ray and ultrasound of the lungs for the diagnosis of COVID-19 should be carefully interpreted.

  2. Stegeman I, Ochodo EA, Guleid F, et al. Routine laboratory testing to determine if a patient has COVID-19 pneumonia or SARS-CoV-2 infection. Cochrane Database of Systematic Reviews.

Routine laboratory markers commonly used to assess the health status of a patient are also used in patients with COVID-19 infection and may be useful for triage of people with potential COVID-19 infection for the necessity of treatment or more intensive treatment, especially in situations where time and resources are limited. A total of 21 studies were included in this review, reporting data for 67 different laboratory tests in 70,711 patients, of whom 14,126 had COVID-19. There was considerable heterogeneity in tests, cut-offs and settings. The accuracy of 16 tests was summarised using meta-analysis, of which only three performed at sensitivity-specificity combinations where both sensitivity and specificity were above 50%. There was low to very low certainty in the summary estimates of the tests. Evidence to date suggests that in sick hospitalised patients, routine tests cannot discriminate between COVID-19 and other diseases as the cause of infection, inflammation or tissue damage and should preferably not be used as standalone tests for COVID-19.
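As a reminder of what the accuracy measures quoted in these summaries mean, the following is a minimal sketch, not the method used in any of the reviews, of how sensitivity and specificity are derived from a single study's 2x2 table, and of a check for whether both exceed 50%. The class name, function name and counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying an index test against a reference standard."""

    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive people that the index test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative people that the index test clears.
        return self.tn / (self.tn + self.fp)


def both_above(table: TwoByTwo, threshold: float = 0.5) -> bool:
    """True when sensitivity and specificity both exceed the threshold."""
    return table.sensitivity() > threshold and table.specificity() > threshold


# Hypothetical counts, not taken from any included study.
example = TwoByTwo(tp=60, fp=30, fn=40, tn=70)
print(example.sensitivity(), example.specificity(), both_above(example))
```

Both measures are computed against the reference standard, so an imperfect reference standard (for example RT-PCR with limited sensitivity) will distort them. A meta-analysis pools such tables across studies with dedicated statistical models, which this sketch does not attempt.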

Reviews in preparation:

  1. Cochrane COVID–19 Diagnostic Test Accuracy Group. The effect of sample site and collection procedure on identification of SARS-CoV-2 infection using laboratory based molecular tests. Cochrane Database of Systematic Reviews in preparation

This review will include within-study (direct) comparisons of the diagnostic yield of laboratory-based molecular tests according to sample site (including saliva, upper or lower respiratory tract samples, faeces or urine) or collection procedure (including swab versus wash, different storage media, and self sampling compared to health care provider sampling) in the same patients.

  2. Cochrane COVID–19 Diagnostic Test Accuracy Group. PCR compared to alternative laboratory-based molecular tests for identification of SARS-CoV-2 infection. Cochrane Database of Systematic Reviews in preparation

This review will include within-study (direct) comparisons of the diagnostic yield of different laboratory-based molecular tests (including those using alternative methods for extraction of RNA using the same test), and comparisons between RT-PCR and innovative nucleic acid amplification (NAA) technologies such as RT-LAMP or CRISPR.

Other Covid-19 related output:

  1. Taylor-Phillips S, Berhane S, Sitch AJ et al. Information given by websites selling home self-sampling COVID-19 tests: An analysis of accuracy and completeness. This review of websites selling COVID-19 tests in the UK and US reports on accuracy and completeness of communication prior to purchase. Three key topics for communication are covered: who should take the test and when; test accuracy; and interpreting test results. medRxiv 2020.08.18.20177360; doi: https://doi.org/10.1101/2020.08.18.20177360
  2. Raffle AE and Taylor–Phillips S. Test, Test, Test: Lessons learned from experience with mass screening programmes. A report written for Independent Sage, summarising implications for best practice when evaluating tests in public settings. Advice note for Independent Sage, 5 June 2020.
  3. Deeks JJ, Brookes AJ, Pollock AM. Operation Moonshot proposals are scientifically unsound. BMJ 2020;370:m3699.
  4. Watson J, Richter A, Deeks JJ. Testing for SARS-CoV-2 antibodies. BMJ 2020;370:m3325.


About the research

TERG undertakes both methods research (developing and evaluating methods for designing, analysing and reporting studies) and applied health research (applying the best methods to health research questions in collaboration with scientific and clinical colleagues). We are involved in many (and lead several) international collaborations working to set the standards for the design, delivery and analysis of primary studies for tests.

  1. Primary studies of diagnostic test accuracy

Comprising statisticians, clinicians and systematic reviewers, we have a broad portfolio of expertise in planning, delivery and analysis of primary studies designed to assess the use of tests in healthcare.

  2. Systematic reviews of diagnostic test accuracy

TERG is also recognised for its work within the field of systematic reviews and meta-analysis of test evaluation, including in particular: comparative test accuracy, tailored meta-analysis, multiple thresholds and investigation of heterogeneity.

The Test Evaluation Research Group has close links with the Cochrane Screening and Diagnostic Tests Methods Group (SDTMG) and provides the editorial base for systematic reviews of diagnostic test accuracy, published in The Cochrane Library. This service includes organisation of the peer review and editorial approvals for the methods content of Cochrane DTA (Diagnostic Test Accuracy) protocols and reviews.

 3. Estimating the technical properties of biomarkers

 4. Ensuring test evaluations are fit-for-purpose. We are also active in other areas of methodological innovation, particularly in developing theory and practical applications for assessing how tests impact on decision–making and patient health (clinical effectiveness).

 5. Prognosis and prediction rules

 6. Screening

 7. Monitoring

Current projects

  • We are running a number of projects within the Diagnostics and Biomarkers cross-cutting theme at the NIHR Birmingham Biomedical Research Centre 
  • The Diagnostics and Biomarkers theme works with the three key research themes in the NIHR Biomedical Research Centre to develop, design and deliver portfolios of test research studies and to advance the methodology behind early evaluative studies.
  • The over-arching objective is to ensure that tests and biomarkers developed or used in the research themes undergo appropriate evaluation and assessment before being utilised as clinical tests or outcome measures, as well as improving the methodological basis upon which such assessments are made.

BRC Research Projects


  • Systematic review of accuracy of imaging tests for diagnosis of Rheumatoid Arthritis (RA) in patients with early symptoms
  • Systematic review of accuracy of autoantibody tests and prediction rules for diagnosis of RA in patients with early symptoms
  • Systematic review of the measurement properties of grip strength in different disease groups
  • Primary study of measurement properties of RA tissue biomarkers to assess their value as tests and outcome measures in early-phase trials, particularly looking at flow cytometric measurement of fibroblast groups
  • Estimating measurement properties of scoring systems based on glands identified from lip biopsies for Sjögren's syndrome
  • To establish the validity of functional markers in sarcopenia patients, we have designed a study to assess biological variability within the existing cohort study
  • Developing biomarker combination signatures and evaluation of their clinical accuracy in Gastroenterology/Liver disease


  • Modelling optimal use of tests for monitoring disease progression and recurrence. Monitoring to identify disease recurrence or progression is common, often with limited evidence to support the tests used, subsequent decisions, frequency and duration of monitoring. We aim to develop methods for designing evidence-based monitoring strategies and estimating measurement error, a key consideration in selecting monitoring tests.
  • Models to assess the biological variability of count outcomes and methods for combining estimates of variability across studies. We are using statistical techniques to evaluate the ability of methods to estimate variability parameters when differing numbers of glands are obtained; we are also looking to investigate the impact of variability between glands and patients to allow sample sizes to be appropriately calculated for such studies. In addition, we are reviewing the methods for undertaking systematic reviews of biological variability studies.
  • We are looking at the potential for using routine data to provide information about test performance.  We have identified a statistical method known as the variogram which may be able to estimate measurement error from routine monitoring data. We have developed links with UHB and are looking at opportunities to evaluate the impact of introduction of a new test or monitoring pathway from routine data.
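To illustrate the variogram idea mentioned above, here is a minimal sketch of a generic empirical variogram for repeated measurements. It is not the group's implementation: the function name, data layout and bin width are assumptions made for illustration. For every pair of measurements from the same patient, it takes half the squared difference and averages these values within bins of time lag. Under the usual assumption that a patient's true value changes little over very short intervals, the values at the shortest lags approximate the measurement error variance.

```python
from collections import defaultdict
from itertools import combinations


def empirical_variogram(series_by_patient, bin_width):
    """Average half squared differences of within-patient pairs, binned by lag.

    series_by_patient maps a patient identifier to a list of (time, value)
    pairs. Returns a dict mapping the start of each lag bin to the mean
    half squared difference of all pairs whose time lag falls in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for observations in series_by_patient.values():
        # Only compare measurements taken from the same patient.
        for (t1, y1), (t2, y2) in combinations(observations, 2):
            lag = abs(t2 - t1)
            bin_start = int(lag // bin_width) * bin_width
            sums[bin_start] += 0.5 * (y2 - y1) ** 2
            counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical monitoring data: times in days, values in arbitrary units.
data = {
    "patient_a": [(0, 10.0), (1, 12.0), (2, 10.0)],
    "patient_b": [(0, 5.0), (2, 7.0)],
}
variogram = empirical_variogram(data, bin_width=1)
print(variogram)  # {1: 2.0, 2: 1.0}
```

Routine data are typically irregularly timed and collected for clinical reasons, so a real analysis would need to consider informative sampling and would usually fit a smooth variogram model rather than rely on raw bin means.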

Applied Health Research (Primary and Secondary)


Secondary Current

  • ROCkeTS (Refining Ovarian Cancer Test Accuracy Scores): a series of three systematic reviews of the accuracy of symptom combinations, biomarkers and test combinations for detecting ovarian cancer in pre- and post-menopausal women
  • CATCH-ME (Characterising Atrial fibrillation by Translating its Causes into Health Modifiers in the Elderly)


Methodology Research:


  • CONSORT–AI Extension 
  • Evaluation of diagnostic imaging test performance: Including interobserver variability and time to diagnosis. NIHR doctoral fellowship started April 2020. The project involves methodological research through systematic reviews, case and simulation studies. If interested please get in touch with Laura Quinn
  • Methods to Evaluate Screening 
  • Opportunities and Challenges in Using Routine Data Sources to Evaluate Biomarker
  • SPIRIT–AI Extension  
  • TEST (Test Evaluation Using Structured Tools). UK MRC funded project, launched 1st March 2020. If you evaluate diagnostic tests (by study, review or policy discussion) and are interested in contributing to the design of a new tool to decide which studies to use to most efficiently evaluate a diagnostic, get in touch at ferrantl@bham.ac.uk
  • “Ensuring test evaluation research is applicable in practice: investigating the effects of routine data on the validity of test accuracy meta–analyses”, an MRC Clinician Scientist fellowship being undertaken by Dr Brian Willis.


Conferences, Workshops and Seminars

The group organises or is affiliated with a number of research events, all of which promote understanding and knowledge of test evaluation, and/or medical statistics.

We run two CPD courses:

1. Systematic reviews and meta-analyses of diagnostic test accuracy 2018: We provide a 3–day CPD course in partnership with the University of Amsterdam, on how to conduct systematic reviews and meta-analyses of diagnostic test accuracy. The course has run annually since 2014; however, it will not run in 2020.

2. Evaluating medical tests (EMT): How do we tell if this biomarker or diagnostic test is any good? In 2019 we launched a new 3–day course to provide training in how to conduct primary studies that evaluate biomarkers and diagnostic tests. Aimed at research teams who are developing these tests, the course covers the design, analysis and interpretation of primary studies, as well as an awareness of the portfolio of studies test developers are likely to need to undertake at different stages of the translation pathway. The course first ran in May 2019. Due to COVID-19 the next course is currently projected to run in the first half of 2021.

For CPD enquiries, please contact: Natasha Maguire

Regular meetings and seminars:

  • TERG Sessions: Every month our research group holds a business meeting, which also serves to provide methodological support to teams evaluating medical tests. Each meeting reserves a one–hour slot for a team to present their project and quandary (20 mins), followed by a round–table discussion of methodological solutions.

For enquiries, please contact the meeting coordinator:  Lavinia Ferrante di Ruffano 

  • Biomarker Club: In association with the NIHR BRC Birmingham, TERG launched a Biomarker Club for University of Birmingham researchers who are actively involved in developing and/or evaluating medical tests. These meetings aim to provide an opportunity to learn test evaluation methods, share research challenges, and network. Due to COVID-19 the next meeting has been pencilled in for early 2021. For enquiries please contact TERG@contacts.bham.ac.uk


  • MEMTAB: In 2008 we launched the world’s first symposium to be focussed on methods for evaluating medical tests. This international symposium attracts researchers, healthcare workers, policy makers and manufacturers who are actively involved in the development, evaluation or regulation of tests, (bio)markers, models, tools, apps, devices or any other modality used for the purpose of diagnosis, prognosis, risk stratification or (disease or therapy) monitoring. Located at the University of Birmingham in 2008, 2010 and 2013, the symposium is now called ‘Methods for Evaluating Medical prediction Models, Tests and Biomarkers (MEMTAB)’, and is hosted in turn by several world–leading centres for medical test evaluation. MEMTAB2020 is taking place at the University of Leuven's EPI-Centre, 2–3rd July 2020.

Online resources: Free materials for conducting reviews and meta–analyses of diagnostic test accuracy are available through Cochrane’s Screening and Diagnostic Test Methods group as training materials and the DTA handbook

Key publications

Freeman K, Dinnes J, Chuchu N, Takwoingi Y, Bayliss SE, Matin RN, Jain A, Walter FM, Williams HC, Deeks JJ. Algorithm-based smartphone ‘apps’ for assessment of the risk of skin cancer in adults: a systematic review of diagnostic accuracy studies. BMJ 2020;368:m127. https://doi.org/10.1136/bmj.m127

Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S: PROBAST: A tool to assess the risk of bias and applicability of prediction model studies. Annals of Internal Medicine 2019, 170(1):51-58.

Takwoingi Y, Whitworth H, Rees-Roberts M, Badhan A, Partlett C, Green N, Boakye A, Lambie H, Marongiu L, Jit M, White P, Deeks J, Kon O, Lalvani A on behalf of the IGRAs for Diagnostic Evaluation of Active TB (IDEA) Study Group. Interferon gamma release assays for Diagnostic Evaluation of Active tuberculosis (IDEA): test accuracy study and economic evaluation. Health Technol Assess 2019;23(23).

Kasivisvanathan V, Rannikko AS, Borghi M, Panebianco V, Mynderse LA,… Deeks J, Takwoingi Y, Emberton M, Moore CM; PRECISION Study Group Collaborators. MRI-Targeted or Standard Biopsy for Prostate-Cancer Diagnosis. N Engl J Med. 2018 May 10;378(19):1767-1777. doi: 10.1056/NEJMoa1801993

Whiting P, Leeflang M, de Salis I, Mustafa RA, Santesso N, Gopalakrishna G, Cooney G, Jesper E, Thomas J, Davenport C. How to write a plain language summary for a diagnostic test accuracy review.  J Clin Epidemiol 103 (2018) 112-119.

Takwoingi Y, Leeflang MM, Deeks JJ. Empirical evidence of the importance of comparative studies of diagnostic test accuracy. Ann Intern Med. 2013;158(7):544-54.

Ferrante di Ruffano L, Hyde CJ, McCaffery KJ, Bossuyt PM, Deeks JJ. Assessing the value of diagnostic tests: a framework for designing and evaluating trials. 2012. BMJ. http://dx.doi.org/10.1136/bmj.e686 


Please send an email to TERG@contacts.bham.ac.uk for all enquiries, unless indicated in the relevant section above.