An international initiative called STANDING Together has released new standards ensuring that medical artificial intelligence (AI) systems are developed with appropriate health datasets. These recommendations are the result of a two-year research study involving over 350 people from 58 countries, including patients, researchers, healthcare professionals, industry experts, and regulators.

The STANDING Together recommendations aim to ensure that the full diversity of people for whom AI systems will be used is represented in health datasets. This is imperative because AI systems are less likely to work well for people who aren’t properly represented in datasets, and may even be harmful to them. People in minority groups are particularly likely to be under-represented in datasets.

The recommendations provide guidance on collecting and reporting details such as age, sex, gender, race, ethnicity, and other important characteristics. They also recommend that any limitations of the dataset should be transparently reported to ensure that developers creating AI systems can choose the best data for their purpose. Guidance is also given on how to identify those who may be harmed when medical AI systems are used, allowing this risk to be reduced.

STANDING Together is led by researchers at University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The research was conducted with collaborators from over 30 institutions worldwide, including universities, the UK medicines regulator (the Medicines and Healthcare products Regulatory Agency, MHRA), patient groups and charities, and small and large health technology companies. The work was funded by The Health Foundation and the NHS AI Lab, and supported by the National Institute for Health and Care Research (NIHR).

Dr Xiaoxuan Liu, lead researcher, said: “AI models are underpinned by data, which captures a wealth of information. When dealing with health data, this information can unfortunately include existing health inequalities. These inequalities can come about in many ways, including underrepresentation of particular groups, or as a reflection of structural biases within wider society. It is vital that anyone using data to develop new innovations (including AI) are aware of any biases, and that they are accounted for. As we move towards an AI-enabled future, we can ensure these technologies don’t just work on average, but that they work for all.”

Dominic Cushnan, Director AI, Imaging & Deployment at the NHS AI Lab, said: “The lack of diversity and inclusivity in our current datasets are major challenges in our ability to ensure AI in health and care works for everyone. These standards are an important step towards transparent and common documentation of represented groups in our data, which can support the responsible and fair development and use of AI.”

The recommendations are available open access at www.datadiversity.org/recommendations to support the development of safe, effective and equitable AI tools for healthcare.