From climate change to public health, governments are increasingly forced to make complex decisions under conditions of uncertainty. Good decision-making requires high-quality data, often in near real time, but too often this evidence is either locked away in academic journals or presented in formats that do not speak to policy makers' needs.

The data toolbox for policy-focused analytics has improved dramatically in recent years, thanks to novel and alternative data sources such as Reddit and Twitter, as well as the machine learning models and tools needed to capture, structure and make sense of that data. But these resources are beyond the reach of most public administrations. Data scientists, driven by a desire to see academic research have real-world impact, are helping decision makers to harness insights from these valuable unstructured troves.

Dr Anandadeep Mandal, an applied mathematician at the University of Birmingham, believes academic researchers can use state-of-the-art data tools to support policy makers. "Research [often] does not translate into policy implications but I am keen to help society with information that can support the Sustainable Development Goals," says Dr Mandal. His work draws on data from diverse sources, including social media, surveys and official databases such as those of the UK Office for National Statistics (ONS), to provide policy makers with actionable intelligence, with a strong focus on sentiment analysis.

Dr Mandal is working on projects spanning hate speech, healthcare and climate change. In the case of climate change, he has used data science and modelling tools to identify, process and classify climate-related posts on platforms such as Reddit, at a scale that would be impossible manually.

In one recently published project in Scientific Reports, Dr Mandal, working with Akshay Kaushal at HSBC Global Research, HSBC Global Banking and Markets, Bangalore, India, and Animesh Acharjee at the Institute of Cancer and Genomic Sciences at the University of Birmingham, mined 1.7 million posts from 55 climate-related discussion forums, or 'subreddits', on the social media platform Reddit between January 2008 and June 2021.

Noting the emergence of social media over this period as a 'public address system', the team realised it was an ideal source of information for understanding the public debate on climate change in real time. Such data can show the overall level of public discussion of specific themes and topics, how that discussion changes over time, and how effectively scientists and governments communicate the causes and consequences of climate change.

By combining the Universal Sentence Encoder (USE), a state-of-the-art sentence encoder, with a clustering algorithm, the team developed a machine learning-based approach to identify, store, process and classify Reddit posts automatically and at scale. The team chose Reddit for two reasons: the platform is organised into theme-based communities with defined rules on creating and sharing posts, which yields rich, specific and relevant content; and its long history allows data gathering to cover longer time frames. Within the broad and multifaceted topic of climate change, the approach narrowed the focus to ten critical underlying themes that have made up public discussion on social media over time.
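The article does not publish the team's pipeline, but the general "encode each post as a vector, then cluster the vectors into themes" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `embed` function is a hypothetical toy bag-of-words stand-in (USE itself is a pretrained neural model), and spherical k-means is an assumed clustering choice, since the article does not name the algorithm the team used.

```python
import math
import re
from collections import Counter

def embed(text, vocab):
    """Hypothetical stand-in for USE: a unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    # Vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Spherical k-means (assumed choice): assign each post to its most similar centroid."""
    centroids = [vectors[0]]
    # Deterministic farthest-first seeding so results are reproducible.
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the re-normalised mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in mean)) or 1.0
                centroids[j] = [x / norm for x in mean]
    return labels

# Invented example posts, for illustration only.
posts = [
    "plastic waste in the ocean",
    "ban single use plastic bags",
    "carbon tax policy debate",
    "new carbon tax announced",
]
vocab = sorted({w for p in posts for w in re.findall(r"[a-z]+", p.lower())})
vectors = [embed(p, vocab) for p in posts]
labels = kmeans(vectors, k=2)
print(labels)  # → [0, 0, 1, 1]
```

The two plastic posts share a cluster and the two carbon-tax posts share the other. In a real pipeline the toy `embed` would be replaced by a pretrained sentence encoder, which places posts with similar meaning close together even when they share no words, and the number of clusters would be tuned on the data rather than fixed at two.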

The researchers observed spikes in language clusters in response to environmental events, policy announcements, influential reports or studies, and climate agreements. The analysis also revealed patterns of public support for, or opposition to, the environmental narratives of political leaders, and tracked how granular issues such as plastic waste gain prominence over time. It further shows that the climate science community has been more successful than public administrations at communicating the causes and effects of climate change.
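The article does not say how spikes were identified. One simple, hypothetical way to flag them once each post carries a cluster label is to count a theme's posts per month and mark any month that exceeds the mean of the preceding few months by a set number of standard deviations. The window size, threshold and one-post floor on the standard deviation below are illustrative assumptions, and the posts are invented toy data.

```python
import statistics
from collections import Counter

def monthly_counts(posts, theme):
    """Count posts per month for one theme; months with no posts count as 0."""
    months = sorted({month for month, _ in posts})
    counts = Counter(month for month, label in posts if label == theme)
    return months, [counts[m] for m in months]

def find_spikes(counts, window=3, threshold=2.0):
    """Indices where a count exceeds the trailing-window mean by `threshold` SDs."""
    spikes = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mean = statistics.mean(history)
        # Floor the SD at one post so near-flat histories do not flag tiny changes.
        sd = max(statistics.pstdev(history), 1.0)
        if counts[t] > mean + threshold * sd:
            spikes.append(t)
    return spikes

# Invented toy data: (month, cluster label) for each post.
posts = (
    [("2017-01", "theme_a")] * 2 + [("2017-02", "theme_a")] * 3
    + [("2017-03", "theme_a")] * 2 + [("2017-04", "theme_a")] * 3
    + [("2017-05", "theme_a")] * 12 + [("2017-05", "theme_b")] * 4
)
months, counts = monthly_counts(posts, "theme_a")
print([months[t] for t in find_spikes(counts)])  # → ['2017-05']
```

A trailing window compares each month only with the recent past, so a theme that moves from the margins to the centre of debate is flagged when the jump happens rather than being averaged away over the whole series.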

The project revealed overall sentiment patterns related to different government administrations, with more public discontent over Donald Trump's handling of environmental policy than over that of Barack Obama's administration. The analysis also showed how a specific environmental issue can suddenly move from marginal relevance to prominence: plastic waste, previously a marginal theme in climate discussions, became a prominent topic of discourse around 2018.

Understanding how climate change is being discussed in public fora can help policy makers bridge the gap between policy interventions, such as carbon taxes, and their social implications. Lack of trust in government is a critical variable to address when building public support for such interventions. In future, more advanced techniques and algorithms could improve the quality of this kind of analysis, for example machine learning models that track how social media discourse on specific themes evolves over time. Dr Mandal believes academic research needs to solve a problem for governments. "What matters for policy is the data and the implications," he argues.

Understanding vaccine hesitancy

Understanding sentiment and public opinion is critical for governments as they seek to evaluate the effectiveness of their communications and the causes of public opposition to policies they wish to advance.

Mandal has applied a novel data-driven approach as part of a team tackling the critical issue of vaccine hesitancy in Black, Asian and minority ethnic (BAME) communities. The UPTAKE study, run with researchers from the University of Wolverhampton and supported by the Clinical Research Network West Midlands (CRN WM) and The Royal Wolverhampton NHS Trust (RWT), was a national, anonymous, cross-sectional online survey. Circulated via social media, radio and healthcare resources, it explored vaccine attitudes among BAME communities, and its insights were provided to the UK Vaccine Taskforce.

The project revealed that some BAME groups, in particular younger people, were more likely than average to be vaccine-hesitant and were highly susceptible to misinformation on social media and online. Such scepticism has historical roots: free-text comments indicated a fear among BAME communities of being used as 'guinea pigs' in trials to verify vaccine results, and a mistrust of government strategies. The findings showed that governments need to understand attitudes and sentiment in order to design more effective public communication.

The survey also revealed geographic nuances: participants in smaller cities such as Leicester and Aberdeen were more likely than those in larger metropolitan 'core' cities to want to take part in vaccine trials, possibly because of larger pockets of inner-city poverty in the core cities. Such findings can also inform public health strategy. Only half of vaccine clinical trials in the UK achieve their recruitment targets, and poor recruitment accounts for approximately one third of trial terminations, so insights into uptake are key to improving both trial participation and uptake of the vaccine itself. Understanding which demographic groups are less likely to take part in trials will help target strategies for recruiting participants.

Mandal describes policy makers as very receptive to using data-driven evidence of this kind, provided researchers understand the needs of their audience. "The key is to give them what they want, and not what you want. There's no good saying you are using an excellent data model. You need to give them what they want for decisions relating to society."

Mandal also emphasises the need for multidisciplinary approaches and constant interaction with data users. “I am a data scientist, but in the COVID study for instance, there was a team of doctors, clinicians, and policy makers feeding me what they want. There was this constant interaction between what I am doing and what they are receiving. One person cannot take over the whole study. It must be a team effort.”