Why the 'poll of polls' cannot be an exact science: the problem with predictions

There is only one perfect poll for the EU referendum, and that is the one taking place across the country today. In the lead-up to it, opinion polls have been highly contradictory, with big swings seemingly taking place on consecutive days.

Why are the polls so different? One explanation is that, in the wake of the 2015 General Election, a disaster for pollsters, the industry decided something had to be done, and companies changed their models in various ways. More fundamentally, there are two problems with accurately predicting the result of a referendum, or indeed public opinion on any question: one is mathematical and the other is practical.

Take a thousand people, ask them a question, and then extrapolate the answer to the whole population. If you do this once, you are very likely to be right, but the more surveys there are, the more likely it is that at least one of them is wrong, because by chance it happened to choose a thousand people who are not representative of the population at large. For example, reports and studies often use a 5 per cent significance level. This means that, when there is no real effect to find, about one in 20 of them will still report one. This mathematical problem can be reduced by doing even more polls and then taking the average: the ‘Poll of Polls’.
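To put rough numbers on this, here is a minimal Python sketch. It assumes simple random sampling, a true split close to 50–50 and the conventional 95 per cent confidence level; the function names are illustrative and are not taken from any polling company's method.

```python
import math

Z_95 = 1.96  # normal critical value for a 95 per cent confidence level


def margin_of_error(sample_size, share=0.5):
    """Approximate 95 per cent margin of error for one simple random sample."""
    return Z_95 * math.sqrt(share * (1 - share) / sample_size)


def chance_at_least_one_miss(num_polls, miss_rate=0.05):
    """Probability that at least one of several independent polls misses."""
    return 1 - (1 - miss_rate) ** num_polls


def margin_of_average(sample_size, num_polls, share=0.5):
    """Margin of error for the average of independent, unbiased, equal-sized polls."""
    return margin_of_error(sample_size, share) / math.sqrt(num_polls)


print(f"One poll of 1,000: +/- {margin_of_error(1000):.1%}")
print(f"Chance at least one of 20 polls misses: {chance_at_least_one_miss(20):.0%}")
print(f"Average of 10 such polls: +/- {margin_of_average(1000, 10):.1%}")
```

Under these assumptions a single poll of a thousand people carries a margin of roughly three percentage points, and the chance that at least one of 20 independent polls falls outside its margin is about 64 per cent. Averaging narrows the margin only if the polls are independent and unbiased, which is itself an assumption.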

But this ignores the second, practical problem: that of choosing a thousand people at random from the population. This is more or less impossible, so statistical models are used to try to account for the fact that the sample isn’t representative. The most powerful of these is called multilevel regression and poststratification. In broad terms, the idea is to decide which factors affect voting intentions, such as age, sex, political affiliation, education level, newspaper readership and location. The model then reweights the responses so that the sample fits the profile of the voting population.

In a basic version, suppose we survey a thousand people and we get responses from 531 women and 469 men. This is close to, but not exactly, the ratio of women to men in the UK. So the women’s responses are multiplied by 0.959 and the men’s by 1.047 to make the sample match the UK population, which is about 50.9 per cent female. If voting intentions do depend on whether the person is female or male then this should correct for it, and if not then we haven’t made things worse.
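The arithmetic behind those two multipliers can be sketched in a few lines of Python. The sample counts and the 50.9 per cent figure are the ones quoted above; the numbers of Remain supporters are hypothetical, invented purely to show how the weights feed through to a headline figure.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / share of the sample."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_share(sample_counts, supporters, weights):
    """Share backing one side once each response carries its group weight."""
    weighted_support = sum(weights[g] * supporters[g] for g in sample_counts)
    weighted_total = sum(weights[g] * sample_counts[g] for g in sample_counts)
    return weighted_support / weighted_total


sample_counts = {"women": 531, "men": 469}          # survey responses
population_shares = {"women": 0.509, "men": 0.491}  # UK population split

# Hypothetical answers, invented only to illustrate the calculation.
remain_supporters = {"women": 290, "men": 220}

weights = poststratification_weights(sample_counts, population_shares)
raw = sum(remain_supporters.values()) / sum(sample_counts.values())
adjusted = weighted_share(sample_counts, remain_supporters, weights)

print({group: round(w, 3) for group, w in weights.items()})
print(f"Raw Remain share: {raw:.1%}, weighted: {adjusted:.1%}")
```

The sketch reproduces the two multipliers to within rounding. With these made-up answers the correction moves the headline figure by only a fraction of a point, because the sample was already close to the population on this one factor.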

Of course, when this is done with many parameters, the corrections become hideously complicated. If the factors that polling companies correct for are not the most important ones, and hitherto unconsidered variables matter more, this weighting procedure won’t actually work. Particularly with younger voters, newspaper readership and political party affiliation might well not be very good indicators, especially on this issue, and a new model might be needed for them.

Another big problem with these models is that they bolt on a ‘turnout model’, which aims to predict which sections of the public will vote. Some companies base it on self-reporting, so if I tell them I’m going to vote then they believe me. Others think that is notoriously inaccurate, which it is, and so base their models on turnout at the 2015 General Election. However, this referendum is not an election, and it might well be that both models are horrifically inaccurate.
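To see how much the choice of turnout model can matter, consider the following Python sketch. Every figure in it is hypothetical: the age groups, their sizes, their Remain shares and both sets of turnout probabilities are invented for illustration and do not come from any real poll or company.

```python
def headline_remain_share(groups, turnout):
    """Remain share among expected voters, given a turnout probability per group."""
    remain_votes = sum(
        g["size"] * turnout[name] * g["remain"] for name, g in groups.items()
    )
    all_votes = sum(g["size"] * turnout[name] for name, g in groups.items())
    return remain_votes / all_votes


# Hypothetical sample: group sizes and Remain share among those with a view.
groups = {
    "under_35": {"size": 300, "remain": 0.65},
    "35_to_64": {"size": 450, "remain": 0.50},
    "65_plus": {"size": 250, "remain": 0.40},
}

# Two hypothetical turnout models applied to the same responses.
self_reported = {"under_35": 0.80, "35_to_64": 0.85, "65_plus": 0.90}
past_election = {"under_35": 0.45, "35_to_64": 0.70, "65_plus": 0.80}

print(f"Self-reported turnout: Remain {headline_remain_share(groups, self_reported):.1%}")
print(f"Past-election turnout: Remain {headline_remain_share(groups, past_election):.1%}")
```

The same thousand answers give a Remain lead under one turnout assumption and a dead heat under the other, which is the sense in which the turnout model can drive the headline.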

The ‘undecideds’, as in the 2015 General Election, are the key demographic, and in past sovereignty votes they have leant more towards the status quo. Undecideds are often ignored in the headline figures, and including them at, say, a 60–40 split for Remain changes the balance significantly.
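A short Python sketch shows the size of the effect. The poll figures used here (44 per cent Remain, 46 per cent Leave, 10 per cent undecided) are hypothetical, chosen only to make the arithmetic easy to follow; the 60–40 split is the one suggested above.

```python
def headline_excluding_undecided(remain, leave):
    """Headline shares when undecided respondents are simply dropped."""
    total = remain + leave
    return remain / total, leave / total


def headline_allocating_undecided(remain, leave, undecided, remain_split=0.6):
    """Headline shares when undecideds are divided between the two sides."""
    remain_total = remain + undecided * remain_split
    leave_total = leave + undecided * (1 - remain_split)
    total = remain_total + leave_total
    return remain_total / total, leave_total / total


# Hypothetical poll, in percentage points of all respondents.
remain, leave, undecided = 44, 46, 10

dropped = headline_excluding_undecided(remain, leave)
allocated = headline_allocating_undecided(remain, leave, undecided)

print(f"Undecideds dropped: Remain {dropped[0]:.1%}, Leave {dropped[1]:.1%}")
print(f"Undecideds split 60-40: Remain {allocated[0]:.1%}, Leave {allocated[1]:.1%}")
```

In this made-up case a Leave lead of about two points in the headline figures becomes a tie once the undecideds are allocated.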

Finally, the people you recruit for online polling, or contact by random-digit dialling for telephone polling, might not be representative of their section of society. Every method of selecting people has its flaws. Online polls like YouGov’s have consistently shown a far closer race than telephone polls like ComRes’s. At least one of these is very wrong. The Poll of Polls approach is intended to smooth out these differences, in the hope that everybody is wrong but in different ways, so that the errors cancel out.

One thing that won’t need to be considered in these opinion polls is location; this is a national poll, where everybody’s vote is equal and there are no safe seats or marginal constituencies. In this vote, even more so than in a general election, every vote counts. It is imperative that each person makes their voice heard. This vote is in some sense even more important than a general election, as it will shape the country’s direction for decades to come.

Dr David Craven

Senior Birmingham Fellow

School of Mathematics

The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of the University of Birmingham.
