Data science could help Hollywood producers generate ‘personalised’ movies after researchers established that a film’s emotional content could help to predict box-office success.

Scientists used sophisticated computer processes to analyse emotion in thousands of movies and showed that, much like novels, stories in motion pictures fit six major emotional clusters:

  • Rags to Riches: ‘An ongoing emotional rise’

The Shawshank Redemption, Groundhog Day, The Nightmare Before Christmas

  • Riches to Rags: ‘An ongoing emotional fall’

Psycho, Love Story, Toy Story 3

  • Man in a Hole: ‘A fall followed by a rise’

The Godfather, The Departed, Blade Runner

  • Icarus: ‘A rise followed by a fall’

On the Waterfront, Mary Poppins, A Very Long Engagement

  • Cinderella: ‘Rise-fall-rise’

Rushmore, Babe, Spider-Man 2

  • Oedipus: ‘Fall-rise-fall’ 

All About My Mother, As Good as It Gets, The Little Mermaid
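The six shapes listed above can be made concrete with a small sketch. It is an illustrative template-matching heuristic, not the clustering procedure the researchers used: each shape is modelled as a hypothetical cosine curve, and a profile is assigned to the template it correlates with most strongly. The names `TEMPLATES`, `pearson` and `closest_shape` are invented for this example.

```python
import math

# Hypothetical templates: each shape is sign * cos(half_cycles * pi * t), t in [0, 1].
# These curves are an assumption made for illustration, not taken from the paper.
TEMPLATES = {
    "Rags to Riches": (-1, 1),  # ongoing rise
    "Riches to Rags": (1, 1),   # ongoing fall
    "Man in a Hole": (1, 2),    # fall followed by a rise
    "Icarus": (-1, 2),          # rise followed by a fall
    "Cinderella": (-1, 3),      # rise-fall-rise
    "Oedipus": (1, 3),          # fall-rise-fall
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def closest_shape(profile):
    """Return the name of the template that best correlates with the profile.

    `profile` is a list of sentiment values ordered by position in the film.
    Very short profiles (fewer than about five points) cannot separate the shapes well.
    """
    n = len(profile)
    if n < 2:
        raise ValueError("profile needs at least two points")
    ts = [i / (n - 1) for i in range(n)]
    best_name, best_corr = None, -2.0
    for name, (sign, half_cycles) in TEMPLATES.items():
        template = [sign * math.cos(half_cycles * math.pi * t) for t in ts]
        corr = pearson(profile, template)
        if corr > best_corr:
            best_name, best_corr = name, corr
    return best_name


# A made-up profile that dips in the middle and recovers at the end.
print(closest_shape([1.0, 0.2, -0.8, -0.9, 0.1, 0.9]))
```

Correlation ignores the overall level and scale of a profile, so this sketch compares only the direction of the emotional movement over time.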

Led by Professor Ganna Pogrebna at the University of Birmingham, the team combined scripts from and complemented them with data on movies from the IMDb website as well as data on revenues from

After a complex filtering procedure, the team, which included researchers from the Universities of Cambridge and the West of England, produced a final dataset of 6,147 movies with complete scripts, plus information about each movie’s gross domestic revenue in the country of first release and much more.

The research team split each script into sentences and calculated a sentiment score for every sentence, from -1 (emotionally negative) to 1 (emotionally positive), before matching the sentiment to the movie’s timing and creating an emotional ‘profile’ for each film.
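The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's actual pipeline: the tiny word list `TOY_LEXICON` is invented for the example (the article does not say which sentiment tool was used), and a sentence's position in the script stands in for its timing in the film.

```python
import re

# Invented toy lexicon, for illustration only; a real analysis would use a
# proper sentiment model or a much larger scored word list.
TOY_LEXICON = {
    "love": 1.0, "happy": 0.8, "safe": 0.5,
    "lost": -0.6, "afraid": -0.7, "dead": -1.0,
}


def split_sentences(script):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def sentence_sentiment(sentence):
    """Score one sentence from -1 (negative) to 1 (positive); 0 if no known words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return 0.0
    mean = sum(scores) / len(scores)
    return max(-1.0, min(1.0, mean))


def emotional_profile(script, n_bins=4):
    """Average sentence scores within n_bins consecutive slices of the script.

    Assumption: sentence order approximates the film's timeline.
    """
    scores = [sentence_sentiment(s) for s in split_sentences(script)]
    if not scores:
        return [0.0] * n_bins
    profile = []
    for b in range(n_bins):
        start = b * len(scores) // n_bins
        end = max((b + 1) * len(scores) // n_bins, start + 1)
        chunk = scores[start:end]
        profile.append(sum(chunk) / len(chunk))
    return profile


example = "They were happy. He was lost and afraid. Everyone felt safe."
print(emotional_profile(example, n_bins=3))
```

Averaging within a fixed number of slices puts films of different lengths on a common scale, which is what makes their profiles comparable.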

Their analysis revealed that the highest box-office revenues are associated with the Man in a Hole shape, which produces financially successful movies (in both gross worldwide and gross domestic revenue) irrespective of genre and production budget.

Commenting on the project result, Professor Pogrebna said: “Movies are stories and each story told by a movie tries to trigger our emotions. Understanding viewers’ emotions using data science can change business models for the media and entertainment industry.

“Using sentiment analysis to map viewers’ preferences will allow businesses to design customer-focussed content which viewers really want to see. This may shift content decision making from production companies to customers.”

Professor Pogrebna, who is a fellow of the Alan Turing Institute, added that Man in a Hole succeeded not because it produces the most ‘liked’ movies, but because it generates the most ‘talked about’ movies.

“It would be over-simplification to say the motion picture industry should concentrate on producing Man in a Hole movies,” she commented. “A carefully chosen combination of production budget and genre may produce a financially successful movie with any emotional shape.

“For example, the Icarus shape is good for low-budget movies, while if you want to shoot a successful tragedy in the Riches to Rags shape, then make it epic with a large budget of over $100 million.”

She added that SciFi, Mystery, and Thrillers with happy endings in the Rags to Riches shape did not do well at the box office. Equally, it was not a good idea to shoot a comedy with a bad ending in this shape. Also, Oedipus-shaped movies on average did not seem to do well at award ceremonies and festivals, other than the Oscars.

The University of Birmingham was recently invited to join The Alan Turing Institute – a prestigious British organisation that was set up to advance the world-changing potential of data science. It was named in honour of the British pioneer whose work in theoretical and applied mathematics, engineering and computing laid the foundations for the emerging field of data science.

  • The University of Birmingham is ranked amongst the world’s top 100 institutions. Its work brings people from across the world to Birmingham, including researchers and teachers and more than 6,500 international students from over 150 countries.
  • The group’s work is published by Cornell University Library in the paper ‘The Data Science of Hollywood: Using Emotional Arcs of Movies to Drive Business Model Innovation in Entertainment Industries’.
  • Led by Professor Ganna Pogrebna, the team consisted of Marco Del Vecchio (University of Cambridge), Alexander Kharlamov and Glenn Parry (both from Bristol Business School at the University of the West of England).
  • The final dataset consisted of 6,147 movies with complete scripts as well as information for each movie about:
      • gross domestic revenue in the country of first release;
      • IMDb motion picture ID number;
      • date of release;
      • average IMDb user satisfaction rating from 1 (very bad) to 10 (excellent);
      • critics’ satisfaction meta score from 0 (very bad) to 100 (excellent);
      • all IMDb genres of the movie;
      • rating count (number of individual assessments contributing to IMDb rating);
      • number of user reviews;
      • number of critics’ reviews;
      • number of awards (Oscars and other awards);
      • name of the motion picture director;
      • runtime in minutes; and
      • age appropriateness rating.

For a subset of 3,051 movies, researchers also had data on worldwide gross revenue as well as production budgets.