Perception, Language, Action group - University of Birmingham

The research theme “Perception, Language, Action” brings together researchers in the research specialisation areas of Computer Vision, Imaging, Natural Language Processing, Embedded Robotics, and Cognitive Science.

While the individual research areas focus on specific theoretical questions and related applications, Computer Vision and Imaging Science have always been closely linked: they share techniques, methodology and tools for capturing data (in the form of images, videos, 3D, multimodal and multispectral data) and for modelling and analysing visual information.

The aim of Natural Language Processing (NLP) is to develop computational models for analysing and generating human language. While NLP, Computer Vision and Imaging Science have traditionally been separate research fields with quite specific theoretical and methodological underpinnings, in the era of machine learning, big data and large language/vision models, parts of their methodology and tools have started to converge. This convergence coincides with synergies exploited in multimodal datasets and models, which leverage semantics from language models to facilitate the understanding of image and video data.

Research in Robotics encompasses fundamental challenges in developing systems that can interact with their environment through manipulation and/or navigation. While some research problems can be quite specific, in the more general setting of cognitive robotics and cognitive systems, vision and language need to be brought together with robotics to enable the perception-action cycle.

Overall, organising researchers around the theme “Perception, Language, Action” creates a collaborative network that is of interest to all involved: members exchange experience, share common tools (especially those related to large generative models, data and techniques) and develop larger projects that require expertise that goes beyond narrow domains.

The group has proved to be an important player in AI-driven multidisciplinary initiatives globally, as most such projects require AI/ML techniques that involve perception, language and/or action.


Our team