Multimodal Interaction

Department of Electronic & Electrical Engineering, School of Electronic, Electrical and Computer Engineering

College of Engineering and Physical Sciences

Details

Code 21495

Level of study Third/Final year

Credit value 10

Semester 1

Module description

Although speech is the most powerful channel for human-human communication, in face-to-face situations it is often supplemented with gesture. A participant in a conversation may also be aware of the other's direction of gaze, lip movements or emotional state, and these can all contribute to some extent to achieving a particular communicative goal more effectively. It is reasonable to assume that, in the future, human-machine communication will also benefit from multimodal interaction. The goals of this course are to look at the role of multimodality in human-machine interaction, to survey the technologies which are available to capture multimodal data, and to understand the methods which can be used to classify this data and to combine information in the different streams to obtain the best interpretation of a user's intent. The course will also look at the role of emotion in human-machine interaction. Multimodal human-machine interaction is an active research topic worldwide, and many of the issues considered in the course are 'snapshots' of current research.



Outline syllabus:


  • Introduction to multimodal interaction.

  • Introduction to basic pattern recognition.

  • Models of human-human and human-machine dialogue.

  • Automatic speech recognition.

  • Lip shape recognition and audio-visual integration.

  • Introduction to data fusion.

  • Gaze/eye movement: technologies for measuring eye movement; classification of eye movements; integration of gaze and speech.

  • Gesture: technologies for measuring 3D body motion; classification of gesture; integration of gesture with other modalities.

  • Emotion: the significance of emotion in communication; classification of emotion; recognition of emotional speech (i.e. ASR for emotional speech); recognition of emotion in speech and facial images; synthesis of emotional speech and facial images.