Speech Recognition by Synthesis (SRbS)

Recent progress in reducing error rates for automatic speech recognition (ASR) has mainly been achieved through machine learning algorithms, using models with ever more parameters.  Robust training of these models requires vast amounts of training data and is computationally expensive.  The data structures learned by these models, such as deep neural networks and hidden Markov models with high-dimensional Gaussian mixture model output distributions, are often not understood in any detail, and are difficult to relate in any meaningful way to the underlying mechanisms of speech production.

In this project we are developing new parsimonious models for robust speech recognition, inspired by linguistically and physically plausible models of speech production.  Ideally, these models should be robust to speech in different environments and contexts and should require only a minimal set of meaningful parameters.  Such models may sit at the interface between speech articulation, synthesis and recognition.

Recent Activity

September 6-10 2015: (Photos below) Several members of the group attended and presented papers and posters at Interspeech 2015 in Dresden, Germany.

July 2-3 2015: (Photos below) Several members of the group attended UK Speech 2015 at the University of East Anglia, Norwich, UK and presented posters and a talk.

Seminars

We hold regular seminars given by internal and external speakers.  The current seminar schedule for 2016 is here.

Academic Staff

Research Staff

Doctoral Researchers

Publications

Journal Papers

C. Champion and S.M. Houghton.  Application of Continuous State Hidden Markov Models to a Classical Problem in Speech Recognition, Computer Speech and Language, 36(1):347–364, 2016 (DOI).

Refereed Conference Papers

P. Weber, L. Bai, S. M. Houghton, P. Jancovic and M. J. Russell.  Progress on Phoneme Recognition with a Continuous-State HMM.  Accepted to ICASSP 2016, Shanghai, 2016 (Preprint PDF).

L. Bai, P. Jancovic, M.J. Russell and P. Weber.  Analysis of a Low-Dimensional Bottleneck Neural Network Representation of Speech for Modelling Speech Dynamics.  In Proc. Interspeech 2015, pp583-587, Dresden, Germany, 2015 (Preprint PDF).

S. M. Houghton and C. J. Champion.  Inductive Implementation of Segmental HMMs as CS-HMMs.  In Proc. Interspeech 2015, pp776-780, Dresden, Germany, 2015 (Preprint PDF).

S. M. Houghton, C. J. Champion and P. Weber.  Recognition of Voiced Sounds with a Continuous State HMM.  In Proc. Interspeech 2015, pp523-527, Dresden, Germany, 2015 (Preprint PDF).

P. Weber, C. Champion, S. Houghton, P. Jancovic and M.J. Russell.  Consonant Recognition with Continuous-State Hidden Markov Models and Perceptually-Motivated Features.  In Proc. Interspeech 2015, pp1893-1897, Dresden, Germany, 2015 (Preprint PDF).

P. Weber, S.M. Houghton, C.J. Champion, M.J. Russell and P. Jancovic.  Trajectory Analysis of Speech Using Continuous State Hidden Markov Models.  In Proc. ICASSP 2014, pp3042-3046, Florence, Italy, 2014 (DOI).

Other Papers

L. Bai, P. Jancovic, M.J. Russell and P. Weber.  Analysis of a Low-Dimensional Bottleneck Neural Network Representation of Speech for Modelling Speech Dynamics.  Poster presented at UK Speech, University of East Anglia, Norwich, UK, 2015 (PDF).

C. A. Seivwright, M. J. Russell and S. M. Houghton.  Comparative Analysis of a Continuous and Discontinuous Piecewise Linear Decoder.  Poster presented at UK Speech, University of East Anglia, Norwich, UK, 2015 (PDF).

C. Champion.  Recognition by rule 3: Application of Continuous State Hidden Markov Models to a Classical Problem in Speech Recognition.  Unpublished internal paper, 2011 (PDF).

Photos

Some photos of the group from Interspeech 2015
[Photos: SRBS201509, EF201509, PW201509, SH201509, LB201509]

Some photos of the group from UK Speech 2015
[Photos: Peter, Phil, Eva, Chloe, Mengie]