Description

This course offers a theoretical and practical understanding of how human speech is processed by computers. Speech Processing lies at the intersection of acoustic phonetics, digital processing of speech signals, and Machine Learning. Knowledge of these domains is essential to a thorough understanding of the rapidly developing fields of speech recognition (speech-to-text), speech synthesis (text-to-speech), spoken dialog systems, and chatbots (e.g. Siri, Alexa, Cortana). Students will learn about the processes underlying human speech production and perception, as well as techniques for speech analysis and synthesis. The concepts covered in lectures will be reinforced through rigorous programming assignments, in which students implement their own speech analysis and classification systems from scratch. For their projects, students will develop local-language speech recognition and synthesis systems using state-of-the-art toolkits. This course lays the foundation for advanced courses and research on speech processing.

People

Dr. Agha Ali Raza

Instructor

Haris Bin Zia

Teaching Assistant

Hira Dhamyal

Teaching Assistant

Dyass Khalid

Teaching Assistant

M. Rahim Khan

Teaching Assistant

Course Objectives

The goal of this course is to get the students excited about Speech Processing and to develop an understanding of:

  • the acoustics of speech signals and corresponding articulatory details

  • time- and frequency-based digital analysis of speech signals

  • robust, scalable, and adaptive speech processing based on Machine Learning
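As a small illustration of what time- and frequency-based analysis means in practice, the sketch below computes two time-domain features (short-time energy and zero-crossing rate) and one frequency-domain feature (the spectral peak) for a single frame. It is not taken from the course material: the synthetic 200 Hz tone, the 8 kHz sample rate, the 400-sample frame, and all function names are assumptions chosen for the example, and a real system would use an FFT library rather than this naive DFT.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz (assumed for this illustration)
TONE_HZ = 200        # frequency of the synthetic test tone (assumed)
FRAME_LEN = 400      # one 50 ms frame at 8 kHz (assumed)


def make_tone(freq_hz, n_samples, sample_rate):
    """Return a unit-amplitude sine wave as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def short_time_energy(frame):
    """Time-domain feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Time-domain feature: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Frequency-domain feature: magnitudes of a naive O(N^2) DFT, bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def peak_frequency(frame, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(frame)


frame = make_tone(TONE_HZ, FRAME_LEN, SAMPLE_RATE)
energy = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
peak_hz = peak_frequency(frame, SAMPLE_RATE)
print(f"energy={energy:.3f}  zcr={zcr:.4f}  peak={peak_hz:.1f} Hz")
```

For real speech, the same features would be computed frame by frame over a recorded signal, usually after applying a window function to each frame.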

Learning Outcomes

By the end of the course, students should:

  • Understand the processes of human speech generation, transmission, and perception, and the mathematical models describing these physical processes.

  • Develop a basic theoretical and practical understanding of the relevant branches of linguistics (articulatory and acoustic phonetics, and phonology) and signal processing (time- and frequency-based analyses).

  • Understand the Speech Processing pipeline, from the design and collection of speech corpora, through feature extraction techniques and rule-based and Machine Learning based processing models, to appropriate evaluation techniques.

  • Develop a hands-on understanding of time- and frequency-based speech processing techniques, Speech Recognition, and Speech Synthesis.

  • Gain hands-on experience with tools including Praat and Audacity, languages including MATLAB and Python, and toolkits including Sphinx and/or Kaldi.

Material

Textbooks

  • A Course in Phonetics by Ladefoged (ACP), 2005

  • The Acoustics of Speech Communication by Pickett (ASC), 1998

  • Digital Processing of Speech Signals by Rabiner and Schafer (DSP), 1978

  • Speech and Language Processing by Jurafsky and Martin (SLP), Ed 2 and 3


Reference books

  • Theory and Applications of Digital Speech Processing by Rabiner and Schafer, 2010

  • Principles of Computer Speech by Witten, 1983

  • An Introduction to Text-to-Speech Synthesis by Dutoit, 2001

  • Fundamentals of Speech Recognition by Rabiner and Juang, 1993

Course Overview

Lectures

Assignments

Acknowledgements

Our special thanks to Professors Roni Rosenfeld, Kilian Weinberger, Andrew Ng, Sarmad Hussain, Dan Jurafsky, James H. Martin, Christopher Manning, and Victor Lavrenko, whose Machine Learning, Natural Language Processing, and Speech Processing courses inspired the contents of several lectures in this series.

We would also like to express our gratitude to Kalid Azad (BetterExplained), Joshua Starmer (StatQuest), and Grant Sanderson (3Blue1Brown), whose excellent educational videos motivated and simplified several complex explanations in this course.