Speech Processing

Course Description


This course offers a theoretical and practical understanding of how human speech is processed using computers. Speech Processing lies at the intersection of acoustic phonetics, digital processing of speech signals, and Machine Learning. Knowledge of these domains is essential to a thorough understanding of the rapidly developing fields of speech recognition (speech-to-text), speech synthesis (text-to-speech), spoken dialog systems, and chatbots (e.g. Siri, Alexa, Cortana). Students will learn about the processes underlying human speech production and perception, and about techniques for speech analysis and synthesis. The concepts covered in lectures will be reinforced through rigorous programming assignments, in which students implement their own speech analysis and classification systems from scratch. For their projects, students will develop local-language speech recognition and synthesis systems using state-of-the-art toolkits. This course lays the foundation for advanced courses and research on speech processing.

Course Objectives


The goal of this course is to get the students excited about Speech Processing and to develop an understanding of:

Course Objectives


By the end of the course, students should:


Course Outline






Reference books







Spring 2021



Course Staff


Dr. Agha Ali Raza

Meet our experts!

This course has been designed by Dr. Agha Ali Raza and his team of teaching assistants to introduce interested students to the fundamentals of speech processing. Alongside the foundational theory, hands-on programming assignments give students a deeper practical understanding of the speech processing pipeline: designing and assembling speech corpora, extracting features, applying both rule-based and Machine Learning-based processing models, and choosing suitable evaluation methodologies.

Haris Bin Zia

Hira Dhamyal

Dyass Khalid

M. Rahim Khan

Taimoor Arif

Hamza Farooq

Ahmed Hassaan

Sheza Munir

Danyal Maqbool




Our special thanks to Professors Roni Rosenfeld, Kilian Weinberger, Andrew Ng, Sarmad Hussain, Dan Jurafsky, James H. Martin, Christopher Manning, and Victor Lavrenko, whose Machine Learning, Natural Language Processing, and Speech Processing courses inspired the contents of several lectures in this series.

We would also like to express our gratitude to Kalid Azad (BetterExplained), Joshua Starmer (StatQuest), and Grant Sanderson (3Blue1Brown), whose educational videos motivated and simplified several complex explanations in this course.