Speech processing
with Agha Ali Raza
CS5318/CS433/EE415 Speech Processing
Course Description
_________________
This course offers a theoretical and practical understanding of how human speech is processed using computers. Speech Processing lies at the intersection of acoustic phonetics, digital processing of speech signals, and Machine Learning. Knowledge of these domains is essential to a thorough understanding of the rapidly developing fields of speech recognition (speech-to-text), speech synthesis (text-to-speech), spoken dialog systems, and chatbots (e.g. Siri, Alexa, Cortana). Students will learn about the processes underlying human speech production and perception, and about techniques for speech analysis and synthesis. Concepts covered in lectures will be reinforced through rigorous programming assignments, in which students will implement their own speech analysis and classification systems from scratch. As projects, students will develop local-language speech recognition and synthesis systems using state-of-the-art toolkits. This course lays the foundation for advanced courses and research on speech processing.
Course Objectives
________________
The goal of this course is to get the students excited about Speech Processing and to develop an understanding of:
the acoustics of speech signals and corresponding articulatory details
time- and frequency-based digital analysis of speech signals
Machine Learning based robust, scalable, and adaptive speech processing
Course Objectives
________________
By the end of the course, students should:
Understand the processes of human speech generation, transmission, and perception, and the mathematical models describing these physical processes.
Develop a basic theoretical and practical understanding of the relevant branches of linguistics (articulatory and acoustic phonetics, and phonology) and signal processing (time- and frequency-based analyses)
Understand the Speech Processing pipeline, from the design and collection of speech corpora, through feature extraction techniques and rule-based and Machine Learning based processing models, to appropriate evaluation techniques
Develop a hands-on understanding of time and frequency-based speech processing techniques, Speech Recognition, and Speech Synthesis
Gain hands-on experience with tools including Praat and Audacity, languages including MATLAB and Python, and toolkits including Sphinx and/or Kaldi
Course Outline
______________
Material
________
Textbooks
A Course in Phonetics by Ladefoged (ACP), 2005
The Acoustics of Speech Communication by Pickett (ASC), 1998
Digital Processing of Speech Signals by Rabiner and Schafer (DSP), 1978
Speech and Language Processing by Jurafsky and Martin (SLP), Ed 2 and 3
Reference books
Theory and Applications of Digital Speech Processing by Rabiner and Schafer, 2010
Principles of Computer Speech by Witten, 1983
Introduction to Text to Speech Synthesis by Dutoit, 2001
Fundamentals of Speech Recognition by Rabiner and Juang, 1993
Lectures
________
Assignments
____________
Spring 2021
______________
Course Staff
___________
Dr. Agha Ali Raza
Meet our experts!
This course has been meticulously designed by Dr. Agha Ali Raza and his team of proficient teaching assistants to introduce interested students to the fundamentals of speech processing. Apart from foundational theoretical concepts, hands-on programming assignments allow students to gain a deeper practical understanding of the speech processing pipeline. This encompasses aspects such as designing and assembling speech corpora, employing techniques for feature extraction, utilizing both rule-based and Machine Learning-based processing models, and applying suitable evaluation methodologies.
Haris Bin Zia
Hira Dhamyal
Dyass Khalid
M. Rahim Khan
Taimoor Arif
Hamza Farooq
Ahmed Hassaan
Sheza Munir
Danyal Maqbool
Acknowledgements
__________________
Our special thanks to Professors Roni Rosenfeld, Kilian Weinberger, Andrew Ng, Sarmad Hussain, Dan Jurafsky, James H. Martin, Christopher Manning, and Victor Lavrenko, whose Machine Learning, Natural Language Processing, and Speech Processing courses inspired the contents of several lectures in this series.
We would also like to thank Kalid Azad (BetterExplained), Joshua Starmer (StatQuest), and Grant Sanderson (3Blue1Brown), whose educational videos motivated and simplified several complex explanations in this course.