Speech processing

CS5318/CS433/EE415 Speech Processing

Syed Babar Ali School of Science and Engineering (SBASSE)

Lahore University of Management Sciences (LUMS)

Course Description

_________________

This course offers a theoretical and practical understanding of how human speech is processed by computers. Speech Processing lies at the intersection of acoustic phonetics, digital signal processing, and Machine Learning. Knowledge of these domains is essential to a thorough understanding of the rapidly developing fields of speech recognition (speech-to-text), speech synthesis (text-to-speech), spoken dialog systems, and chatbots (e.g. Siri, Alexa, Cortana). Students will learn about the processes underlying human speech production and perception, and about techniques for speech analysis and synthesis. Concepts covered in lectures will be reinforced through rigorous programming assignments, in which students implement their own speech analysis and classification systems from scratch. As projects, students will develop local-language speech recognition and synthesis systems using state-of-the-art toolkits. This course lays the foundation for advanced courses and research in speech processing.

Course Objectives

________________

The goal of this course is to get students excited about Speech Processing and to develop an understanding of:

the acoustics of speech signals and the corresponding articulatory details
time- and frequency-based digital analysis of speech signals
Machine Learning-based robust, scalable, and adaptive speech processing

Learning Outcomes

________________

By the end of the course, students should:

Understand the processes of human speech generation, transmission, and perception, and the mathematical models describing these physical processes.
Develop a basic theoretical and practical understanding of the relevant branches of linguistics (articulatory and acoustic phonetics, and phonology) and signal processing (time- and frequency-based analyses).
Understand the Speech Processing pipeline, from the design and collection of speech corpora, through feature extraction techniques and rule-based and Machine Learning-based processing models, to appropriate evaluation techniques.
Develop a hands-on understanding of time- and frequency-based speech processing techniques, Speech Recognition, and Speech Synthesis.
Gain hands-on experience with tools including Praat and Audacity, languages including MATLAB and Python, and toolkits including Sphinx and/or Kaldi.

Course Outline

______________

Material

________

Textbooks

A Course in Phonetics by Ladefoged (ACP), 2005
The Acoustics of Speech Communication by Pickett (ASC), 1998
Digital Processing of Speech Signals by Rabiner and Schafer (DSP), 1978
Speech and Language Processing by Jurafsky and Martin (SLP), Ed 2 and 3

Reference books

Theory and Applications of Digital Speech Processing by Rabiner and Schafer, 2010
Principles of Computer Speech by Witten, 1983
Introduction to Text to Speech Synthesis by Dutoit, 2001
Fundamentals of Speech Recognition by Rabiner and Juang, 1993

Lectures

________

Assignments

____________

Spring 2021

______________

Course Staff

___________

Dr. Agha Ali Raza

Meet our experts!

This course has been meticulously designed by Dr. Agha Ali Raza and his team of proficient teaching assistants to introduce interested students to the fundamentals of speech processing. Apart from foundational theoretical concepts, hands-on programming assignments allow students to gain a deeper practical understanding of the speech processing pipeline. This encompasses aspects such as designing and assembling speech corpora, employing techniques for feature extraction, utilizing both rule-based and Machine Learning-based processing models, and applying suitable evaluation methodologies.

Haris Bin Zia

Hira Dhamyal

Dyass Khalid

M. Rahim Khan

Taimoor Arif

Hamza Farooq

Ahmed Hassaan

Sheza Munir

Danyal Maqbool

Acknowledgements

__________________

Our special thanks to Professors Roni Rosenfeld, Kilian Weinberger, Andrew Ng, Sarmad Hussain, Dan Jurafsky, James H. Martin, Christopher Manning, and Victor Lavrenko, whose Machine Learning, Natural Language Processing, and Speech Processing courses inspired the contents of several lectures in this series.

We would also like to thank Kalid Azad (BetterExplained), Joshua Starmer (StatQuest), and Grant Sanderson (3Blue1Brown), whose excellent educational videos motivated and simplified several complex explanations in this course.
