TY  - JOUR
AU  - Salik, Khwaja Mohd.
AU  - Aggarwal, Swati
AU  - Kumar, Yaman
AU  - Shah, Rajiv Ratn
AU  - Jain, Rohit
AU  - Zimmermann, Roger
PY  - 2019/07/17
Y2  - 2024/03/28
TI  - Lipper: Speaker Independent Speech Synthesis Using Multi-View Lipreading
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 33
IS  - 01
SE  - Student Abstract Track
DO  - 10.1609/aaai.v33i01.330110023
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/5148
SP  - 10023
EP  - 10024
AB  - Lipreading is the process of understanding and interpreting speech by observing a speaker's lip movements. In the past, most of the work in lipreading has been limited to classifying silent videos into a fixed number of text classes. However, this limits the applications of lipreading, since human language cannot be bound to a fixed set of words or languages. The aim of this work is to reconstruct intelligible acoustic speech signals from silent videos, across various poses, of a person whom Lipper has never seen before. Lipper, therefore, is a vocabulary and language agnostic, speaker independent, and near real-time model that deals with a variety of poses of a speaker. The model leverages silent video feeds from multiple cameras recording a subject to generate intelligible speech of a speaker. It uses a deep learning based STCNN+BiGRU architecture to achieve this goal. We evaluate speech reconstruction for speaker independent scenarios and demonstrate the speech output by overlaying the audios reconstructed by Lipper on the corresponding videos.
ER  - 