Audio-Visual Speech Processing for Robust Human-Computer Interaction

04.10.2007

Speaker : Gerasimos Potamianos, <i>Human Language Technologies, IBM T.J. Watson Research Center, Yorktown Heights, NY, US</i>
Date : 04.10.2007
Time: 11:00-12:30
Location : "Mediterranean Studies" Seminar Room, FORTH, Heraklion, Crete.
Host : Y. Stylianou

Abstract:

This talk will be structured in two parts. In the first half I will provide an overview of the activities in my group. The main emphasis will be placed on recent work conducted as part of FP6 EU projects, in particular the integrated project CHIL ("Computers in the Human Interaction Loop"). CHIL is a technology-driven project that aims to develop robust audio-visual technologies for perceiving human interaction during meetings and lectures inside smart rooms.

The second part of the talk will delve more deeply into a specific class of audio-visual perceptual technologies, namely audio-visual speech processing, with emphasis on automatic bimodal speech recognition. This line of work aims to exploit visual speech information to improve speech recognition robustness in noisy environments, in a process akin to human lipreading. I will discuss in detail my work in this field, with emphasis on visual feature extraction in realistic environments and ongoing research in the area of audio-visual fusion.

Bio:

Gerasimos (Makis) Potamianos received the Diploma degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece in 1988, and the M.S.E. and Ph.D. degrees in Electrical and Computer Engineering from the Johns Hopkins University, Baltimore, Maryland, in 1990 and 1994, respectively.

His thesis work focused on statistical models for image processing. From 1994 to 1996 he was a Postdoctoral Fellow with the Center for Language and Speech Processing, and from 1996 to 1999 a Senior Member of Technical Staff with the Speech and Image Processing Services Laboratory at AT&T Labs-Research. In 1999, he joined the Human Language Technologies department at the IBM Thomas J. Watson Research Center as a Research Staff Member, where he is currently manager of the Multimodal Conversational Technologies Department.

Makis' research interests span the areas of multimodal speech processing and human-computer interaction with particular emphasis on audio-visual speech processing, automatic speech recognition, multimedia signal processing and fusion, as well as computer vision for human detection and tracking.

Makis has published over 70 articles in these areas, which have received over 400 citations, and holds a number of granted patents. He is a member of IEEE and a member of the Technical Chamber of Greece.

More info can be found at http://www.research.ibm.com/people/g/gerasimos.potamianos.
