Multimodal Signal Processing
Theory and applications for human-computer interaction
Edited by
- Jean-Philippe Thiran, EPFL, Lausanne, Switzerland
- Ferran Marqués, Technical University of Catalonia, Spain
- Hervé Bourlard, Director, IDIAP Research Institute, EPFL, Lausanne, Switzerland
- Presents state-of-the-art methods for multimodal signal processing, analysis, and modelling
- Contains numerous examples of systems with different modalities combined
- Describes advanced applications in multimodal Human-Computer Interaction (HCI) as well as in computer-based analysis and modelling of multimodal human-human communication scenes.
Multimodal signal processing is an important research and development field that processes signals and combines information from a variety of modalities (speech, vision, language, text) to significantly enhance the understanding, modelling, and performance of human-computer interaction devices and of systems that support human-human communication. The overarching theme of this book is the application of signal processing and statistical machine learning techniques to problems arising in this multi-disciplinary field. It describes the capabilities and limitations of current technologies, and discusses the technical challenges that must be overcome to develop efficient and user-friendly multimodal interactive systems.
With contributions from the leading experts in the field, the present book should serve as a reference in multimodal signal processing for signal processing researchers, graduate students, R&D engineers, and computer engineers who are interested in this emerging field.
Audience
University and applied researchers in signal, acoustic, speech, image and video processing; R&D engineers; computer engineers
Hardbound, 352 Pages
Published: November 2009
Imprint: Academic Press
ISBN: 978-0-12-374825-6
Contents
1. Introduction
Jean-Philippe Thiran, Ferran Marqués, and Hervé Bourlard
Part I -- Signal Processing, Modelling and Related Mathematical Tools
2. Statistical Machine Learning for HCI
Samy Bengio
2.1. Introduction
2.2. Introduction to Statistical Learning
2.3. Support Vector Machines for Binary Classification
2.4. Hidden Markov Models for Speech Recognition
2.5. Conclusion
3. Speech Processing
Thierry Dutoit and Stéphane Dupont
3.1. Introduction
3.2. Speech Recognition
3.3. Speaker Recognition
3.4. Text-to-Speech Synthesis
3.5. Conclusions
4. Natural Language and Dialogue Processing
Olivier Pietquin
4.1. Introduction
4.2. Natural Language Understanding
4.3. Natural Language Generation
4.4. Dialogue Processing
4.5. Conclusion
5. Image and Video Processing Tools for HCI
Montse Pardàs, Verónica Vilaplana and Cristian Canton-Ferrer
5.1. Introduction
5.2. Face Analysis
5.3. Hand-Gesture Analysis
5.4. Head Orientation Analysis and FoA Estimation
5.5. Body Gesture Analysis
5.6. Conclusions
6. Processing of Handwriting and Sketching Dynamics
Claus Vielhauer
6.1. Introduction
6.2. History of Handwriting Modality and the Acquisition of Online Handwriting Signals
6.3. Basics in Acquisition, Examples for Sensors
6.4. Analysis of Online Handwriting and Sketching Signals
6.5. Overview of Recognition Goals in HCI
6.6. Sketch Recognition for User Interface Design
6.7. Similarity Search in Digital Ink
6.8. Summary and Perspectives for Handwriting and Sketching in HCI
Part II -- Multimodal Signal Processing and Modelling
7. Basic Concepts of Multimodal Analysis
Mihai Gurban and Jean-Philippe Thiran
7.1. Defining Multimodality
7.2. Advantages of Multimodal Analysis
7.3. Conclusion
8. Multimodal Information Fusion
Norman Poh and Josef Kittler
8.1. Introduction
8.2. Levels of Fusion
8.3. Adaptive versus Non-Adaptive Fusion
8.4. Other Design Issues
8.5. Conclusions
9. Modality Integration Methods
Mihai Gurban and Jean-Philippe Thiran
9.1. Introduction
9.2. Multimodal Fusion for AVSR
9.3. Multimodal Speaker Localisation
9.4. Conclusion
10. A Multimodal Recognition Framework for Joint Modality Compensation and Fusion
Konstantinos Moustakas, Savvas Argyropoulos and Dimitrios Tzovaras
10.1. Introduction
10.2. Joint Modality Recognition and Applications
10.3. A New Joint Modality Recognition Scheme
10.4. Joint Modality Audio-Visual Speech Recognition
10.5. Joint Modality Recognition in Biometrics
10.6. Conclusions
11. Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions
Andrei Popescu-Belis
11.1. Introduction
11.2. Setting the Stage: Concepts and Projects
11.3. Capturing and Recording Multimodal Data
11.4. Reference Metadata and Annotations
11.5. Data Storage and Access
11.6. Conclusions and Perspectives
Part III -- Multimodal Human-Computer and Human-to-Human Interaction
12. Multimodal Input
Natalie Ruiz, Fang Chen, and Sharon Oviatt
12.1. Introduction
12.2. Advantages of Multimodal Input Interfaces
12.3. Multimodality, Cognition and Performance
12.4. Understanding Multimodal Input Behaviour
12.5. Adaptive Multimodal Interfaces
12.6. Conclusions and Future Directions
13. Multimodal HCI Output: Facial Motion, Gestures and Synthesised Speech Synchronisation
Igor S. Pandic
13.1. Introduction
13.2. Basic AV Speech Synthesis
13.3. The Animation System
13.4. Coarticulation
13.5. Extended AV Speech Synthesis
13.6. Embodied Conversational Agents
13.7. TTS Timing Issues
13.8. Conclusion
14. Interactive Representations of Multimodal Databases
Stéphane Marchand-Maillet, Donn Morrison, Enikö Szekely, and Eric Bruno
14.1. Introduction
14.2. Multimodal Data Representation
14.3. Multimodal Data Access
14.4. Gaining Semantic from User Interaction
14.5. Conclusion and Discussion
15. Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour
Daniel Gatica-Perez
15.1. Introduction
15.2. Perspectives on Interest Modelling
15.3. Computing Interest from Audio Cues
15.4. Computing Interest from Multimodal Cues
15.5. Other Concepts Related to Interest
15.6. Concluding Remarks

