Multimodal Signal Processing

Theory and applications for human-computer interaction

  • Presents state-of-the-art methods for multimodal signal processing, analysis, and modeling
  • Contains numerous examples of systems with different modalities combined
  • Describes advanced applications in multimodal Human-Computer Interaction (HCI) as well as in computer-based analysis and modelling of multimodal human-human communication scenes

Multimodal signal processing is an important research and development field that processes signals and combines information from a variety of modalities – speech, vision, language, text – which significantly enhances the understanding, modelling, and performance of human-computer interaction devices and of systems that support human-human communication. The overarching theme of this book is the application of signal processing and statistical machine learning techniques to problems arising in this multi-disciplinary field. It describes the capabilities and limitations of current technologies, and discusses the technical challenges that must be overcome to develop efficient and user-friendly multimodal interactive systems.

With contributions from the leading experts in the field, the present book should serve as a reference in multimodal signal processing for signal processing researchers, graduate students, R&D engineers, and computer engineers who are interested in this emerging field.

Audience: university and applied researchers in signal, acoustic, speech, image and video processing, R&D engineers, and computer engineers

Hardbound, 352 Pages

Published: November 2009

Imprint: Academic Press

ISBN: 978-0-12-374825-6


  • 1. Introduction
    Jean-Philippe Thiran, Ferran Marqués, and Hervé Bourlard

    Part I -- Signal Processing, Modelling and Related Mathematical Tools

    2. Statistical Machine Learning for HCI
    Samy Bengio

    2.1. Introduction
    2.2. Introduction to Statistical Learning
    2.3. Support Vector Machines for Binary Classification
    2.4. Hidden Markov Models for Speech Recognition
    2.5. Conclusion

    3. Speech Processing
    Thierry Dutoit and Stéphane Dupont

    3.1. Introduction
    3.2. Speech Recognition
    3.3. Speaker Recognition
    3.4. Text-to-Speech Synthesis
    3.5. Conclusions

    4. Natural Language and Dialogue Processing
    Olivier Pietquin

    4.1. Introduction
    4.2. Natural Language Understanding
    4.3. Natural Language Generation
    4.4. Dialogue Processing
    4.5. Conclusion

    5. Image and Video Processing Tools for HCI
    Montse Pardàs, Verónica Vilaplana and Cristian Canton-Ferrer

    5.1. Introduction
    5.2. Face Analysis
    5.3. Hand-Gesture Analysis
    5.4. Head Orientation Analysis and FoA Estimation
    5.5. Body Gesture Analysis
    5.6. Conclusions

    6. Processing of Handwriting and Sketching Dynamics
    Claus Vielhauer

    6.1. Introduction
    6.2. History of Handwriting Modality and the Acquisition of Online Handwriting Signals
    6.3. Basics in Acquisition, Examples for Sensors
    6.4. Analysis of Online Handwriting and Sketching Signals
    6.5. Overview of Recognition Goals in HCI
    6.6. Sketch Recognition for User Interface Design
    6.7. Similarity Search in Digital Ink
    6.8. Summary and Perspectives for Handwriting and Sketching in HCI

    Part II -- Multimodal Signal Processing and Modelling

    7. Basic Concepts of Multimodal Analysis
    Mihai Gurban and Jean-Philippe Thiran

    7.1. Defining Multimodality
    7.2. Advantages of Multimodal Analysis
    7.3. Conclusion

    8. Multimodal Information Fusion
    Norman Poh and Josef Kittler

    8.1. Introduction
    8.2. Levels of Fusion
    8.3. Adaptive versus Non-Adaptive Fusion
    8.4. Other Design Issues
    8.5. Conclusions

    9. Modality Integration Methods
    Mihai Gurban and Jean-Philippe Thiran

    9.1. Introduction
    9.2. Multimodal Fusion for AVSR
    9.3. Multimodal Speaker Localisation
    9.4. Conclusion

    10. A Multimodal Recognition Framework for Joint Modality Compensation and Fusion
    Konstantinos Moustakas, Savvas Argyropoulos and Dimitrios Tzovaras

    10.1. Introduction
    10.2. Joint Modality Recognition and Applications
    10.3. A New Joint Modality Recognition Scheme
    10.4. Joint Modality Audio-Visual Speech Recognition
    10.5. Joint Modality Recognition in Biometrics
    10.6. Conclusions

    11. Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions
    Andrei Popescu-Belis

    11.1. Introduction
    11.2. Setting the Stage: Concepts and Projects
    11.3. Capturing and Recording Multimodal Data
    11.4. Reference Metadata and Annotations
    11.5. Data Storage and Access
    11.6. Conclusions and Perspectives

    Part III -- Multimodal Human-Computer and Human-to-Human Interaction

    12. Multimodal Input
    Natalie Ruiz, Fang Chen, and Sharon Oviatt

    12.1. Introduction
    12.2. Advantages of Multimodal Input Interfaces
    12.3. Multimodality, Cognition and Performance
    12.4. Understanding Multimodal Input Behaviour
    12.5. Adaptive Multimodal Interfaces
    12.6. Conclusions and Future Directions

    13. Multimodal HCI Output: Facial Motion, Gestures and Synthesised Speech Synchronisation
    Igor S. Pandžić

    13.1. Introduction
    13.2. Basic AV Speech Synthesis
    13.3. The Animation System
    13.4. Coarticulation
    13.5. Extended AV Speech Synthesis
    13.6. Embodied Conversational Agents
    13.7. TTS Timing Issues
    13.8. Conclusion

    14. Interactive Representations of Multimodal Databases
    Stéphane Marchand-Maillet, Donn Morrison, Enikö Szekely, and Eric Bruno

    14.1. Introduction
    14.2. Multimodal Data Representation
    14.3. Multimodal Data Access
    14.4. Gaining Semantic from User Interaction
    14.5. Conclusion and Discussion

    15. Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour
    Daniel Gatica-Perez

    15.1. Introduction
    15.2. Perspectives on Interest Modelling
    15.3. Computing Interest from Audio Cues
    15.4. Computing Interest from Multimodal Cues
    15.5. Other Concepts Related to Interest
    15.6. Concluding Remarks

