
Machine Learning

1st Edition

A Bayesian and Optimization Perspective

Author: Sergios Theodoridis
eBook ISBN: 9780128017227
Hardcover ISBN: 9780128015223
Imprint: Academic Press
Published Date: 27th March 2015
Page Count: 1062
DRM-Free

Easy - Download and start reading immediately. There’s no activation process to access eBooks; all eBooks are fully searchable, and enabled for copying, pasting, and printing.

Flexible - Read on multiple operating systems and devices. Easily read eBooks on smart phones, computers, or any eBook readers, including Kindle.

Open - Buy once, receive and download all available eBook formats, including PDF, EPUB, and Mobi (for Kindle).


Description

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which are based on optimization techniques, together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models.

The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing, and computer science. With a focus on the physical reasoning behind the mathematics, the various methods and techniques are explained in depth and supported by examples and problems, making the book an invaluable resource for students and researchers who want to understand and apply machine learning concepts.

The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models.

Key Features

  • All major classical techniques: Mean/Least-Squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression and boosting methods.
  • The latest trends: Sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning and latent variables modeling.
  • Case studies, including protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change-point detection, hyperspectral image unmixing, target localization, and channel equalization and echo cancellation, show how the theory can be applied.
  • MATLAB code for all the main algorithms is available on an accompanying website, enabling the reader to experiment with the code.

Readership

University researchers, R&D engineers, and graduate students taking a machine learning course.

Table of Contents

  • Preface
  • Acknowledgments
  • Notation
  • Dedication
  • Chapter 1: Introduction
    • Abstract
    • 1.1 What Machine Learning is About
    • 1.2 Structure and a Road Map of the Book
  • Chapter 2: Probability and Stochastic Processes
    • Abstract
    • 2.1 Introduction
    • 2.2 Probability and Random Variables
    • 2.3 Examples of Distributions
    • 2.4 Stochastic Processes
    • 2.5 Information Theory
    • 2.6 Stochastic Convergence
    • Problems
  • Chapter 3: Learning in Parametric Modeling: Basic Concepts and Directions
    • Abstract
    • 3.1 Introduction
    • 3.2 Parameter Estimation: The Deterministic Point of View
    • 3.3 Linear Regression
    • 3.4 Classification
    • 3.5 Biased Versus Unbiased Estimation
    • 3.6 The Cramér-Rao Lower Bound
    • 3.7 Sufficient Statistic
    • 3.8 Regularization
    • 3.9 The Bias-Variance Dilemma
    • 3.10 Maximum Likelihood Method
    • 3.11 Bayesian Inference
    • 3.12 Curse of Dimensionality
    • 3.13 Validation
    • 3.14 Expected and Empirical Loss Functions
    • 3.15 Nonparametric Modeling and Estimation
    • Problems
  • Chapter 4: Mean-Square Error Linear Estimation
    • Abstract
    • 4.1 Introduction
    • 4.2 Mean-Square Error Linear Estimation: The Normal Equations
  • Chapter 5: Stochastic Gradient Descent: The LMS Algorithm and its Family
    • Abstract
    • 5.1 Introduction
    • 5.2 The Steepest Descent Method
    • 5.3 Application to the Mean-Square Error Cost Function
    • 5.4 Stochastic Approximation
    • 5.5 The Least-Mean-Squares Adaptive Algorithm
    • 5.6 The Affine Projection Algorithm
    • 5.7 The Complex-Valued Case
    • 5.8 Relatives of the LMS
    • 5.9 Simulation Examples
    • 5.10 Adaptive Decision Feedback Equalization
    • 5.11 The Linearly Constrained LMS
    • 5.12 Tracking Performance of the LMS in Nonstationary Environments
    • 5.13 Distributed Learning: The Distributed LMS
    • 5.14 A Case Study: Target Localization
    • 5.15 Some Concluding Remarks: Consensus Matrix
    • Problems
    • MATLAB Exercises
  • Chapter 6: The Least-Squares Family
    • Abstract
    • 6.1 Introduction
    • 6.2 Least-Squares Linear Regression: A Geometric Perspective
    • 6.3 Statistical Properties of the LS Estimator
    • 6.4 Orthogonalizing the Column Space of X: The SVD Method
    • 6.5 Ridge Regression
    • 6.6 The Recursive Least-Squares Algorithm
    • 6.7 Newton’s Iterative Minimization Method
    • 6.8 Steady-State Performance of the RLS
    • 6.9 Complex-Valued Data: The Widely Linear RLS
    • 6.10 Computational Aspects of the LS Solution
    • 6.11 The Coordinate and Cyclic Coordinate Descent Methods
    • 6.12 Simulation Examples
    • 6.13 Total-Least-Squares
    • Problems
  • Chapter 7: Classification: A Tour of the Classics
    • Abstract
    • 7.1 Introduction
    • 7.2 Bayesian Classification
    • 7.3 Decision (Hyper)Surfaces
    • 7.4 The Naive Bayes Classifier
    • 7.5 The Nearest Neighbor Rule
    • 7.6 Logistic Regression
    • 7.7 Fisher’s Linear Discriminant
    • 7.8 Classification Trees
    • 7.9 Combining Classifiers
    • 7.10 The Boosting Approach
    • 7.11 Boosting Trees
    • 7.12 A Case Study: Protein Folding Prediction
    • Problems
  • Chapter 8: Parameter Learning: A Convex Analytic Path
    • Abstract
    • 8.1 Introduction
    • 8.2 Convex Sets and Functions
    • 8.3 Projections onto Convex Sets
    • 8.4 Fundamental Theorem of Projections onto Convex Sets
    • 8.5 A Parallel Version of POCS
    • 8.6 From Convex Sets to Parameter Estimation and Machine Learning
    • 8.7 Infinitely Many Closed Convex Sets: The Online Learning Case
    • 8.8 Constrained Learning
    • 8.9 The Distributed APSM
    • 8.10 Optimizing Nonsmooth Convex Cost Functions
    • 8.11 Regret Analysis
    • 8.12 Online Learning and Big Data Applications: A Discussion
    • 8.13 Proximal Operators
    • 8.14 Proximal Splitting Methods for Optimization
    • Problems
    • MATLAB Exercises
    • 8.15 Appendix to Chapter 8
  • Chapter 9: Sparsity-Aware Learning: Concepts and Theoretical Foundations
    • Abstract
    • 9.1 Introduction
    • 9.2 Searching for a Norm
    • 9.3 The Least Absolute Shrinkage and Selection Operator (LASSO)
    • 9.4 Sparse Signal Representation
    • 9.5 In Search of the Sparsest Solution
    • 9.6 Uniqueness of the ℓ0 Minimizer
    • 9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions
    • 9.8 Robust Sparse Signal Recovery from Noisy Measurements
    • 9.9 Compressed Sensing: The Glory of Randomness
    • 9.10 A Case Study: Image De-Noising
    • Problems
  • Chapter 10: Sparsity-Aware Learning: Algorithms and Applications
    • Abstract
    • 10.1 Introduction
    • 10.2 Sparsity-Promoting Algorithms
    • 10.3 Variations on the Sparsity-Aware Theme
    • 10.4 Online Sparsity-Promoting Algorithms
    • 10.5 Learning Sparse Analysis Models
    • 10.6 A Case Study: Time-Frequency Analysis
    • 10.7 Appendix to Chapter 10: Some Hints from the Theory of Frames
    • Problems
  • Chapter 11: Learning in Reproducing Kernel Hilbert Spaces
    • Abstract
    • 11.1 Introduction
    • 11.2 Generalized Linear Models
    • 11.3 Volterra, Wiener, and Hammerstein Models
    • 11.4 Cover’s Theorem: Capacity of a Space in Linear Dichotomies
    • 11.5 Reproducing Kernel Hilbert Spaces
    • 11.6 Representer Theorem
    • 11.7 Kernel Ridge Regression
    • 11.8 Support Vector Regression
    • 11.9 Kernel Ridge Regression Revisited
    • 11.10 Optimal Margin Classification: Support Vector Machines
    • 11.11 Computational Considerations
    • 11.12 Online Learning in RKHS
    • 11.13 Multiple Kernel Learning
    • 11.14 Nonparametric Sparsity-Aware Learning: Additive Models
    • 11.15 A Case Study: Authorship Identification
    • Problems
  • Chapter 12: Bayesian Learning: Inference and the EM Algorithm
    • Abstract
    • 12.1 Introduction
    • 12.2 Regression: A Bayesian Perspective
    • 12.3 The Evidence Function and Occam’s Razor Rule
    • 12.4 Exponential Family of Probability Distributions
    • 12.5 Latent Variables and the EM Algorithm
    • 12.6 Linear Regression and the EM Algorithm
    • 12.7 Gaussian Mixture Models
    • 12.8 Combining Learning Models: A Probabilistic Point of View
    • Problems
    • MATLAB Exercises
    • 12.9 Appendix to Chapter 12
  • Chapter 13: Bayesian Learning: Approximate Inference and Nonparametric Models
    • Abstract
    • 13.1 Introduction
    • 13.2 Variational Approximation in Bayesian Learning
    • 13.3 A Variational Bayesian Approach to Linear Regression
    • 13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling
    • 13.5 When Bayesian Inference Meets Sparsity
    • 13.6 Sparse Bayesian Learning (SBL)
    • 13.7 The Relevance Vector Machine Framework
    • 13.8 Convex Duality and Variational Bounds
    • 13.9 Sparsity-Aware Regression: A Variational Bound Bayesian Path
    • 13.10 Sparsity-Aware Learning: Some Concluding Remarks
    • 13.11 Expectation Propagation
    • 13.12 Nonparametric Bayesian Modeling
    • 13.13 Gaussian Processes
    • 13.14 A Case Study: Hyperspectral Image Unmixing
    • Problems
  • Chapter 14: Monte Carlo Methods
    • Abstract
    • 14.1 Introduction
    • 14.2 Monte Carlo Methods: The Main Concept
    • 14.3 Random Sampling Based on Function Transformation
    • 14.4 Rejection Sampling
    • 14.5 Importance Sampling
    • 14.6 Monte Carlo Methods and the EM Algorithm
    • 14.7 Markov Chain Monte Carlo Methods
    • 14.8 The Metropolis Method
    • 14.9 Gibbs Sampling
    • 14.10 In Search of More Efficient Methods: A Discussion
    • 14.11 A Case Study: Change-Point Detection
    • Problems
  • Chapter 15: Probabilistic Graphical Models: Part I
    • Abstract
    • 15.1 Introduction
    • 15.2 The Need for Graphical Models
    • 15.3 Bayesian Networks and the Markov Condition
    • 15.4 Undirected Graphical Models
    • 15.5 Factor Graphs
    • 15.6 Moralization of Directed Graphs
    • 15.7 Exact Inference Methods: Message-Passing Algorithms
    • Problems
  • Chapter 16: Probabilistic Graphical Models: Part II
    • Abstract
    • 16.1 Introduction
    • 16.2 Triangulated Graphs and Junction Trees
    • 16.3 Approximate Inference Methods
    • 16.4 Dynamic Graphical Models
    • 16.5 Hidden Markov Models
    • 16.6 Beyond HMMs: A Discussion
    • 16.7 Learning Graphical Models
    • Problems
  • Chapter 17: Particle Filtering
    • Abstract
    • 17.1 Introduction
    • 17.2 Sequential Importance Sampling
    • 17.3 Kalman and Particle Filtering
    • 17.4 Particle Filtering
    • Problems
  • Chapter 18: Neural Networks and Deep Learning
    • Abstract
    • 18.1 Introduction
    • 18.2 The Perceptron
    • 18.3 Feed-Forward Multilayer Neural Networks
    • 18.4 The Backpropagation Algorithm
    • 18.5 Pruning the Network
    • 18.6 Universal Approximation Property of Feed-Forward Neural Networks
    • 18.7 Neural Networks: A Bayesian Flavor
    • 18.8 Learning Deep Networks
    • 18.9 Deep Belief Networks
    • 18.10 Variations on the Deep Learning Theme
    • 18.11 Case Study: A Deep Network for Optical Character Recognition
    • 18.12 Case Study: A Deep Autoencoder
    • 18.13 Example: Generating Data via a DBN
    • Problems
    • MATLAB Exercises
  • Chapter 19: Dimensionality Reduction and Latent Variables Modeling
    • Abstract
    • 19.1 Introduction
    • 19.2 Intrinsic Dimensionality
    • 19.3 Principal Component Analysis
    • 19.4 Canonical Correlation Analysis
    • 19.5 Independent Component Analysis
    • 19.6 Dictionary Learning: The K-SVD Algorithm
    • 19.7 Nonnegative Matrix Factorization
    • 19.8 Learning Low-Dimensional Models: A Probabilistic Perspective
    • 19.9 Nonlinear Dimensionality Reduction
    • 19.10 Low-Rank Matrix Factorization: A Sparse Modeling Path
    • 19.11 A Case Study: fMRI Data Analysis
    • Problems
  • Appendix A: Linear Algebra
    • A.1 Properties of Matrices
    • A.2 Positive Definite and Symmetric Matrices
    • A.3 Wirtinger Calculus
  • Appendix B: Probability Theory and Statistics
    • B.1 Cramér-Rao Bound
    • B.2 Characteristic Functions
    • B.3 Moments and Cumulants
    • B.4 Edgeworth Expansion of a pdf
  • Appendix C: Hints on Constrained Optimization
    • C.1 Equality Constraints
    • C.2 Inequality Constraints
  • Index

Details

No. of pages: 1062
Language: English
Copyright: © Academic Press 2015
Published: 27th March 2015
Imprint: Academic Press
eBook ISBN: 9780128017227
Hardcover ISBN: 9780128015223

About the Author

Sergios Theodoridis

Sergios Theodoridis is Professor of Signal Processing and Machine Learning in the Department of Informatics and Telecommunications of the University of Athens.

He is the co-author of the bestselling book, Pattern Recognition, and the co-author of Introduction to Pattern Recognition: A MATLAB Approach.

He serves as Editor-in-Chief of the IEEE Transactions on Signal Processing, and he is co-Editor-in-Chief, with Rama Chellappa, of the Academic Press Library in Signal Processing.

He has received a number of awards, including the 2014 IEEE Signal Processing Magazine Best Paper Award, the 2009 IEEE Computational Intelligence Society Transactions on Neural Networks Outstanding Paper Award, the 2014 IEEE Signal Processing Society Education Award, and the EURASIP 2014 Meritorious Service Award. He has served as a Distinguished Lecturer for the IEEE Signal Processing Society and the IEEE Circuits and Systems Society, and he is a Fellow of EURASIP and a Fellow of the IEEE.

Affiliations and Expertise

Department of Informatics and Telecommunications, University of Athens, Greece

Reviews

"Overall, this text is well organized and full of details suitable for advanced graduate and postgraduate courses, as well as scholars…" --Computing Reviews

"Machine Learning: A Bayesian and Optimization Perspective", Academic Press, 2105, by Sergios Theodoridis is a wonderful book, up to date and rich in detail. It covers a broad selection of topics ranging from classical regression and classification techniques to more recent ones including sparse modeling, convex optimization, Bayesian learning, graphical models and neural networks, giving it a very modern feel and making it highly relevant in the deep learning era. While other widely used machine learning textbooks tend to sacrifice clarity for elegance, Professor Theodoridis provides you with enough detail and insights to understand the "fine print". This makes the book indispensable for the active machine learner." --Prof. Lars Kai Hansen, DTU Compute - Dept. Applied Mathematics and Computer Science Technical University of Denmark


"Before the publication of Machine Learning: A Bayesian and Optimization Perspective, I had the opportunity to review one of the chapters in the book (on Monte Carlo methods). I have published actively in this area, and so I was curious how S. Theodoridis would write about it. I was utterly impressed. The chapter presented the material with an optimal mix of theoretical and practical contents in very clear manner and with information for a wide range of readers, from newcomers to more advanced readers. This raised my curiosity to read the rest of the book once it was published. I did it and my original impressions were further reinforced. S. Theodoridis has a great capability to disentangle the important from the unimportant and to make the most of the used space for writing. His text is rich with insights about the addressed topics that are not only helpful for novices but also for seasoned researchers. It goes without saying that my department adopted his book as a textbook in the course on machine learning."
--Petar M. Djurić, Ph.D. SUNY Distinguished Professor Department of Electrical and Computer Engineering Stony Brook University, Stony Brook, USA.


"As someone who has taught graduate courses in pattern recognition for over 35 years, I have always looked for a rigorous book that is current and appealing to students with widely varying backgrounds. The book on Machine Learning by Sergios Theodoridis has struck the perfect balance in explaining the key (traditional and new) concepts in machine learning in a way that can be appreciated by undergraduate and graduate students as well as practicing engineers and scientists. The chapters have been written in a self-consistent way, which will help instructors to assemble different sections of the book to suit the background of students" --Rama Cellappa, Distinguished University Professor, Minta Martin Professor of Engineering, Chair, Department of Electrical and Computer Engineering, University of Maryland, USA.