# Introduction to Statistical Machine Learning

## 1st Edition

**Authors:** Masashi Sugiyama

**Paperback ISBN:** 9780128021217

**eBook ISBN:** 9780128023501

**Imprint:** Morgan Kaufmann

**Published Date:** 25th September 2015

**Page Count:** 534

## Description

Machine learning allows computers to learn and discern patterns without being explicitly programmed. Combined with statistical techniques, it is a powerful tool for analysing many kinds of data in computer science and engineering areas, including image processing, speech processing, natural language processing, and robot control, as well as in fundamental sciences such as biology, medicine, astronomy, physics, and materials science.

*Introduction to Statistical Machine Learning* provides a general introduction to machine learning that covers a wide range of topics concisely and will help you bridge the gap between theory and practice. Part 2 discusses the fundamental concepts of statistics and probability that are used in describing machine learning algorithms. Parts 3 and 4 explain the two major approaches to machine learning: generative methods and discriminative methods. Part 5 provides an in-depth look at advanced topics that play essential roles in making machine learning algorithms more useful in practice. The accompanying MATLAB/Octave programs help you develop the practical skills needed to accomplish a wide range of data analysis tasks.

## Key Features

- Provides the background material needed to understand machine learning, such as statistics, probability, linear algebra, and calculus
- Offers complete coverage of the generative approach to statistical pattern recognition and the discriminative approach to statistical machine learning
- Includes MATLAB/Octave programs so that readers can test the algorithms numerically and acquire both mathematical and practical skills in a wide range of data analysis tasks
- Discusses a wide range of applications in machine learning and statistics, with examples drawn from image processing, speech processing, natural language processing, and robot control, as well as biology, medicine, astronomy, physics, and materials science

## Readership

Data scientists, graduate students, and others interested in statistical machine learning

## Table of Contents

Part 1

- INTRODUCTION
- Chapter 1. Statistical Machine Learning
- 1.1. Types of Learning
- 1.2. Examples of Machine Learning Tasks
- 1.3. Structure of This Textbook

Part 2

- STATISTICS AND PROBABILITY
- Chapter 2. Random Variables and Probability Distributions
- 2.1. Mathematical Preliminaries
- 2.2. Probability
- 2.3. Random Variable and Probability Distribution
- 2.4. Properties of Probability Distributions
- 2.5. Transformation of Random Variables

- Chapter 3. Examples of Discrete Probability Distributions
- 3.1. Discrete Uniform Distribution
- 3.2. Binomial Distribution
- 3.3. Hypergeometric Distribution
- 3.4. Poisson Distribution
- 3.5. Negative Binomial Distribution
- 3.6. Geometric Distribution

- Chapter 4. Examples of Continuous Probability Distributions
- 4.1. Continuous Uniform Distribution
- 4.2. Normal Distribution
- 4.3. Gamma Distribution, Exponential Distribution, and Chi-Squared Distribution
- 4.4. Beta Distribution
- 4.5. Cauchy Distribution and Laplace Distribution
- 4.6. t-Distribution and F-Distribution

- Chapter 5. Multidimensional Probability Distributions
- 5.1. Joint Probability Distribution
- 5.2. Conditional Probability Distribution
- 5.3. Contingency Table
- 5.4. Bayes’ Theorem
- 5.5. Covariance and Correlation
- 5.6. Independence

- Chapter 6. Examples of Multidimensional Probability Distributions
- 6.1. Multinomial Distribution
- 6.2. Multivariate Normal Distribution
- 6.3. Dirichlet Distribution
- 6.4. Wishart Distribution

- Chapter 7. Sum of Independent Random Variables
- 7.1. Convolution
- 7.2. Reproductive Property
- 7.3. Law of Large Numbers
- 7.4. Central Limit Theorem

- Chapter 8. Probability Inequalities
- 8.1. Union Bound
- 8.2. Inequalities for Probabilities
- 8.3. Inequalities for Expectation
- 8.4. Inequalities for the Sum of Independent Random Variables

- Chapter 9. Statistical Estimation
- 9.1. Fundamentals of Statistical Estimation
- 9.2. Point Estimation
- 9.3. Interval Estimation

- Chapter 10. Hypothesis Testing
- 10.1. Fundamentals of Hypothesis Testing
- 10.2. Test for Expectation of Normal Samples
- 10.3. Neyman-Pearson Lemma
- 10.4. Test for Contingency Tables
- 10.5. Test for Difference in Expectations of Normal Samples
- 10.6. Nonparametric Test for Ranks
- 10.7. Monte Carlo Test

Part 3

- GENERATIVE APPROACH TO STATISTICAL PATTERN RECOGNITION
- Chapter 11. Pattern Recognition via Generative Model Estimation
- 11.1. Formulation of Pattern Recognition
- 11.2. Statistical Pattern Recognition
- 11.3. Criteria for Classifier Training
- 11.4. Generative and Discriminative Approaches

- Chapter 12. Maximum Likelihood Estimation
- 12.1. Definition
- 12.2. Gaussian Model
- 12.3. Computing the Class-Posterior Probability
- 12.4. Fisher’s Linear Discriminant Analysis (FDA)
- 12.5. Hand-Written Digit Recognition

- Chapter 13. Properties of Maximum Likelihood Estimation
- 13.1. Consistency
- 13.2. Asymptotic Unbiasedness
- 13.3. Asymptotic Efficiency
- 13.4. Asymptotic Normality
- 13.5. Summary

- Chapter 14. Model Selection for Maximum Likelihood Estimation
- 14.1. Model Selection
- 14.2. KL Divergence
- 14.3. AIC
- 14.4. Cross Validation
- 14.5. Discussion

- Chapter 15. Maximum Likelihood Estimation for Gaussian Mixture Model
- 15.1. Gaussian Mixture Model
- 15.2. MLE
- 15.3. Gradient Ascent Algorithm
- 15.4. EM Algorithm

- Chapter 16. Nonparametric Estimation
- 16.1. Histogram Method
- 16.2. Problem Formulation
- 16.3. KDE
- 16.4. NNDE

- Chapter 17. Bayesian Inference
- 17.1. Bayesian Predictive Distribution
- 17.2. Conjugate Prior
- 17.3. MAP Estimation
- 17.4. Bayesian Model Selection

- Chapter 18. Analytic Approximation of Marginal Likelihood
- 18.1. Laplace Approximation
- 18.2. Variational Approximation

- Chapter 19. Numerical Approximation of Predictive Distribution
- 19.1. Monte Carlo Integration
- 19.2. Importance Sampling
- 19.3. Sampling Algorithms

- Chapter 20. Bayesian Mixture Models
- 20.1. Gaussian Mixture Models
- 20.2. Latent Dirichlet Allocation (LDA)

Part 4

- DISCRIMINATIVE APPROACH TO STATISTICAL MACHINE LEARNING
- Chapter 21. Learning Models
- 21.1. Linear-in-Parameter Model
- 21.2. Kernel Model
- 21.3. Hierarchical Model

- Chapter 22. Least Squares Regression
- 22.1. Method of LS
- 22.2. Solution for Linear-in-Parameter Model
- 22.3. Properties of LS Solution
- 22.4. Learning Algorithm for Large-Scale Data
- 22.5. Learning Algorithm for Hierarchical Model

- Chapter 23. Constrained LS Regression
- 23.1. Subspace-Constrained LS
- 23.2. ℓ2-Constrained LS
- 23.3. Model Selection

- Chapter 24. Sparse Regression
- 24.1. ℓ1-Constrained LS
- 24.2. Solving ℓ1-Constrained LS
- 24.3. Feature Selection by Sparse Learning
- 24.4. Various Extensions

- Chapter 25. Robust Regression
- 25.1. Nonrobustness of ℓ2-Loss Minimization
- 25.2. ℓ1-Loss Minimization
- 25.3. Huber Loss Minimization
- 25.4. Tukey Loss Minimization

- Chapter 26. Least Squares Classification
- 26.1. Classification by LS Regression
- 26.2. 0/1-Loss and Margin
- 26.3. Multiclass Classification

- Chapter 27. Support Vector Classification
- 27.1. Maximum Margin Classification
- 27.2. Dual Optimization of Support Vector Classification
- 27.3. Sparseness of Dual Solution
- 27.4. Nonlinearization by Kernel Trick
- 27.5. Multiclass Extension
- 27.6. Loss Minimization View

- Chapter 28. Probabilistic Classification
- 28.1. Logistic Regression
- 28.2. LS Probabilistic Classification

- Chapter 29. Structured Classification
- 29.1. Sequence Classification
- 29.2. Probabilistic Classification for Sequences
- 29.3. Deterministic Classification for Sequences

Part 5

- FURTHER TOPICS
- Chapter 30. Ensemble Learning
- 30.1. Decision Stump Classifier
- 30.2. Bagging
- 30.3. Boosting
- 30.4. General Ensemble Learning

- Chapter 31. Online Learning
- 31.1. Stochastic Gradient Descent
- 31.2. Passive-Aggressive Learning
- 31.3. Adaptive Regularization of Weight Vectors (AROW)

- Chapter 32. Confidence of Prediction
- 32.1. Predictive Variance for ℓ2-Regularized LS
- 32.2. Bootstrap Confidence Estimation
- 32.3. Applications

- Chapter 33. Semisupervised Learning
- 33.1. Manifold Regularization
- 33.2. Covariate Shift Adaptation
- 33.3. Class-Balance Change Adaptation

- Chapter 34. Multitask Learning
- 34.1. Task Similarity Regularization
- 34.2. Multidimensional Function Learning
- 34.3. Matrix Regularization

- Chapter 35. Linear Dimensionality Reduction
- 35.1. Curse of Dimensionality
- 35.2. Unsupervised Dimensionality Reduction
- 35.3. Linear Discriminant Analyses for Classification
- 35.4. Sufficient Dimensionality Reduction for Regression
- 35.5. Matrix Imputation

- Chapter 36. Nonlinear Dimensionality Reduction
- 36.1. Dimensionality Reduction with Kernel Trick
- 36.2. Supervised Dimensionality Reduction with Neural Networks
- 36.3. Unsupervised Dimensionality Reduction with Autoencoder
- 36.4. Unsupervised Dimensionality Reduction with Restricted Boltzmann Machine
- 36.5. Deep Learning

- Chapter 37. Clustering
- 37.1. k-Means Clustering
- 37.2. Kernel k-Means Clustering
- 37.3. Spectral Clustering
- 37.4. Tuning Parameter Selection

- Chapter 38. Outlier Detection
- 38.1. Density Estimation and Local Outlier Factor
- 38.2. Support Vector Data Description
- 38.3. Inlier-Based Outlier Detection

- Chapter 39. Change Detection
- 39.1. Distributional Change Detection
- 39.2. Structural Change Detection

## Details

- No. of pages: 534
- Language: English
- Copyright: © Morgan Kaufmann 2016
- Published: 25th September 2015
- Imprint: Morgan Kaufmann
- Paperback ISBN: 9780128021217
- eBook ISBN: 9780128023501

## About the Author

### Masashi Sugiyama

Masashi Sugiyama received the degrees of Bachelor of Engineering, Master of Engineering, and Doctor of Engineering in Computer Science from Tokyo Institute of Technology, Japan, in 1997, 1999, and 2001, respectively. In 2001, he was appointed Assistant Professor at the same institute, and he was promoted to Associate Professor in 2003. He moved to the University of Tokyo as Professor in 2014. He received an Alexander von Humboldt Foundation Research Fellowship and conducted research at the Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006, he received a European Commission Program Erasmus Mundus Scholarship and conducted research at the University of Edinburgh, Edinburgh, UK. He received the Faculty Award from IBM in 2007 for his contribution to machine learning under non-stationarity, the Nagao Special Researcher Award from the Information Processing Society of Japan in 2011, and the Young Scientists' Prize from the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Japan, for his contribution to the density-ratio paradigm of machine learning. His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control.

### Affiliations and Expertise

Professor, The University of Tokyo, Japan

## Reviews

"The probabilistic and statistical background is well presented, providing the reader with a complete coverage of the generative approach to statistical pattern recognition and the discriminative approach to statistical machine learning." --**Zentralblatt MATH**