Introduction to Statistical Machine Learning

1st Edition - September 25, 2015
Author: Masashi Sugiyama
Language: English
Paperback ISBN: 978-0-12-802121-7
eBook ISBN: 978-0-12-802350-1

Machine learning allows computers to learn and discern patterns without being explicitly programmed. Combined with statistical techniques, it becomes a powerful tool for analysing many kinds of data in computer science and engineering areas such as image processing, speech processing, natural language processing, and robot control, as well as in fundamental sciences such as biology, medicine, astronomy, physics, and materials science.

Introduction to Statistical Machine Learning provides a concise general introduction to machine learning that covers a wide range of topics and will help you bridge the gap between theory and practice. After the introduction in Part 1, Part 2 discusses the fundamental concepts of statistics and probability that are used in describing machine learning algorithms. Parts 3 and 4 explain the two major approaches to machine learning: generative methods and discriminative methods. Part 5 provides an in-depth look at advanced topics that play essential roles in making machine learning algorithms more useful in practice. The accompanying MATLAB/Octave programs provide the practical skills needed to accomplish a wide range of data analysis tasks.

Part 1

INTRODUCTION
Chapter 1. Statistical Machine Learning
- 1.1. Types of Learning
- 1.2. Examples of Machine Learning Tasks
- 1.3. Structure of This Textbook

Part 2

STATISTICS AND PROBABILITY
Chapter 2. Random Variables and Probability Distributions
- 2.1. Mathematical Preliminaries
- 2.2. Probability
- 2.3. Random Variable and Probability Distribution
- 2.4. Properties of Probability Distributions
- 2.5. Transformation of Random Variables
Chapter 3. Examples of Discrete Probability Distributions
- 3.1. Discrete Uniform Distribution
- 3.2. Binomial Distribution
- 3.3. Hypergeometric Distribution
- 3.4. Poisson Distribution
- 3.5. Negative Binomial Distribution
- 3.6. Geometric Distribution
Chapter 4. Examples of Continuous Probability Distributions
- 4.1. Continuous Uniform Distribution
- 4.2. Normal Distribution
- 4.3. Gamma Distribution, Exponential Distribution, and Chi-Squared Distribution
- 4.4. Beta Distribution
- 4.5. Cauchy Distribution and Laplace Distribution
- 4.6. t-Distribution and F-Distribution
Chapter 5. Multidimensional Probability Distributions
- 5.1. Joint Probability Distribution
- 5.2. Conditional Probability Distribution
- 5.3. Contingency Table
- 5.4. Bayes’ Theorem
- 5.5. Covariance and Correlation
- 5.6. Independence
Chapter 6. Examples of Multidimensional Probability Distributions
- 6.1. Multinomial Distribution
- 6.2. Multivariate Normal Distribution
- 6.3. Dirichlet Distribution
- 6.4. Wishart Distribution
Chapter 7. Sum of Independent Random Variables
- 7.1. Convolution
- 7.2. Reproductive Property
- 7.3. Law of Large Numbers
- 7.4. Central Limit Theorem
Chapter 8. Probability Inequalities
- 8.1. Union Bound
- 8.2. Inequalities for Probabilities
- 8.3. Inequalities for Expectation
- 8.4. Inequalities for the Sum of Independent Random Variables
Chapter 9. Statistical Estimation
- 9.1. Fundamentals of Statistical Estimation
- 9.2. Point Estimation
- 9.3. Interval Estimation
Chapter 10. Hypothesis Testing
- 10.1. Fundamentals of Hypothesis Testing
- 10.2. Test for Expectation of Normal Samples
- 10.3. Neyman-Pearson Lemma
- 10.4. Test for Contingency Tables
- 10.5. Test for Difference in Expectations of Normal Samples
- 10.6. Nonparametric Test for Ranks
- 10.7. Monte Carlo Test

Part 3

GENERATIVE APPROACH TO STATISTICAL PATTERN RECOGNITION
Chapter 11. Pattern Recognition via Generative Model Estimation
- 11.1. Formulation of Pattern Recognition
- 11.2. Statistical Pattern Recognition
- 11.3. Criteria for Classifier Training
- 11.4. Generative and Discriminative Approaches
Chapter 12. Maximum Likelihood Estimation
- 12.1. Definition
- 12.2. Gaussian Model
- 12.3. Computing the Class-Posterior Probability
- 12.4. Fisher’s Linear Discriminant Analysis (FDA)
- 12.5. Hand-Written Digit Recognition
Chapter 13. Properties of Maximum Likelihood Estimation
- 13.1. Consistency
- 13.2. Asymptotic Unbiasedness
- 13.3. Asymptotic Efficiency
- 13.4. Asymptotic Normality
- 13.5. Summary
Chapter 14. Model Selection for Maximum Likelihood Estimation
- 14.1. Model Selection
- 14.2. KL Divergence
- 14.3. AIC
- 14.4. Cross Validation
- 14.5. Discussion
Chapter 15. Maximum Likelihood Estimation for Gaussian Mixture Model
- 15.1. Gaussian Mixture Model
- 15.2. MLE
- 15.3. Gradient Ascent Algorithm
- 15.4. EM Algorithm
Chapter 16. Nonparametric Estimation
- 16.1. Histogram Method
- 16.2. Problem Formulation
- 16.3. KDE
- 16.4. NNDE
Chapter 17. Bayesian Inference
- 17.1. Bayesian Predictive Distribution
- 17.2. Conjugate Prior
- 17.3. MAP Estimation
- 17.4. Bayesian Model Selection
Chapter 18. Analytic Approximation of Marginal Likelihood
- 18.1. Laplace Approximation
- 18.2. Variational Approximation
Chapter 19. Numerical Approximation of Predictive Distribution
- 19.1. Monte Carlo Integration
- 19.2. Importance Sampling
- 19.3. Sampling Algorithms
Chapter 20. Bayesian Mixture Models
- 20.1. Gaussian Mixture Models
- 20.2. Latent Dirichlet Allocation (LDA)

Part 4

DISCRIMINATIVE APPROACH TO STATISTICAL MACHINE LEARNING
Chapter 21. Learning Models
- 21.1. Linear-in-Parameter Model
- 21.2. Kernel Model
- 21.3. Hierarchical Model
Chapter 22. Least Squares Regression
- 22.1. Method of LS
- 22.2. Solution for Linear-in-Parameter Model
- 22.3. Properties of LS Solution
- 22.4. Learning Algorithm for Large-Scale Data
- 22.5. Learning Algorithm for Hierarchical Model
Chapter 23. Constrained LS Regression
- 23.1. Subspace-Constrained LS
- 23.2. ℓ2-Constrained LS
- 23.3. Model Selection
Chapter 24. Sparse Regression
- 24.1. ℓ1-Constrained LS
- 24.2. Solving ℓ1-Constrained LS
- 24.3. Feature Selection by Sparse Learning
- 24.4. Various Extensions
Chapter 25. Robust Regression
- 25.1. Nonrobustness of ℓ2-Loss Minimization
- 25.2. ℓ1-Loss Minimization
- 25.3. Huber Loss Minimization
- 25.4. Tukey Loss Minimization
Chapter 26. Least Squares Classification
- 26.1. Classification by LS Regression
- 26.2. 0∕1-Loss and Margin
- 26.3. Multiclass Classification
Chapter 27. Support Vector Classification
- 27.1. Maximum Margin Classification
- 27.2. Dual Optimization of Support Vector Classification
- 27.3. Sparseness of Dual Solution
- 27.4. Nonlinearization by Kernel Trick
- 27.5. Multiclass Extension
- 27.6. Loss Minimization View
Chapter 28. Probabilistic Classification
- 28.1. Logistic Regression
- 28.2. LS Probabilistic Classification
Chapter 29. Structured Classification
- 29.1. Sequence Classification
- 29.2. Probabilistic Classification for Sequences
- 29.3. Deterministic Classification for Sequences

Part 5

FURTHER TOPICS
Chapter 30. Ensemble Learning
- 30.1. Decision Stump Classifier
- 30.2. Bagging
- 30.3. Boosting
- 30.4. General Ensemble Learning
Chapter 31. Online Learning
- 31.1. Stochastic Gradient Descent
- 31.2. Passive-Aggressive Learning
- 31.3. Adaptive Regularization of Weight Vectors (AROW)
Chapter 32. Confidence of Prediction
- 32.1. Predictive Variance for ℓ2-Regularized LS
- 32.2. Bootstrap Confidence Estimation
- 32.3. Applications
Chapter 33. Semisupervised Learning
- 33.1. Manifold Regularization
- 33.2. Covariate Shift Adaptation
- 33.3. Class-balance Change Adaptation
Chapter 34. Multitask Learning
- 34.1. Task Similarity Regularization
- 34.2. Multidimensional Function Learning
- 34.3. Matrix Regularization
Chapter 35. Linear Dimensionality Reduction
- 35.1. Curse of Dimensionality
- 35.2. Unsupervised Dimensionality Reduction
- 35.3. Linear Discriminant Analyses for Classification
- 35.4. Sufficient Dimensionality Reduction for Regression
- 35.5. Matrix Imputation
Chapter 36. Nonlinear Dimensionality Reduction
- 36.1. Dimensionality Reduction with Kernel Trick
- 36.2. Supervised Dimensionality Reduction with Neural Networks
- 36.3. Unsupervised Dimensionality Reduction with Autoencoder
- 36.4. Unsupervised Dimensionality Reduction with Restricted Boltzmann Machine
- 36.5. Deep Learning
Chapter 37. Clustering
- 37.1. k-Means Clustering
- 37.2. Kernel k-Means Clustering
- 37.3. Spectral Clustering
- 37.4. Tuning Parameter Selection
Chapter 38. Outlier Detection
- 38.1. Density Estimation and Local Outlier Factor
- 38.2. Support Vector Data Description
- 38.3. Inlier-Based Outlier Detection
Chapter 39. Change Detection
- 39.1. Distributional Change Detection
- 39.2. Structural Change Detection
