Data Mining

Practical Machine Learning Tools and Techniques

4th Edition - October 1, 2016
Authors: Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal
Language: English
Paperback ISBN: 978-0-12-804291-5
eBook ISBN: 978-0-12-804357-8

Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, and evaluating results to the algorithmic methods at the heart of successful data mining approaches.

Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research.

Please visit the book companion website at https://www.cs.waikato.ac.nz/~ml/weka/book.html.

It contains:

PowerPoint slides for Chapters 1-12. This is a very comprehensive teaching resource, with many slides covering each chapter of the book.

Online Appendix on the Weka workbench, a very comprehensive learning aid for the open-source software that goes with the book.

Table of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.

Part I: Introduction to data mining

Chapter 1. What’s it all about?

Abstract
1.1 Data Mining and Machine Learning
1.2 Simple Examples: The Weather Problem and Others
1.3 Fielded Applications
1.4 The Data Mining Process
1.5 Machine Learning and Statistics
1.6 Generalization as Search
1.7 Data Mining and Ethics
1.8 Further Reading and Bibliographic Notes

Chapter 2. Input: Concepts, instances, attributes

Abstract
2.1 What’s a Concept?
2.2 What’s in an Example?
2.3 What’s in an Attribute?
2.4 Preparing the Input
2.5 Further Reading and Bibliographic Notes

Chapter 3. Output: Knowledge representation

Abstract
3.1 Tables
3.2 Linear Models
3.3 Trees
3.4 Rules
3.5 Instance-Based Representation
3.6 Clusters
3.7 Further Reading and Bibliographic Notes

Chapter 4. Algorithms: The basic methods

Abstract
4.1 Inferring Rudimentary Rules
4.2 Simple Probabilistic Modeling
4.3 Divide-and-Conquer: Constructing Decision Trees
4.4 Covering Algorithms: Constructing Rules
4.5 Mining Association Rules
4.6 Linear Models
4.7 Instance-Based Learning
4.8 Clustering
4.9 Multi-instance Learning
4.10 Further Reading and Bibliographic Notes
4.11 Weka Implementations

Chapter 5. Credibility: Evaluating what’s been learned

Abstract
5.1 Training and Testing
5.2 Predicting Performance
5.3 Cross-Validation
5.4 Other Estimates
5.5 Hyperparameter Selection
5.6 Comparing Data Mining Schemes
5.7 Predicting Probabilities
5.8 Counting the Cost
5.9 Evaluating Numeric Prediction
5.10 The MDL Principle
5.11 Applying the MDL Principle to Clustering
5.12 Using a Validation Set for Model Selection
5.13 Further Reading and Bibliographic Notes

Part II: More advanced machine learning schemes

Chapter 6. Trees and rules

Abstract
6.1 Decision Trees
6.2 Classification Rules
6.3 Association Rules
6.4 Weka Implementations

Chapter 7. Extending instance-based and linear models

Abstract
7.1 Instance-Based Learning
7.2 Extending Linear Models
7.3 Numeric Prediction With Local Linear Models
7.4 Weka Implementations

Chapter 8. Data transformations

Abstract
8.1 Attribute Selection
8.2 Discretizing Numeric Attributes
8.3 Projections
8.4 Sampling
8.5 Cleansing
8.6 Transforming Multiple Classes to Binary Ones
8.7 Calibrating Class Probabilities
8.8 Further Reading and Bibliographic Notes
8.9 Weka Implementations

Chapter 9. Probabilistic methods

Abstract
9.1 Foundations
9.2 Bayesian Networks
9.3 Clustering and Probability Density Estimation
9.4 Hidden Variable Models
9.5 Bayesian Estimation and Prediction
9.6 Graphical Models and Factor Graphs
9.7 Conditional Probability Models
9.8 Sequential and Temporal Models
9.9 Further Reading and Bibliographic Notes
9.10 Weka Implementations

Chapter 10. Deep learning

Abstract
10.1 Deep Feedforward Networks
10.2 Training and Evaluating Deep Networks
10.3 Convolutional Neural Networks
10.4 Autoencoders
10.5 Stochastic Deep Networks
10.6 Recurrent Neural Networks
10.7 Further Reading and Bibliographic Notes
10.8 Deep Learning Software and Network Implementations
10.9 WEKA Implementations

Chapter 11. Beyond supervised and unsupervised learning

Abstract
11.1 Semisupervised Learning
11.2 Multi-instance Learning
11.3 Further Reading and Bibliographic Notes
11.4 WEKA Implementations

Chapter 12. Ensemble learning

Abstract
12.1 Combining Multiple Models
12.2 Bagging
12.3 Randomization
12.4 Boosting
12.5 Additive Regression
12.6 Interpretable Ensembles
12.7 Stacking
12.8 Further Reading and Bibliographic Notes
12.9 WEKA Implementations

Chapter 13. Moving on: applications and beyond

Abstract
13.1 Applying Machine Learning
13.2 Learning From Massive Datasets
13.3 Data Stream Learning
13.4 Incorporating Domain Knowledge
13.5 Text Mining
13.6 Web Mining
13.7 Images and Speech
13.8 Adversarial Situations
13.9 Ubiquitous Data Mining
13.10 Further Reading and Bibliographic Notes
13.11 WEKA Implementations

Appendix A. Theoretical foundations

A.1 Matrix Algebra
A.2 Fundamental Elements of Probabilistic Methods

Appendix B. The WEKA workbench

B.1 What’s in WEKA?
B.2 The package management system
B.3 The Explorer
B.4 The Knowledge Flow Interface
B.5 The Experimenter

Ian H. Witten

Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography.

Affiliations and expertise

Computer Science Department, University of Waikato, New Zealand

Eibe Frank

Eibe Frank lives in New Zealand with his Samoan spouse and two lovely boys, but originally hails from Germany, where he received his first degree in computer science from the University of Karlsruhe. He moved to New Zealand to pursue his Ph.D. in machine learning under the supervision of Ian H. Witten and joined the Department of Computer Science at the University of Waikato as a lecturer on completion of his studies. He is now an associate professor at the same institution. As an early adopter of the Java programming language, he laid the groundwork for the Weka software described in this book. He has contributed a number of publications on machine learning and data mining to the literature and has refereed for many conferences and journals in these areas.

Affiliations and expertise

Computer Science Department, University of Waikato, New Zealand.

Mark A. Hall

Mark A. Hall holds a bachelor’s degree in computing and mathematical sciences and a Ph.D. in computer science, both from the University of Waikato. Throughout his time at Waikato, as a student and lecturer in computer science and more recently as a software developer and data mining consultant for Pentaho, an open-source business intelligence software company, Mark has been a core contributor to the Weka software described in this book. He has published several articles on machine learning and data mining and has refereed for conferences and journals in these areas.

Affiliations and expertise

Computer Science Department, University of Waikato, New Zealand.

Christopher J. Pal

Christopher J. Pal is a Canada CIFAR AI Chair and a full professor at the Department of Computer Engineering and Software Engineering at Polytechnique Montréal. Pal’s research interests include computer vision and pattern recognition, computational photography, natural language processing, statistical machine learning, and applications to human-computer interaction.

Affiliations and expertise

Department of Computer Engineering and Software Engineering, Polytechnique Montréal, Quebec, Canada.