Data Mining

Practical Machine Learning Tools and Techniques

3rd Edition - January 6, 2011
Authors: Ian H. Witten, Eibe Frank, Mark A. Hall
Language: English
eBook ISBN: 978-0-08-089036-4

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.

Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, and Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both the tried-and-true techniques of today and methods at the leading edge of contemporary research.

The book is targeted at information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, data analysts, data modelers, database R&D professionals, data warehouse engineers, and data mining professionals. It will also be useful for professors and students of upper-level undergraduate and graduate-level data mining and machine learning courses who want to incorporate data mining into their data management knowledge base and expertise.

LIST OF FIGURES
LIST OF TABLES
PREFACE
ACKNOWLEDGMENTS
ABOUT THE AUTHORS
PART I. Introduction to Data Mining

CHAPTER 1. What’s It All About?

1.1. Data mining and machine learning
1.2. Simple examples: the weather and other problems
1.3. Fielded applications
1.4. Machine learning and statistics
1.5. Generalization as search
1.6. Data mining and ethics
1.7. Further reading

CHAPTER 2. Input

2.1. What’s a concept?
2.2. What’s in an example?
2.3. What’s in an attribute?
2.4. Preparing the input
2.5. Further reading

CHAPTER 3. Output

3.1. Tables
3.2. Linear models
3.3. Trees
3.4. Rules
3.5. Instance-based representation
3.6. Clusters
3.7. Further reading

CHAPTER 4. Algorithms

4.1. Inferring rudimentary rules
4.2. Statistical modeling
4.3. Divide-and-conquer: constructing decision trees
4.4. Covering algorithms: constructing rules
4.5. Mining association rules
4.6. Linear models
4.7. Instance-based learning
4.8. Clustering
4.9. Multi-instance learning
4.10. Further reading
4.11. Weka implementations

CHAPTER 5. Credibility

5.1. Training and testing
5.2. Predicting performance
5.3. Cross-validation
5.4. Other estimates
5.5. Comparing data mining schemes
5.6. Predicting probabilities
5.7. Counting the cost
5.8. Evaluating numeric prediction
5.9. Minimum description length principle
5.10. Applying the MDL principle to clustering
5.11. Further reading

PART II. Advanced Data Mining

CHAPTER 6. Implementations

6.1. Decision trees
6.2. Classification rules
6.3. Association rules
6.4. Extending linear models
6.5. Instance-based learning
6.6. Numeric prediction with local linear models
6.7. Bayesian networks
6.8. Clustering
6.9. Semisupervised learning
6.10. Multi-instance learning
6.11. Weka implementations

CHAPTER 7. Data Transformations

7.1. Attribute selection
7.2. Discretizing numeric attributes
7.3. Projections
7.4. Sampling
7.5. Cleansing
7.6. Transforming multiple classes to binary ones
7.7. Calibrating class probabilities
7.8. Further reading
7.9. Weka implementations

CHAPTER 8. Ensemble Learning

8.1. Combining multiple models
8.2. Bagging
8.3. Randomization
8.4. Boosting
8.5. Additive regression
8.6. Interpretable ensembles
8.7. Stacking
8.8. Further reading
8.9. Weka implementations

CHAPTER 9. Moving on

9.1. Applying data mining
9.2. Learning from massive datasets
9.3. Data stream learning
9.4. Incorporating domain knowledge
9.5. Text mining
9.6. Web mining
9.7. Adversarial situations
9.8. Ubiquitous data mining
9.9. Further reading

PART III. The Weka Data Mining Workbench

CHAPTER 10. Introduction to Weka

10.1. What’s in Weka?
10.2. How do you use it?
10.3. What else can you do?
10.4. How do you get it?

CHAPTER 11. The Explorer

11.1. Getting started
11.2. Exploring the Explorer
11.3. Filtering algorithms
11.4. Learning algorithms
11.5. Metalearning algorithms
11.6. Clustering algorithms
11.7. Association-rule learners
11.8. Attribute selection

CHAPTER 12. The Knowledge Flow Interface

12.1. Getting started
12.2. Components
12.3. Configuring and connecting the components
12.4. Incremental learning

CHAPTER 13. The Experimenter

13.1. Getting started
13.2. Simple setup
13.3. Advanced setup
13.4. The Analyze panel
13.5. Distributing processing over several machines

CHAPTER 14. The Command-Line Interface

14.1. Getting started
14.2. The structure of Weka
14.3. Command-line options

CHAPTER 15. Embedded Machine Learning

15.1. A simple data mining application

CHAPTER 16. Writing New Learning Schemes

16.1. An example classifier
16.2. Conventions for implementing classifiers

CHAPTER 17. Tutorial Exercises for the Weka Explorer

17.1. Introduction to the Explorer interface
17.2. Nearest-neighbor learning and decision trees
17.3. Classification boundaries
17.4. Preprocessing and parameter tuning
17.5. Document classification
17.6. Mining association rules

Index

Ian H. Witten

Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography.

Affiliations and expertise

Computer Science Department, University of Waikato, New Zealand

Eibe Frank

Eibe Frank lives in New Zealand with his Samoan spouse and two lovely boys, but originally hails from Germany, where he received his first degree in computer science from the University of Karlsruhe. He moved to New Zealand to pursue his Ph.D. in machine learning under the supervision of Ian H. Witten and joined the Department of Computer Science at the University of Waikato as a lecturer on completion of his studies. He is now an associate professor at the same institution. As an early adopter of the Java programming language, he laid the groundwork for the Weka software described in this book. He has contributed a number of publications on machine learning and data mining to the literature and has refereed for many conferences and journals in these areas.

Affiliations and expertise

Computer Science Department, University of Waikato, New Zealand

Mark A. Hall

Mark A. Hall holds a bachelor’s degree in computing and mathematical sciences and a Ph.D. in computer science, both from the University of Waikato. Throughout his time at Waikato, as a student and lecturer in computer science and more recently as a software developer and data mining consultant for Pentaho, an open-source business intelligence software company, Mark has been a core contributor to the Weka software described in this book. He has published several articles on machine learning and data mining and has refereed for conferences and journals in these areas.

Affiliations and expertise

Computer Science Department, University of Waikato, New Zealand