Preface
1. What’s it all about? 1.1 Data mining and machine learning 1.2 Simple examples: the weather problem and others 1.3 Fielded applications 1.4 Machine learning and statistics 1.5 Generalization as search 1.6 Data mining and ethics 1.7 Further reading
2. Input: Concepts, instances, attributes 2.1 What’s a concept? 2.2 What’s in an example? 2.3 What’s in an attribute? 2.4 Preparing the input 2.5 Further reading
3. Output: Knowledge representation 3.1 Decision tables 3.2 Decision trees 3.3 Classification rules 3.4 Association rules 3.5 Rules with exceptions 3.6 Rules involving relations 3.7 Trees for numeric prediction 3.8 Instance-based representation 3.9 Clusters 3.10 Further reading
4. Algorithms: The basic methods 4.1 Inferring rudimentary rules 4.2 Statistical modeling 4.3 Divide-and-conquer: constructing decision trees 4.4 Covering algorithms: constructing rules 4.5 Mining association rules 4.6 Linear models 4.7 Instance-based learning 4.8 Clustering 4.9 Further reading
5. Credibility: Evaluating what’s been learned 5.1 Training and testing 5.2 Predicting performance 5.3 Cross-validation 5.4 Other estimates 5.5 Comparing data mining schemes 5.6 Predicting probabilities 5.7 Counting the cost 5.8 Evaluating numeric prediction 5.9 The minimum description length principle 5.10 Applying MDL to clustering 5.11 Further reading
6. Implementations: Real machine learning schemes 6.1 Decision trees 6.2 Classification rules 6.3 Extending linear models 6.4 Instance-based learning 6.5 Numeric prediction 6.6 Clustering 6.7 Bayesian networks
7. Transformations: Engineering the input and output 7.1 Attribute selection 7.2 Discretizing numeric attributes 7.3 Some useful transformations 7.4 Automatic data cleansing 7.5 Combining multiple models 7.6 Using unlabeled data 7.7 Further reading
8. Moving on: Extensions and applications 8.1 Learning from massive datasets 8.2 Incorporating domain knowledge 8.3 Text and Web mining 8.4 Adversarial situations 8.5 Ubiquitous data mining 8.6 Further reading
Part II: The Weka machine learning workbench
9. Introduction to Weka 9.1 What’s in Weka? 9.2 How do you use it? 9.3 What else can you do?
10. The Explorer 10.1 Getting started 10.2 Exploring the Explorer 10.3 Filtering algorithms 10.4 Learning algorithms 10.5 Meta-learning algorithms 10.6 Clustering algorithms 10.7 Association-rule learners 10.8 Attribute selection
11. The Knowledge Flow interface 11.1 Getting started 11.2 Knowledge Flow components 11.3 Configuring and connecting the components 11.4 Incremental learning
12. The Experimenter 12.1 Getting started 12.2 Simple setup 12.3 Advanced setup 12.4 The Analyze panel 12.5 Distributing processing over several machines
13. The command-line interface 13.1 Getting started 13.2 The structure of Weka 13.3 Command-line options
14. Embedded machine learning
15. Writing new learning schemes
Data Mining, Second Edition, describes data mining techniques and shows how they work. The book is a major revision of the first edition, which appeared in 1999. The core material remains the same, but it has been updated to reflect the changes of the intervening five years and now has nearly double the references.
The highlights of this new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; and much more.
This text is designed for information systems practitioners, programmers, consultants, developers, information technology managers, and specification writers, as well as professors and students of graduate-level data mining and machine learning courses.
- Algorithmic methods at the heart of successful data mining, including tried-and-true techniques as well as leading-edge methods
- Performance improvement techniques that work by transforming the input or output
- © Morgan Kaufmann 2005
- 8th June 2005
- Morgan Kaufmann
“This book presents this new discipline in a very accessible form: both as a text to train the next generation of practitioners and researchers, and to inform lifelong learners like myself. Witten and Frank have a passion for simple and elegant solutions. They approach each topic with this mindset, grounding all concepts in concrete examples, and urging the reader to consider the simple techniques first, and then progress to the more sophisticated ones if the simple ones prove inadequate. If you have data that you want to analyze and understand, this book and the associated Weka toolkit are an excellent way to start.” (From the foreword by Jim Gray, Microsoft Research)

“It covers cutting-edge, data mining technology that forward-looking organizations use to successfully tackle problems that are complex, highly dimensional, chaotic, non-stationary (changing over time), or plagued by. The writing style is well-rounded and engaging without subjectivity, hyperbole, or ambiguity. I consider this book a classic already!” (Dr. Tilmann Bruckhaus, StickyMinds.com)
Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography. He has written several books, the latest being Managing Gigabytes (1999) and Data Mining (2000), both from Morgan Kaufmann.
Professor, Computer Science Department, University of Waikato, New Zealand.
Eibe Frank lives in New Zealand with his Samoan spouse and two lovely boys, but originally hails from Germany, where he received his first degree in computer science from the University of Karlsruhe. He moved to New Zealand to pursue his Ph.D. in machine learning under the supervision of Ian H. Witten, and joined the Department of Computer Science at the University of Waikato as a lecturer on completion of his studies. He is now an associate professor at the same institution. As an early adopter of the Java programming language, he laid the groundwork for the Weka software described in this book. He has contributed a number of publications on machine learning and data mining to the literature and has refereed for many conferences and journals in these areas.
Associate Professor, Department of Computer Science, University of Waikato, Hamilton, New Zealand