Data Mining

Practical Machine Learning Tools and Techniques, Second Edition

2nd Edition - June 8, 2005
  • Authors: Ian Witten, Eibe Frank
  • eBook ISBN: 9780080477022

Purchase options

DRM-free (Mobi, PDF, EPub)
Institutional Subscription

Description

Data Mining, Second Edition, describes data mining techniques and shows how they work. The book is a major revision of the first edition, which appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and it now has nearly double the references. The highlights of this new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; and much more. This text is designed for information systems practitioners, programmers, consultants, developers, information technology managers, and specification writers, as well as professors and students of graduate-level data mining and machine learning courses.
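
As an illustration of the embedded use of Weka covered in Chapter 14, a minimal sketch, not taken from the book, might look like the following Java program. It assumes a local ARFF file named weather.arff; that file name and the class name WekaSketch are illustrative, and the program uses Weka's J48 decision tree learner with ten-fold cross-validation.

    // Minimal sketch of embedding Weka in a Java program; the file name
    // "weather.arff" and the class name are illustrative assumptions.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;

    public class WekaSketch {
        public static void main(String[] args) throws Exception {
            // Load a dataset in Weka's ARFF format.
            Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
            // Treat the last attribute as the class to be predicted.
            data.setClassIndex(data.numAttributes() - 1);

            // Build a C4.5-style decision tree with Weka's J48 implementation.
            J48 tree = new J48();
            tree.buildClassifier(data);
            System.out.println(tree);

            // Estimate predictive performance with ten-fold cross-validation.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }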

Key Features

  • Algorithmic methods at the heart of successful data mining, including tried-and-true techniques as well as leading-edge methods
  • Performance improvement techniques that work by transforming the input or output

Readership

Information systems practitioners, programmers, consultants, developers, information technology managers, and specification writers, as well as professors and students of graduate-level data mining and machine learning courses

Table of Contents

  • Preface

    Part I: Machine learning tools and techniques

    1. What’s it all about?
    1.1 Data mining and machine learning
    1.2 Simple examples: the weather problem and others
    1.3 Fielded applications
    1.4 Machine learning and statistics
    1.5 Generalization as search
    1.6 Data mining and ethics
    1.7 Further reading

    2. Input: Concepts, instances, attributes
    2.1 What’s a concept?
    2.2 What’s in an example?
    2.3 What’s in an attribute?
    2.4 Preparing the input
    2.5 Further reading

    3. Output: Knowledge representation
    3.1 Decision tables
    3.2 Decision trees
    3.3 Classification rules
    3.4 Association rules
    3.5 Rules with exceptions
    3.6 Rules involving relations
    3.7 Trees for numeric prediction
    3.8 Instance-based representation
    3.9 Clusters
    3.10 Further reading

    4. Algorithms: The basic methods
    4.1 Inferring rudimentary rules
    4.2 Statistical modeling
    4.3 Divide-and-conquer: constructing decision trees
    4.4 Covering algorithms: constructing rules
    4.5 Mining association rules
    4.6 Linear models
    4.7 Instance-based learning
    4.8 Clustering
    4.9 Further reading

    5. Credibility: Evaluating what’s been learned
    5.1 Training and testing
    5.2 Predicting performance
    5.3 Cross-validation
    5.4 Other estimates
    5.5 Comparing data mining schemes
    5.6 Predicting probabilities
    5.7 Counting the cost
    5.8 Evaluating numeric prediction
    5.9 The minimum description length principle
    5.10 Applying MDL to clustering
    5.11 Further reading

    6. Implementations: Real machine learning schemes
    6.1 Decision trees
    6.2 Classification rules
    6.3 Extending linear models
    6.4 Instance-based learning
    6.5 Numeric prediction
    6.6 Clustering
    6.7 Bayesian networks

    7. Transformations: Engineering the input and output
    7.1 Attribute selection
    7.2 Discretizing numeric attributes
    7.3 Some useful transformations
    7.4 Automatic data cleansing
    7.5 Combining multiple models
    7.6 Using unlabeled data
    7.7 Further reading

    8. Moving on: Extensions and applications
    8.1 Learning from massive datasets
    8.2 Incorporating domain knowledge
    8.3 Text and Web mining
    8.4 Adversarial situations
    8.5 Ubiquitous data mining
    8.6 Further reading

    Part II: The Weka machine learning workbench

    9. Introduction to Weka
    9.1 What’s in Weka?
    9.2 How do you use it?
    9.3 What else can you do?

    10. The Explorer
    10.1 Getting started
    10.2 Exploring the Explorer
    10.3 Filtering algorithms
    10.4 Learning algorithms
    10.5 Meta-learning algorithms
    10.6 Clustering algorithms
    10.7 Association-rule learners
    10.8 Attribute selection

    11. The Knowledge Flow interface
    11.1 Getting started
    11.2 Knowledge Flow components
    11.3 Configuring and connecting the components
    11.4 Incremental learning

    12. The Experimenter
    12.1 Getting started
    12.2 Simple setup
    12.3 Advanced setup
    12.4 The Analyze panel
    12.5 Distributing processing over several machines

    13. The command-line interface
    13.1 Getting started
    13.2 The structure of Weka
    13.3 Command-line options

    14. Embedded machine learning

    15. Writing new learning schemes

    References
    Index

Product details

  • No. of pages: 560
  • Language: English
  • Copyright: © Morgan Kaufmann 2005
  • Published: June 8, 2005
  • Imprint: Morgan Kaufmann
  • eBook ISBN: 9780080477022

About the Authors

Ian Witten

Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography. He has written several books, the latest being Managing Gigabytes (1999) and Data Mining (2000), both from Morgan Kaufmann.

Affiliations and Expertise

Professor, Computer Science Department, University of Waikato, New Zealand

Eibe Frank

Eibe Frank lives in New Zealand with his Samoan spouse and two lovely boys, but originally hails from Germany, where he received his first degree in computer science from the University of Karlsruhe. He moved to New Zealand to pursue his PhD in machine learning under the supervision of Ian H. Witten, and joined the Department of Computer Science at the University of Waikato as a lecturer on completion of his studies. He is now an associate professor at the same institution. As an early adopter of the Java programming language, he laid the groundwork for the Weka software described in this book. He has contributed a number of publications on machine learning and data mining to the literature and has refereed for many conferences and journals in these areas.

Affiliations and Expertise

Associate Professor, Department of Computer Science, University of Waikato, Hamilton, New Zealand