Data Mining

Practical Machine Learning Tools and Techniques, Second Edition

2nd Edition - June 8, 2005
Authors: Ian H. Witten, Eibe Frank
Language: English
eBook ISBN: 978-0-08-047702-2

Data Mining, Second Edition, describes data mining techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references.

The highlights of this new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; and much more.

This text is designed for information systems practitioners, programmers, consultants, developers, information technology managers, and specification writers, as well as professors and students of graduate-level data mining and machine learning courses.

Preface

1. What’s it all about?

1.1 Data mining and machine learning

1.2 Simple examples: the weather problem and others

1.3 Fielded applications

1.4 Machine learning and statistics

1.5 Generalization as search

1.6 Data mining and ethics

1.7 Further reading

2. Input: Concepts, instances, attributes

2.1 What’s a concept?

2.2 What’s in an example?

2.3 What’s in an attribute?

2.4 Preparing the input

2.5 Further reading

3. Output: Knowledge representation

3.1 Decision tables

3.2 Decision trees

3.3 Classification rules

3.4 Association rules

3.5 Rules with exceptions

3.6 Rules involving relations

3.7 Trees for numeric prediction

3.8 Instance-based representation

3.9 Clusters

3.10 Further reading

4. Algorithms: The basic methods

4.1 Inferring rudimentary rules

4.2 Statistical modeling

4.3 Divide-and-conquer: constructing decision trees

4.4 Covering algorithms: constructing rules

4.5 Mining association rules

4.6 Linear models

4.7 Instance-based learning

4.8 Clustering

4.9 Further reading

5. Credibility: Evaluating what’s been learned

5.1 Training and testing

5.2 Predicting performance

5.3 Cross-validation

5.4 Other estimates

5.5 Comparing data mining schemes

5.6 Predicting probabilities

5.7 Counting the cost

5.8 Evaluating numeric prediction

5.9 The minimum description length principle

5.10 Applying MDL to clustering

5.11 Further reading

6. Implementations: Real machine learning schemes

6.1 Decision trees

6.2 Classification rules

6.3 Extending linear models

6.4 Instance-based learning

6.5 Numeric prediction

6.6 Clustering

6.7 Bayesian networks

7. Transformations: Engineering the input and output

7.1 Attribute selection

7.2 Discretizing numeric attributes

7.3 Some useful transformations

7.4 Automatic data cleansing

7.5 Combining multiple models

7.6 Using unlabeled data

7.7 Further reading

8. Moving on: Extensions and applications

8.1 Learning from massive datasets

8.2 Incorporating domain knowledge

8.3 Text and Web mining

8.4 Adversarial situations

8.5 Ubiquitous data mining

8.6 Further reading

Part II: The Weka machine learning workbench

9. Introduction to Weka

9.1 What’s in Weka?

9.2 How do you use it?

9.3 What else can you do?

10. The Explorer

10.1 Getting started

10.2 Exploring the Explorer

10.3 Filtering algorithms

10.4 Learning algorithms

10.5 Meta-learning algorithms

10.6 Clustering algorithms

10.7 Association-rule learners

10.8 Attribute selection

11. The Knowledge Flow interface

11.1 Getting started

11.2 Knowledge Flow components

11.3 Configuring and connecting the components

11.4 Incremental learning

12. The Experimenter

12.1 Getting started

12.2 Simple setup

12.3 Advanced setup

12.4 The Analyze panel

12.5 Distributing processing over several machines

13. The command-line interface

13.1 Getting started

13.2 The structure of Weka

13.3 Command-line options

14. Embedded machine learning

15. Writing new learning schemes

References

Index
