Data Mining, Southeast Asia Edition

2nd Edition - March 1, 2006
Authors: Jiawei Han, Jian Pei, Micheline Kamber
Language: English
Paperback ISBN: 978-0-12-373905-6
eBook ISBN: 978-0-08-047558-5

Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge.

Like the first edition, voted the most popular data mining book by KDnuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments in mining complex types of data, including stream data, sequence data, graph-structured data, social network data, and multi-relational data.

Chapter 1: Introduction

1.1 What Motivated Data Mining? Why Is It Important?

1.2 So, What Is Data Mining?

1.3 Data Mining--On What Kind of Data?

1.4 Data Mining Functionalities—What Kinds of Patterns Can Be Mined?

1.5 Are All of the Patterns Interesting?

1.6 Classification of Data Mining Systems

1.7 Data Mining Task Primitives

1.8 Integration of a Data Mining System with a Database or Data Warehouse System

1.9 Major Issues in Data Mining

1.10 Summary

1.11 Exercises

1.12 Bibliographic Notes

Chapter 2: Data Preprocessing

2.1 Why Preprocess the Data?

2.2 Descriptive Data Summarization

2.3 Data Cleaning

2.4 Data Integration and Transformation

2.5 Data Reduction

2.6 Data Discretization and Concept Hierarchy Generation

2.7 Summary

2.8 Exercises

2.9 Bibliographic Notes

Chapter 3: Data Warehouse and OLAP Technology: An Overview

3.1 What Is a Data Warehouse?

3.2 A Multidimensional Data Model

3.3 Data Warehouse Architecture

3.4 Data Warehouse Implementation

3.5 From Data Warehousing to Data Mining

3.6 Summary

3.7 Exercises

3.8 Bibliographic Notes

Chapter 4: Data Cube Computation and Data Generalization

4.1 Efficient Methods for Data Cube Computation

4.2 Further Development of Data Cube and OLAP Technology

4.3 Attribute-Oriented Induction—An Alternative Method for Data Generalization and Concept Description

4.4 Summary

4.5 Exercises

4.6 Bibliographic Notes

Chapter 5: Mining Frequent Patterns, Associations, and Correlations

5.1 Basic Concepts and a Road Map

5.2 Efficient and Scalable Frequent Itemset Mining Methods

5.3 Mining Various Kinds of Association Rules

5.4 From Association Mining to Correlation Analysis

5.5 Constraint-Based Association Mining

5.6 Summary

5.7 Exercises

5.8 Bibliographic Notes

Chapter 6: Classification and Prediction

6.1 What Is Classification? What Is Prediction?

6.2 Issues Regarding Classification and Prediction

6.3 Classification by Decision Tree Induction

6.4 Bayesian Classification

6.5 Rule-Based Classification

6.6 Classification by Backpropagation

6.7 Support Vector Machines

6.8 Associative Classification: Classification by Association Rule Analysis

6.9 Lazy Learners (or Learning from Your Neighbors)

6.10 Other Classification Methods

6.11 Prediction

6.12 Accuracy and Error Measures

6.13 Evaluating the Accuracy of a Classifier or Predictor

6.14 Ensemble Methods—Increasing the Accuracy

6.15 Model Selection

6.16 Summary

6.17 Exercises

6.18 Bibliographic Notes

Chapter 7: Cluster Analysis

7.1 What Is Cluster Analysis?

7.2 Types of Data in Cluster Analysis

7.3 A Categorization of Major Clustering Methods

7.4 Partitioning Methods

7.5 Hierarchical Methods

7.6 Density-Based Methods

7.7 Grid-Based Methods

7.8 Model-Based Clustering Methods

7.9 Clustering High-Dimensional Data

7.10 Constraint-Based Cluster Analysis

7.11 Outlier Analysis

7.12 Summary

7.13 Exercises

7.14 Bibliographic Notes

Chapter 8: Mining Stream, Time-Series, and Sequence Data

8.1 Mining Data Streams

8.2 Mining Time-Series Data

8.3 Mining Sequence Patterns in Transactional Databases

8.4 Mining Sequence Patterns in Biological Data

8.5 Summary

8.6 Exercises

8.7 Bibliographic Notes

Chapter 9: Graph Mining, Social Network Analysis, and Multi-Relational Data Mining

9.1 Graph Mining

9.2 Social Network Analysis

9.3 Multi-Relational Data Mining

9.4 Summary

9.5 Exercises

9.6 Bibliographic Notes

Chapter 10: Mining Object, Spatial, Multimedia, Text, and Web Data

10.1 Multidimensional Analysis and Descriptive Mining of Complex Data Objects

10.2 Spatial Data Mining

10.3 Multimedia Data Mining

10.4 Text Mining

10.5 Mining the World Wide Web

10.6 Summary

10.7 Exercises

10.8 Bibliographic Notes

Chapter 11: Applications and Trends in Data Mining

11.1 Data Mining Applications

11.2 Data Mining System Products and Research Prototypes

11.3 Additional Themes on Data Mining

11.4 Social Impacts of Data Mining

11.5 Trends in Data Mining

11.6 Summary

11.7 Exercises

11.8 Bibliographic Notes

Appendix A: An Introduction to Microsoft's OLE DB for Data Mining

Jiawei Han

Jiawei Han is a Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions in the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on the editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Affiliations and expertise

Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, USA

Jian Pei

Jian Pei is currently a Canada Research Chair (Tier 1) in Big Data Science and a Professor in the School of Computing Science at Simon Fraser University. He is also an associate member of the Department of Statistics and Actuarial Science. He is a leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications. He is recognized as a Fellow of the Association for Computing Machinery (ACM) for his “contributions to the foundation, methodology and applications of data mining” and as a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for his “contributions to data mining and knowledge discovery”. He is the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), a director of the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), and a general co-chair or program committee co-chair of many premier conferences.

Affiliations and expertise

Simon Fraser University, Burnaby, Canada

Micheline Kamber

Micheline Kamber is a researcher with a passion for writing in easy-to-understand terms. She has a master's degree in computer science (specializing in artificial intelligence) from Concordia University, Canada.

Affiliations and expertise

Simon Fraser University, Burnaby, Canada