Data Mining

Concepts and Techniques

4th Edition - July 2, 2022
Authors: Jiawei Han, Jian Pei, Hanghang Tong
Language: English
Paperback ISBN: 978-0-12-811760-6
eBook ISBN: 978-0-12-811761-3

Data Mining: Concepts and Techniques, Fourth Edition introduces concepts, principles, and methods for mining patterns, knowledge, and models from various kinds of data for diverse applications. Specifically, it delves into the processes for uncovering patterns and knowledge from massive collections of data, known as knowledge discovery from data, or KDD. It focuses on the feasibility, usefulness, effectiveness, and scalability of data mining techniques for large data sets.

After an introduction to the concept of data mining, the authors explain the methods for preprocessing, characterizing, and warehousing data. They then partition the data mining methods into several major tasks, introducing concepts and methods for mining frequent patterns, associations, and correlations for large data sets; data classification and model construction; cluster analysis; and outlier detection. Concepts and methods for deep learning are systematically introduced in one chapter. Finally, the book covers the trends, applications, and research frontiers in data mining.

Cover image
Title page
Table of Contents
Copyright
Dedication
Foreword
Foreword to second edition
Preface
Organization of the book
To the instructor
To the student
To the professional
Book web site with resources
Acknowledgments
About the authors
Chapter 1: Introduction
1.1. What is data mining?
1.2. Data mining: an essential step in knowledge discovery
1.3. Diversity of data types for data mining
1.4. Mining various kinds of knowledge
1.5. Data mining: confluence of multiple disciplines
1.6. Data mining and applications
1.7. Data mining and society
1.8. Summary
1.9. Exercises
1.10. Bibliographic notes
Bibliography
Chapter 2: Data, measurements, and data preprocessing
2.1. Data types
2.2. Statistics of data
2.3. Similarity and distance measures
2.4. Data quality, data cleaning, and data integration
2.5. Data transformation
2.6. Dimensionality reduction
2.7. Summary
2.8. Exercises
2.9. Bibliographic notes
Bibliography
Chapter 3: Data warehousing and online analytical processing
3.1. Data warehouse
3.2. Data warehouse modeling: schema and measures
3.3. OLAP operations
3.4. Data cube computation
3.5. Data cube computation methods
3.6. Summary
3.7. Exercises
3.8. Bibliographic notes
Bibliography
Chapter 4: Pattern mining: basic concepts and methods
4.1. Basic concepts
4.2. Frequent itemset mining methods
4.3. Which patterns are interesting?—Pattern evaluation methods
4.4. Summary
4.5. Exercises
4.6. Bibliographic notes
Bibliography
Chapter 5: Pattern mining: advanced methods
5.1. Mining various kinds of patterns
5.2. Mining compressed or approximate patterns
5.3. Constraint-based pattern mining
5.4. Mining sequential patterns
5.5. Mining subgraph patterns
5.6. Pattern mining: application examples
5.7. Summary
5.8. Exercises
5.9. Bibliographic notes
Bibliography
Chapter 6: Classification: basic concepts and methods
6.1. Basic concepts
6.2. Decision tree induction
6.3. Bayes classification methods
6.4. Lazy learners (or learning from your neighbors)
6.5. Linear classifiers
6.6. Model evaluation and selection
6.7. Techniques to improve classification accuracy
6.8. Summary
6.9. Exercises
6.10. Bibliographic notes
Bibliography
Chapter 7: Classification: advanced methods
7.1. Feature selection and engineering
7.2. Bayesian belief networks
7.3. Support vector machines
7.4. Rule-based and pattern-based classification
7.5. Classification with weak supervision
7.6. Classification with rich data type
7.7. Potpourri: other related techniques
7.8. Summary
7.9. Exercises
7.10. Bibliographic notes
Bibliography
Chapter 8: Cluster analysis: basic concepts and methods
8.1. Cluster analysis
8.2. Partitioning methods
8.3. Hierarchical methods
8.4. Density-based and grid-based methods
8.5. Evaluation of clustering
8.6. Summary
8.7. Exercises
8.8. Bibliographic notes
Bibliography
Chapter 9: Cluster analysis: advanced methods
9.1. Probabilistic model-based clustering
9.2. Clustering high-dimensional data
9.3. Biclustering
9.4. Dimensionality reduction for clustering
9.5. Clustering graph and network data
9.6. Semisupervised clustering
9.7. Summary
9.8. Exercises
9.9. Bibliographic notes
Bibliography
Chapter 10: Deep learning
10.1. Basic concepts
10.2. Improve training of deep learning models
10.3. Convolutional neural networks
10.4. Recurrent neural networks
10.5. Graph neural networks
10.6. Summary
10.7. Exercises
10.8. Bibliographic notes
Bibliography
Chapter 11: Outlier detection
11.1. Basic concepts
11.2. Statistical approaches
11.3. Proximity-based approaches
11.4. Reconstruction-based approaches
11.5. Clustering- vs. classification-based approaches
11.6. Mining contextual and collective outliers
11.7. Outlier detection in high-dimensional data
11.8. Summary
11.9. Exercises
11.10. Bibliographic notes
Bibliography
Chapter 12: Data mining trends and research frontiers
12.1. Mining rich data types
12.2. Data mining applications
12.3. Data mining methodologies and systems
12.4. Data mining, people, and society
Bibliography
Appendix A: Mathematical background
A.1. Probability and statistics
A.2. Numerical optimization
A.3. Matrix and linear algebra
A.4. Concepts and tools from signal processing
A.5. Bibliographic notes
Bibliography
Bibliography
Index

Jiawei Han

Jiawei Han is Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions in the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Affiliations and expertise

Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, USA

Jian Pei

Jian Pei is currently a Canada Research Chair (Tier 1) in Big Data Science and a Professor in the School of Computing Science at Simon Fraser University. He is also an associate member of the Department of Statistics and Actuarial Science. He is a well-known leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications. He is recognized as a Fellow of the Association for Computing Machinery (ACM) for his “contributions to the foundation, methodology and applications of data mining” and as a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for his “contributions to data mining and knowledge discovery”. He is the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), a director of the Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) of the Association for Computing Machinery (ACM), and a general co-chair or program committee co-chair of many premier conferences.

Affiliations and expertise

Simon Fraser University, Burnaby, Canada

Hanghang Tong

Hanghang Tong, Ph.D., is currently an associate professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Before that, he was an associate professor at the School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University. He received his M.Sc. and Ph.D. degrees from Carnegie Mellon University in 2008 and 2009, both in Machine Learning. His research interest is in large-scale data mining for graphs and multimedia. He has received several awards, including the SDM/IBM Early Career Data Mining Research award (2018), the NSF CAREER award (2017), the ICDM 10-Year Highest Impact Paper award (2015), four best paper awards (TUP'14, CIKM'12, SDM'08, ICDM'06), seven 'bests of conference', one best demo, honorable mention (SIGMOD'17), and one best demo candidate, second place (CIKM'17). He has published over 100 refereed articles. He is the Editor-in-Chief of SIGKDD Explorations (ACM), an action editor of Data Mining and Knowledge Discovery (Springer), and an associate editor of Knowledge and Information Systems (Springer) and Neurocomputing Journal (Elsevier). He has also served as a program committee member for multiple data mining, database, and artificial intelligence venues (e.g., SIGKDD, SIGMOD, AAAI, WWW, CIKM).

Affiliations and expertise

Associate Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA