Data Mining

Concepts and Techniques

4th Edition - July 2, 2022

  • Authors: Jiawei Han, Jian Pei, Hanghang Tong
  • eBook ISBN: 9780128117613
  • Paperback ISBN: 9780128117606

Data Mining: Concepts and Techniques, Fourth Edition introduces concepts, principles, and methods for mining patterns, knowledge, and models from various kinds of data for diverse applications. Specifically, it delves into the processes for uncovering patterns and knowledge from massive collections of data, known as knowledge discovery from data, or KDD. It focuses on the feasibility, usefulness, effectiveness, and scalability of data mining techniques for large data sets. After an introduction to the concept of data mining, the authors explain the methods for preprocessing, characterizing, and warehousing data. They then partition the data mining methods into several major tasks, introducing concepts and methods for mining frequent patterns, associations, and correlations for large data sets; data classification and model construction; cluster analysis; and outlier detection. Concepts and methods for deep learning are systematically introduced in a dedicated chapter. Finally, the book covers the trends, applications, and research frontiers in data mining.

Key Features

  • Presents a comprehensive new chapter on deep learning, including improving training of deep learning models, convolutional neural networks, recurrent neural networks, and graph neural networks
  • Addresses advanced topics in one dedicated chapter: data mining trends and research frontiers, including mining rich data types (text, spatiotemporal data, and graphs/networks), data mining applications (such as sentiment analysis, truth discovery, and information propagation), data mining methodologies and systems, and data mining and society
  • Provides a comprehensive, practical look at the concepts and techniques needed to get the most out of your data


Upper-level undergraduate and graduate students studying data mining in computer science programs. Data warehouse engineers, data mining professionals, database researchers, statisticians, data analysts, data modelers, and other data professionals working on data mining at the R&D and implementation levels.

Table of Contents

  • Cover image
  • Title page
  • Table of Contents
  • Copyright
  • Dedication
  • Foreword
  • Foreword to second edition
  • Preface
  • Organization of the book
  • To the instructor
  • To the student
  • To the professional
  • Book web site with resources
  • Acknowledgments
  • About the authors
  • Chapter 1: Introduction
  • 1.1. What is data mining?
  • 1.2. Data mining: an essential step in knowledge discovery
  • 1.3. Diversity of data types for data mining
  • 1.4. Mining various kinds of knowledge
  • 1.5. Data mining: confluence of multiple disciplines
  • 1.6. Data mining and applications
  • 1.7. Data mining and society
  • 1.8. Summary
  • 1.9. Exercises
  • 1.10. Bibliographic notes
  • Bibliography
  • Chapter 2: Data, measurements, and data preprocessing
  • 2.1. Data types
  • 2.2. Statistics of data
  • 2.3. Similarity and distance measures
  • 2.4. Data quality, data cleaning, and data integration
  • 2.5. Data transformation
  • 2.6. Dimensionality reduction
  • 2.7. Summary
  • 2.8. Exercises
  • 2.9. Bibliographic notes
  • Bibliography
  • Chapter 3: Data warehousing and online analytical processing
  • 3.1. Data warehouse
  • 3.2. Data warehouse modeling: schema and measures
  • 3.3. OLAP operations
  • 3.4. Data cube computation
  • 3.5. Data cube computation methods
  • 3.6. Summary
  • 3.7. Exercises
  • 3.8. Bibliographic notes
  • Bibliography
  • Chapter 4: Pattern mining: basic concepts and methods
  • 4.1. Basic concepts
  • 4.2. Frequent itemset mining methods
  • 4.3. Which patterns are interesting?—Pattern evaluation methods
  • 4.4. Summary
  • 4.5. Exercises
  • 4.6. Bibliographic notes
  • Bibliography
  • Chapter 5: Pattern mining: advanced methods
  • 5.1. Mining various kinds of patterns
  • 5.2. Mining compressed or approximate patterns
  • 5.3. Constraint-based pattern mining
  • 5.4. Mining sequential patterns
  • 5.5. Mining subgraph patterns
  • 5.6. Pattern mining: application examples
  • 5.7. Summary
  • 5.8. Exercises
  • 5.9. Bibliographic notes
  • Bibliography
  • Chapter 6: Classification: basic concepts and methods
  • 6.1. Basic concepts
  • 6.2. Decision tree induction
  • 6.3. Bayes classification methods
  • 6.4. Lazy learners (or learning from your neighbors)
  • 6.5. Linear classifiers
  • 6.6. Model evaluation and selection
  • 6.7. Techniques to improve classification accuracy
  • 6.8. Summary
  • 6.9. Exercises
  • 6.10. Bibliographic notes
  • Bibliography
  • Chapter 7: Classification: advanced methods
  • 7.1. Feature selection and engineering
  • 7.2. Bayesian belief networks
  • 7.3. Support vector machines
  • 7.4. Rule-based and pattern-based classification
  • 7.5. Classification with weak supervision
  • 7.6. Classification with rich data type
  • 7.7. Potpourri: other related techniques
  • 7.8. Summary
  • 7.9. Exercises
  • 7.10. Bibliographic notes
  • Bibliography
  • Chapter 8: Cluster analysis: basic concepts and methods
  • 8.1. Cluster analysis
  • 8.2. Partitioning methods
  • 8.3. Hierarchical methods
  • 8.4. Density-based and grid-based methods
  • 8.5. Evaluation of clustering
  • 8.6. Summary
  • 8.7. Exercises
  • 8.8. Bibliographic notes
  • Bibliography
  • Chapter 9: Cluster analysis: advanced methods
  • 9.1. Probabilistic model-based clustering
  • 9.2. Clustering high-dimensional data
  • 9.3. Biclustering
  • 9.4. Dimensionality reduction for clustering
  • 9.5. Clustering graph and network data
  • 9.6. Semisupervised clustering
  • 9.7. Summary
  • 9.8. Exercises
  • 9.9. Bibliographic notes
  • Bibliography
  • Chapter 10: Deep learning
  • 10.1. Basic concepts
  • 10.2. Improve training of deep learning models
  • 10.3. Convolutional neural networks
  • 10.4. Recurrent neural networks
  • 10.5. Graph neural networks
  • 10.6. Summary
  • 10.7. Exercises
  • 10.8. Bibliographic notes
  • Bibliography
  • Chapter 11: Outlier detection
  • 11.1. Basic concepts
  • 11.2. Statistical approaches
  • 11.3. Proximity-based approaches
  • 11.4. Reconstruction-based approaches
  • 11.5. Clustering- vs. classification-based approaches
  • 11.6. Mining contextual and collective outliers
  • 11.7. Outlier detection in high-dimensional data
  • 11.8. Summary
  • 11.9. Exercises
  • 11.10. Bibliographic notes
  • Bibliography
  • Chapter 12: Data mining trends and research frontiers
  • 12.1. Mining rich data types
  • 12.2. Data mining applications
  • 12.3. Data mining methodologies and systems
  • 12.4. Data mining, people, and society
  • Bibliography
  • Appendix A: Mathematical background
  • 1.1. Probability and statistics
  • 1.2. Numerical optimization
  • 1.3. Matrix and linear algebra
  • 1.4. Concepts and tools from signal processing
  • 1.5. Bibliographic notes
  • Bibliography
  • Bibliography
  • Index

Product details

  • No. of pages: 752
  • Language: English
  • Copyright: © Morgan Kaufmann 2022
  • Published: July 2, 2022
  • Imprint: Morgan Kaufmann
  • eBook ISBN: 9780128117613
  • Paperback ISBN: 9780128117606

About the Authors

Jiawei Han

Jiawei Han is Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions in the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Affiliations and Expertise

Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, USA

Jian Pei

Jian Pei is currently a Canada Research Chair (Tier 1) in Big Data Science and a Professor in the School of Computing Science at Simon Fraser University. He is also an associate member of the Department of Statistics and Actuarial Science. He is a leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications. He is recognized as a Fellow of the Association for Computing Machinery (ACM) for his “contributions to the foundation, methodology and applications of data mining” and as a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for his “contributions to data mining and knowledge discovery”. He is the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), a director of the ACM Special Interest Group on Knowledge Discovery in Data (SIGKDD), and a general co-chair or program committee co-chair of many premier conferences.

Affiliations and Expertise

Simon Fraser University, Burnaby, Canada

Hanghang Tong

Hanghang Tong, Ph.D., is currently an associate professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Before that, he was an associate professor at the School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University. He received his M.Sc. and Ph.D. degrees from Carnegie Mellon University in 2008 and 2009, both in Machine Learning. His research interest is in large-scale data mining for graphs and multimedia. He has received several awards, including the SDM/IBM Early Career Data Mining Research award (2018), the NSF CAREER award (2017), the ICDM 10-Year Highest Impact Paper award (2015), four best paper awards (TUP'14, CIKM'12, SDM'08, ICDM'06), seven 'bests of conference', one best demo, honorable mention (SIGMOD'17), and one best demo candidate, second place (CIKM'17). He has published over 100 refereed articles. He is the Editor-in-Chief of SIGKDD Explorations (ACM), an action editor of Data Mining and Knowledge Discovery (Springer), and an associate editor of Knowledge and Information Systems (Springer) and Neurocomputing Journal (Elsevier). He has also served as a program committee member for multiple data mining, database, and artificial intelligence venues (e.g., SIGKDD, SIGMOD, AAAI, WWW, and CIKM).

Affiliations and Expertise

Associate Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
