Data Mining

Concepts and Techniques

4th Edition - July 2, 2022

  • Authors: Jiawei Han, Jian Pei, Hanghang Tong
  • eBook ISBN: 9780128117613
  • Paperback ISBN: 9780128117606

Description

Data Mining: Concepts and Techniques, Fourth Edition presents the theories and methods for processing data and information used in a range of applications. Specifically, it explains data mining and the tools used to discover knowledge from collected data, a process known as knowledge discovery from data (KDD). The book focuses on the feasibility, usefulness, effectiveness, and scalability of techniques for large data sets. After introducing data mining, the authors explain methods for understanding, preprocessing, processing, and warehousing data. They then cover data warehouses, online analytical processing (OLAP), and data cube technology, followed by methods for mining frequent patterns, associations, and correlations in large data sets. The book details methods for data classification and introduces the concepts and methods of data clustering. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. Computer science students, application developers, business professionals, and researchers seeking information on data mining will find this resource helpful.

Key Features

  • Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects
  • Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields
  • Provides a comprehensive, practical look at the concepts and techniques needed to get the most out of your data

Readership

Upper-level undergraduate and graduate students studying data mining in computer science programs; data warehouse engineers, data mining professionals, database researchers, statisticians, data analysts, data modelers, and other data professionals working on data mining at the R&D and implementation levels.

Table of Contents

  • Cover image
  • Title page
  • Table of Contents
  • Copyright
  • Dedication
  • Foreword
  • Foreword to second edition
  • Preface
  • Organization of the book
  • To the instructor
  • To the student
  • To the professional
  • Book web site with resources
  • Acknowledgments
  • About the authors
  • Chapter 1: Introduction
  • 1.1. What is data mining?
  • 1.2. Data mining: an essential step in knowledge discovery
  • 1.3. Diversity of data types for data mining
  • 1.4. Mining various kinds of knowledge
  • 1.5. Data mining: confluence of multiple disciplines
  • 1.6. Data mining and applications
  • 1.7. Data mining and society
  • 1.8. Summary
  • 1.9. Exercises
  • 1.10. Bibliographic notes
  • Bibliography
  • Chapter 2: Data, measurements, and data preprocessing
  • 2.1. Data types
  • 2.2. Statistics of data
  • 2.3. Similarity and distance measures
  • 2.4. Data quality, data cleaning, and data integration
  • 2.5. Data transformation
  • 2.6. Dimensionality reduction
  • 2.7. Summary
  • 2.8. Exercises
  • 2.9. Bibliographic notes
  • Bibliography
  • Chapter 3: Data warehousing and online analytical processing
  • 3.1. Data warehouse
  • 3.2. Data warehouse modeling: schema and measures
  • 3.3. OLAP operations
  • 3.4. Data cube computation
  • 3.5. Data cube computation methods
  • 3.6. Summary
  • 3.7. Exercises
  • 3.8. Bibliographic notes
  • Bibliography
  • Chapter 4: Pattern mining: basic concepts and methods
  • 4.1. Basic concepts
  • 4.2. Frequent itemset mining methods
  • 4.3. Which patterns are interesting?—Pattern evaluation methods
  • 4.4. Summary
  • 4.5. Exercises
  • 4.6. Bibliographic notes
  • Bibliography
  • Chapter 5: Pattern mining: advanced methods
  • 5.1. Mining various kinds of patterns
  • 5.2. Mining compressed or approximate patterns
  • 5.3. Constraint-based pattern mining
  • 5.4. Mining sequential patterns
  • 5.5. Mining subgraph patterns
  • 5.6. Pattern mining: application examples
  • 5.7. Summary
  • 5.8. Exercises
  • 5.9. Bibliographic notes
  • Bibliography
  • Chapter 6: Classification: basic concepts and methods
  • 6.1. Basic concepts
  • 6.2. Decision tree induction
  • 6.3. Bayes classification methods
  • 6.4. Lazy learners (or learning from your neighbors)
  • 6.5. Linear classifiers
  • 6.6. Model evaluation and selection
  • 6.7. Techniques to improve classification accuracy
  • 6.8. Summary
  • 6.9. Exercises
  • 6.10. Bibliographic notes
  • Bibliography
  • Chapter 7: Classification: advanced methods
  • 7.1. Feature selection and engineering
  • 7.2. Bayesian belief networks
  • 7.3. Support vector machines
  • 7.4. Rule-based and pattern-based classification
  • 7.5. Classification with weak supervision
  • 7.6. Classification with rich data type
  • 7.7. Potpourri: other related techniques
  • 7.8. Summary
  • 7.9. Exercises
  • 7.10. Bibliographic notes
  • Bibliography
  • Chapter 8: Cluster analysis: basic concepts and methods
  • 8.1. Cluster analysis
  • 8.2. Partitioning methods
  • 8.3. Hierarchical methods
  • 8.4. Density-based and grid-based methods
  • 8.5. Evaluation of clustering
  • 8.6. Summary
  • 8.7. Exercises
  • 8.8. Bibliographic notes
  • Bibliography
  • Chapter 9: Cluster analysis: advanced methods
  • 9.1. Probabilistic model-based clustering
  • 9.2. Clustering high-dimensional data
  • 9.3. Biclustering
  • 9.4. Dimensionality reduction for clustering
  • 9.5. Clustering graph and network data
  • 9.6. Semisupervised clustering
  • 9.7. Summary
  • 9.8. Exercises
  • 9.9. Bibliographic notes
  • Bibliography
  • Chapter 10: Deep learning
  • 10.1. Basic concepts
  • 10.2. Improve training of deep learning models
  • 10.3. Convolutional neural networks
  • 10.4. Recurrent neural networks
  • 10.5. Graph neural networks
  • 10.6. Summary
  • 10.7. Exercises
  • 10.8. Bibliographic notes
  • Bibliography
  • Chapter 11: Outlier detection
  • 11.1. Basic concepts
  • 11.2. Statistical approaches
  • 11.3. Proximity-based approaches
  • 11.4. Reconstruction-based approaches
  • 11.5. Clustering- vs. classification-based approaches
  • 11.6. Mining contextual and collective outliers
  • 11.7. Outlier detection in high-dimensional data
  • 11.8. Summary
  • 11.9. Exercises
  • 11.10. Bibliographic notes
  • Bibliography
  • Chapter 12: Data mining trends and research frontiers
  • 12.1. Mining rich data types
  • 12.2. Data mining applications
  • 12.3. Data mining methodologies and systems
  • 12.4. Data mining, people, and society
  • Bibliography
  • Appendix A: Mathematical background
  • A.1. Probability and statistics
  • A.2. Numerical optimization
  • A.3. Matrix and linear algebra
  • A.4. Concepts and tools from signal processing
  • A.5. Bibliographic notes
  • Bibliography
  • Bibliography
  • Index

Product details

  • No. of pages: 752
  • Language: English
  • Copyright: © Morgan Kaufmann 2022
  • Published: July 2, 2022
  • Imprint: Morgan Kaufmann
  • eBook ISBN: 9780128117613
  • Paperback ISBN: 9780128117606

About the Authors

Jiawei Han

Jiawei Han is a Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in data mining and database systems, he has received many awards for his contributions to the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data and on the editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Affiliations and Expertise

Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, USA

Jian Pei

Jian Pei is currently a Canada Research Chair (Tier 1) in Big Data Science and a Professor in the School of Computing Science at Simon Fraser University. He is also an associate member of the Department of Statistics and Actuarial Science. He is a leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications. He is recognized as a Fellow of the Association for Computing Machinery (ACM) for his “contributions to the foundation, methodology and applications of data mining” and as a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for his “contributions to data mining and knowledge discovery”. He is the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), a director of the Special Interest Group on Knowledge Discovery in Data (SIGKDD) of the ACM, and a general co-chair or program committee co-chair of many premier conferences.

Affiliations and Expertise

Simon Fraser University, Burnaby, Canada

Hanghang Tong

Hanghang Tong, Ph.D., is currently an associate professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Before that, he was an associate professor at the School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University. He received his M.Sc. and Ph.D. degrees from Carnegie Mellon University in 2008 and 2009, both in Machine Learning. His research interest is in large-scale data mining for graphs and multimedia. He has received several awards, including the SDM/IBM Early Career Data Mining Research award (2018), the NSF CAREER award (2017), the ICDM 10-Year Highest Impact Paper award (2015), four best paper awards (TUP'14, CIKM'12, SDM'08, ICDM'06), seven 'bests of conference', one best demo, honorable mention (SIGMOD'17), and one best demo candidate, second place (CIKM'17). He has published over 100 refereed articles. He is the Editor-in-Chief of SIGKDD Explorations (ACM), an action editor of Data Mining and Knowledge Discovery (Springer), and an associate editor of Knowledge and Information Systems (Springer) and Neurocomputing Journal (Elsevier). He has also served as a program committee member for multiple data mining, database, and artificial intelligence venues (e.g., SIGKDD, SIGMOD, AAAI, WWW, CIKM).

Affiliations and Expertise

Associate Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
