Predictive Data Mining - 1st Edition - ISBN: 9781558604032, 9780080514659

Predictive Data Mining

1st Edition

A Practical Guide

Authors: Sholom Weiss, Nitin Indurkhya
Paperback ISBN: 9781558604032
eBook ISBN: 9780080514659
Imprint: Morgan Kaufmann
Published Date: 8th December 1997
Page Count: 228

Table of Contents

Preface

1 What is Data Mining?
  1.1 Big Data
    1.1.1 The Data Warehouse
    1.1.2 Timelines
  1.2 Types of Data-Mining Problems
  1.3 The Pedigree of Data Mining
    1.3.1 Databases
    1.3.2 Statistics
    1.3.3 Machine Learning
  1.4 Is Big Better?
    1.4.1 Strong Statistical Evaluation
    1.4.2 More Intensive Search
    1.4.3 More Controlled Experiments
    1.4.4 Is Big Necessary?
  1.5 The Tasks of Predictive Data Mining
    1.5.1 Data Preparation
    1.5.2 Data Reduction
    1.5.3 Data Modeling and Prediction
    1.5.4 Case and Solution Analyses
  1.6 Data Mining: Art or Science?
  1.7 An Overview of the Book
  1.8 Bibliographic and Historical Remarks

2 Statistical Evaluation for Big Data
  2.1 The Idealized Model
    2.1.1 Classical Statistical Comparison and Evaluation
  2.2 It's Big but Is It Biased?
    2.2.1 Objective Versus Survey Data
    2.2.2 Significance and Predictive Value
      Too Many Comparisons?
  2.3 Classical Types of Statistical Prediction
    2.3.1 Predicting True-or-False: Classification Error Rates
    2.3.2 Forecasting Numbers: Regression Distance Measures
  2.4 Measuring Predictive Performance
    2.4.1 Independent Testing
      Random Training and Testing
      How Accurate Is the Error Estimate?
      Comparing Results for Error Measures
      Ideal or Real-World Sampling?
      Training and Testing from Different Time Periods
  2.5 Too Much Searching and Testing?
  2.6 Why Are Errors Made?
  2.7 Bibliographic and Historical Remarks

3 Preparing the Data
  3.1 A Standard Form
    3.1.1 Standard Measurements
    3.1.2 Goals
  3.2 Data Transformations
    3.2.1 Normalizations
    3.2.2 Data Smoothing
    3.2.3 Differences and Ratios
  3.3 Missing Data
  3.4 Time-Dependent Data
    3.4.1 Time Series
    3.4.2 Composing Features from Time Series
      Current Values
      Moving Averages
      Trends
      Seasonal Adjustments
  3.5 Hybrid Time-Dependent Applications
    3.5.1 Multivariate Time Series
    3.5.2 Classification and Time Series
    3.5.3 Standard Cases and Time-Series Attributes
  3.6 Text Mining
  3.7 Bibliographic and Historical Remarks

4 Data Reduction
  4.1 Selecting the Best Features
  4.2 Feature Selection from Means and Variances
    4.2.1 Independent Features
    4.2.2 Distance-Based Optimal Feature Selection
    4.2.3 Heuristic Feature Selection
  4.3 Principal Components
  4.4 Feature Selection by Decision Trees
  4.5 How Many Measured Values?
    4.5.1 Reducing and Smoothing Values
      Rounding
      K-Means Clustering
      Class Entropy
  4.6 How Many Cases?
    4.6.1 A Single Sample
    4.6.2 Incremental Samples
    4.6.3 Average Samples
    4.6.4 Specialized Case-Reduction Techniques
      Sequential Sampling over Time
      Strategic Sampling of Key Events
      Adjusting Prevalence
  4.7 Bibliographic and Historical Remarks

5 Looking for Solutions
  5.1 Overview
  5.2 Math Solutions
    5.2.1 Linear Scoring
    5.2.2 Nonlinear Scoring: Neural Nets
    5.2.3 Advanced Statistical Methods
  5.3 Distance Solutions
  5.4 Logic Solutions
    5.4.1 Decision Trees
    5.4.2 Decision Rules
  5.5 What Do the Answers Mean?
    5.5.1 Is It Safe to Edit Solutions?
  5.6 Which Solution is Preferable?
  5.7 Combining Different Answers
    5.7.1 Multiple Prediction Methods
    5.7.2 Multiple Samples
  5.8 Bibliographic and Historical Remarks

6 What's Best for Data Reduction and Mining?
  6.1 Let's Analyze Some Real Data
  6.2 The Experimental Methods
  6.3 The Empirical Results
    6.3.1 Significance Testing
  6.4 So What Did We Learn?
    6.4.1 Feature Selection
    6.4.2 Value Reduction
    6.4.3 Subsampling or All Cases
  6.5 Graphical Trend Analysis
    6.5.1 Incremental Case Analysis
    6.5.2 Incremental Complexity Analysis
  6.6 Maximum Data Reduction
  6.7 Are There Winners and Losers in Performance?
  6.8 Getting the Best Results
  6.9 Bibliographic and Historical Remarks

7 Art or Science? Case Studies in Data Mining
  7.1 Why These Case Studies?
  7.2 A Summary of Tasks for Predictive Data Mining
    7.2.1 A Checklist for Data Preparation
    7.2.2 A Checklist for Data Reduction
    7.2.3 A Checklist for Data Modeling and Prediction
    7.2.4 A Checklist for Case and Solution Analyses
  7.3 The Case Studies
    7.3.1 Transaction Processing
    7.3.2 Text Mining
    7.3.3 Outcomes Analysis
    7.3.4 Process Control
    7.3.5 Marketing and User Profiling
    7.3.6 Exploratory Analysis
  7.4 Looking Ahead
  7.5 Bibliographic and Historical Remarks

Appendix: Data-Miner Software Kit


The potential business advantages of data mining are well documented in publications for executives and managers. However, developers implementing major data-mining systems need concrete information about the underlying technical principles—and their practical manifestations—in order to either integrate commercially available tools or write data-mining programs from scratch. This book is the first technical guide to provide a complete, generalized roadmap for developing data-mining applications, together with advice on performing these large-scale, open-ended analyses for real-world data warehouses.

Note: If you already own Predictive Data Mining: A Practical Guide, please see ISBN 1-55860-477-4 to order the accompanying software. To order the book/software package, please see ISBN 1-55860-478-2.

Key Features

  • Focuses on the preparation and organization of data and the development of an overall strategy for data mining.
  • Reviews sophisticated prediction methods that search for patterns in big data.
  • Describes how to accurately estimate future performance of proposed solutions.
  • Illustrates the data-mining process and its potential pitfalls through real-life case studies.


© Morgan Kaufmann 1997


"I enjoy reading PREDICTIVE DATA MINING. It presents an excellent perspective on the theory and practice of data mining. It can help educate statisticians to build alliances between statisticians and data miners." --Emanuel Parzen, Distinguished Professor of Statistics, Texas A&M University

About the Authors

Sholom Weiss

Sholom M. Weiss is a professor of computer science at Rutgers University and the author of dozens of research papers on data mining and knowledge-based systems. He is a fellow of the American Association for Artificial Intelligence, serves on numerous editorial boards of scientific journals, and has consulted widely on the commercial application of advanced data mining techniques. He is the author, with Casimir Kulikowski, of Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, which is also available from Morgan Kaufmann Publishers.

Nitin Indurkhya

Nitin Indurkhya is on the faculty at the Basser Department of Computer Science, University of Sydney, Australia. He has published extensively on data mining and machine learning and has considerable experience with industrial data-mining applications in Australia, Japan, and the USA.