Predictive Data Mining: A Practical Guide
By Sholom Weiss and Nitin Indurkhya
Description
The potential business advantages of data mining are well documented in publications for executives and managers. However, developers
implementing major data-mining systems need concrete information about the underlying technical principles, and their practical manifestations, in
order to either integrate commercially available tools or write data-mining programs from scratch. This book is the first technical
guide to provide a complete, generalized roadmap for developing data-mining applications, together with advice on performing these large-scale,
open-ended analyses for real-world data warehouses.
Note: If you already own Predictive Data Mining: A Practical Guide, please see
ISBN 1-55860-477-4 to order the accompanying software. To order the book/software package, please see ISBN 1-55860-478-2.
Contents
Preface
1 What is Data Mining?
1.1 Big Data
1.1.1 The Data Warehouse
1.1.2 Timelines
1.2 Types of Data-Mining Problems
1.3 The Pedigree of Data Mining
1.3.1 Databases
1.3.2 Statistics
1.3.3 Machine Learning
1.4 Is Big Better?
1.4.1 Strong Statistical Evaluation
1.4.2 More Intensive Search
1.4.3 More Controlled Experiments
1.4.4 Is Big Necessary?
1.5 The Tasks of Predictive Data Mining
1.5.1 Data Preparation
1.5.2 Data Reduction
1.5.3 Data Modeling and Prediction
1.5.4 Case and Solution Analyses
1.6 Data Mining: Art or Science?
1.7 An Overview of the Book
1.8 Bibliographic and Historical Remarks
2 Statistical Evaluation for Big Data
2.1 The Idealized Model
2.1.1 Classical Statistical Comparison and Evaluation
2.2 It's Big but Is It Biased?
2.2.1 Objective Versus Survey Data
2.2.2 Significance and Predictive Value
2.2.2.1 Too Many Comparisons?
2.3 Classical Types of Statistical Prediction
2.3.1 Predicting True-or-False: Classification
2.3.1.1 Error Rates
2.3.2 Forecasting Numbers: Regression
2.3.2.1 Distance Measures
2.4 Measuring Predictive Performance
2.4.1 Independent Testing
2.4.1.1 Random Training and Testing
2.4.1.2 How Accurate Is the Error Estimate?
2.4.1.3 Comparing Results for Error Measures
2.4.1.4 Ideal or Real-World Sampling?
2.4.1.5 Training and Testing from Different Time Periods
2.5 Too Much Searching and Testing?
2.6 Why Are Errors Made?
2.7 Bibliographic and Historical Remarks
3 Preparing the Data
3.1 A Standard Form
3.1.1 Standard Measurements
3.1.2 Goals
3.2 Data Transformations
3.2.1 Normalizations
3.2.2 Data Smoothing
3.2.3 Differences and Ratios
3.3 Missing Data
3.4 Time-Dependent Data
3.4.1 Time Series
3.4.2 Composing Features from Time Series
3.4.2.1 Current Values
3.4.2.2 Moving Averages
3.4.2.3 Trends
3.4.2.4 Seasonal Adjustments
3.5 Hybrid Time-Dependent Applications
3.5.1 Multivariate Time Series
3.5.2 Classification and Time Series
3.5.3 Standard Cases and Time-Series Attributes
3.6 Text Mining
3.7 Bibliographic and Historical Remarks
4 Data Reduction
4.1 Selecting the Best Features
4.2 Feature Selection from Means and Variances
4.2.1 Independent Features
4.2.2 Distance-Based Optimal Feature Selection
4.2.3 Heuristic Feature Selection
4.3 Principal Components
4.4 Feature Selection by Decision Trees
4.5 How Many Measured Values?
4.5.1 Reducing and Smoothing Values
4.5.1.1 Rounding
4.5.1.2 K-Means Clustering
4.5.1.3 Class Entropy
4.6 How Many Cases?
4.6.1 A Single Sample
4.6.2 Incremental Samples
4.6.3 Average Samples
4.6.4 Specialized Case-Reduction Techniques
4.6.4.1 Sequential Sampling over Time
4.6.4.2 Strategic Sampling of Key Events
4.6.4.3 Adjusting Prevalence
4.7 Bibliographic and Historical Remarks
5 Looking for Solutions
5.1 Overview
5.2 Math Solutions
5.2.1 Linear Scoring
5.2.2 Nonlinear Scoring: Neural Nets
5.2.3 Advanced Statistical Methods
5.3 Distance Solutions
5.4 Logic Solutions
5.4.1 Decision Trees
5.4.2 Decision Rules
5.5 What Do the Answers Mean?
5.5.1 Is It Safe to Edit Solutions?
5.6 Which Solution Is Preferable?
5.7 Combining Different Answers
5.7.1 Multiple Prediction Methods
5.7.2 Multiple Samples
5.8 Bibliographic and Historical Remarks
6 What's Best for Data Reduction and Mining?
6.1 Let's Analyze Some Real Data
6.2 The Experimental Methods
6.3 The Empirical Results
6.3.1 Significance Testing
6.4 So What Did We Learn?
6.4.1 Feature Selection
6.4.2 Value Reduction
6.4.3 Subsampling or All Cases
6.5 Graphical Trend Analysis
6.5.1 Incremental Case Analysis
6.5.2 Incremental Complexity Analysis
6.6 Maximum Data Reduction
6.7 Are There Winners and Losers in Performance?
6.8 Getting the Best Results
6.9 Bibliographic and Historical Remarks
7 Art or Science? Case Studies in Data Mining
7.1 Why These Case Studies?
7.2 A Summary of Tasks for Predictive Data Mining
7.2.1 A Checklist for Data Preparation
7.2.2 A Checklist for Data Reduction
7.2.3 A Checklist for Data Modeling and Prediction
7.2.4 A Checklist for Case and Solution Analyses
7.3 The Case Studies
7.3.1 Transaction Processing
7.3.2 Text Mining
7.3.3 Outcomes Analysis
7.3.4 Process Control
7.3.5 Marketing and User Profiling
7.3.6 Exploratory Analysis
7.4 Looking Ahead
7.5 Bibliographic and Historical Remarks
Appendix: Data-Miner Software Kit
Bibliographic & Ordering Information
Paperback, 228 pages, publication date: August 1997
ISBN-13: 978-1-55860-403-2
ISBN-10: 1-55860-403-0
Imprint: Morgan Kaufmann
Price: EUR 50.95, USD 62.95, GBP 35.99
Books and book-related electronic products are priced in US dollars (USD), euros (EUR), and British pounds (GBP). USD prices apply to the Americas and Asia Pacific. EUR prices apply in Europe and the Middle East. GBP prices apply to the UK and all other countries.