By Mamdouh Refaat, Consultant
Description
Are you a data mining analyst who spends up to 80% of your time assuring data quality and then preparing that data for developing and deploying predictive models? Do you find plenty of literature on data mining theory and concepts, but little "how-to" information when it comes to practical advice on developing good mining views? And are you, like most analysts, preparing the data in SAS?
This book is intended to fill this gap as your source of practical recipes. It introduces a framework for the process of data preparation for data mining and presents the detailed implementation of each step in SAS. In addition, business applications of data mining modeling require you to deal with a large number of variables, typically hundreds if not thousands. Therefore, the book devotes several chapters to the methods of data transformation and variable selection.
Audience
Data mining professionals, business analysts, SAS programmers, and data management and statistics students who plan to work in data mining.
Essentially the same audience as all of our data mining books.
Contents
1 Introduction
1.1 The Data Mining Process
1.2 Methodologies of Data Mining
1.3 The Mining View
1.4 Scoring View
1.5 Notes on Data Mining Software
2 Tasks and Data Flow
2.1 Data Mining Tasks
2.2 Data Mining Competencies
2.3 The Data Flow
2.4 Types of Variables
2.5 The Mining View and the Scoring View
2.6 Steps of Data Preparation
3 Review of Data Mining Modeling Techniques
3.1 Introduction
3.2 Regression Models
3.3 Decision Trees
3.4 Neural Networks
3.5 Cluster Analysis
3.6 Association Rules
3.7 Time Series Analysis
3.8 Support Vector Machines
4 SAS Macros: A Quick Start
4.1 Introduction: Why Macros
4.2 The Basics - The Macro and Its Variables
4.3 Doing Calculations
4.4 Programming Logic
4.5 Working with Strings
4.6 Macros that Call Other Macros
4.7 Common Macro Patterns and Caveats
4.8 Where to Go From Here
5 Data Acquisition and Integration
5.1 Introduction
5.2 Sources of Data
5.3 Variable Types
5.4 Data Roll Up
5.5 Roll Up With Sums, Averages and Counts
5.6 Calculation of the Mode
5.7 Data Integration
6 Integrity Checks
6.1 Introduction
6.2 Comparing Datasets
6.3 Dataset Schema Checks
6.3.2 Variable Types
6.4 Nominal Variables
6.5 Continuous Variables
7 Exploratory Data Analysis
7.1 Introduction
7.2 Common EDA Procedures
7.3 Univariate Statistics
7.4 Variable Distribution
7.5 Detection of Outliers
7.5.4 Notes on Outliers
7.6 Testing Normality
7.7 Cross-tabulation
7.8 Investigating Data Structures
8 Sampling and Partitioning
8.1 Introduction
8.2 Contents of Samples
8.3 Random Sampling
8.4 Balanced Sampling
8.5 Minimum Sample Size
9 Data Transformations
9.1 Raw and Analytical Variables
9.2 Scope of Data Transformations
9.3 Creation of New Variables
9.4 Mapping of Nominal Variables
9.5 Normalization of Continuous Variables
9.6 Changing the Variable Distribution
10 Binning and Reduction of Cardinality
10.1 Introduction
10.2 Cardinality Reduction
10.2.1 The Main Questions
10.2.2 Structured Grouping Methods
10.2.3 Splitting a Dataset
10.2.4 The Main Algorithm
10.2.5 Reduction of Cardinality Using Gini Measure
10.2.6 Limitations and Modifications
10.3 Binning of Continuous Variables
11 Treatment of Missing Values
11.1 Introduction
11.2 Simple Replacement
11.3 Imputing Missing Values
11.3.1 Basic Issues in Multiple Imputation
11.3.2 Patterns of Missingness
11.4 Imputation Methods and Strategy
11.5 SAS Macros for Multiple Imputation
Nominal Variables
11.6 Predicting Missing Values
12 Predictive Power and Variable Reduction I
12.1 Introduction
12.2 Metrics of Predictive Power
12.3 Methods of Variable Reduction
12.4 Variable Reduction: Before or During Modeling
13 Analysis of Nominal and Ordinal Variables
13.1 Introduction
13.2 Contingency Tables
13.3 Notation and Definitions
13.4 Contingency Tables for Binary Variables
13.5 Contingency Tables for Multi-Category Variables
13.6 Analysis of Ordinal Variables
13.7 Implementation Scenarios
14 Analysis of Continuous Variables
14.1 Introduction
14.2 When is Binning Necessary?
14.3 Measures of Association
14.4 Correlation Coefficients
15 Principal Component Analysis (PCA)
15.1 Introduction
15.2 Mathematical Formulations
15.3 Implementing and Using PCA
15.4 Comments on Using PCA
15.4.1 Number of Principal Components
15.4.2 Success of PCA
15.4.3 Nominal Variables
15.4.4 Dataset Size and Performance
16 Factor Analysis
16.1 Introduction to Factor Analysis
16.2 Relationship between PCA and FA
16.3 Implementation of Factor Analysis
17 Predictive Power and Variable Reduction II
17.1 Introduction
17.2 Data with Binary Dependent Variables
17.3
Nominal IV's
17.3.2 Ordinal IV's
17.4 Variable Reduction Strategies
18 Putting it All Together
18.1 Introduction
18.2 The Process of Data Preparation
18.3 Case Study: The Bookstore
A Listing of SAS Macros
A.1 Copyright and Software License
A.2 Dependencies between Macros
A.3 Data Acquisition and Integration
A.4 Integrity Checks
A.5 Exploratory Data Analysis
A.6 Sampling and Partitioning
A.7 Data Transformations
A.8 Binning and Reduction of Cardinality
A.9 Treatment of Missing Values
A.10 Analysis of Nominal and Ordinal Variables
A.11 Analysis of Continuous Variables
A.12 Principal Component Analysis