Are you a data mining analyst who spends up to 80% of your time assuring data quality and then preparing that data for developing and deploying predictive models? Do you find plenty of literature on data mining theory and concepts, but little “how to” information when you need practical advice on developing good mining views? And are you, like most analysts, preparing the data in SAS?
This book is intended to fill this gap as your source of practical recipes. It introduces a framework for the process of data preparation for data mining, and presents the detailed implementation of each step in SAS. In addition, business applications of data mining modeling require you to deal with a large number of variables, typically hundreds if not thousands. Therefore, the book devotes several chapters to the methods of data transformation and variable selection.
- A complete framework for the data preparation process, including implementation details for each step.
- The complete SAS implementation code, which is readily usable by professional analysts and data miners.
- A unique and comprehensive approach for the treatment of missing values, optimal binning, and cardinality reduction.
- Assumes minimal proficiency in SAS and includes a quick-start chapter on writing SAS macros.
- CD includes dozens of SAS macros plus the sample data and the program for the book's case study.
This book is for data mining professionals, business analysts, SAS programmers, and data management and statistics students who plan to work in data mining. This is essentially the same audience as for all of our data mining books.
Contents
- 1 Introduction
  - 1.1 The Data Mining Process
  - 1.2 Methodologies of Data Mining
  - 1.3 The Mining View
  - 1.4 Scoring View
  - 1.5 Notes on Data Mining Software
- 2 Tasks and Data Flow
  - 2.1 Data Mining Tasks
  - 2.2 Data Mining Competencies
  - 2.3 The Data Flow
  - 2.4 Types of Variables
  - 2.5 The Mining View and the Scoring View
  - 2.6 Steps of Data Preparation
- 3 Review of Data Mining Modeling Techniques
  - 3.1 Introduction
  - 3.2 Regression Models
  - 3.3 Decision Trees
  - 3.4 Neural Networks
  - 3.5 Cluster Analysis
  - 3.6 Association Rules
  - 3.7 Time Series Analysis
  - 3.8 Support Vector Machines
- 4 SAS Macros: A Quick Start
  - 4.1 Introduction: Why Macros
  - 4.2 The Basics - The Macro and Its Variables
  - 4.3 Doing Calculations
  - 4.4 Programming Logic
  - 4.5 Working with Strings
  - 4.6 Macros that Call Other Macros
  - 4.7 Common Macro Patterns and Caveats
  - 4.8 Where to Go From Here
- 5 Data Acquisition and Integration
  - 5.1 Introduction
  - 5.2 Sources of Data
  - 5.3 Variable Types
  - 5.4 Data Roll Up
  - 5.5 Roll Up With Sums, Averages and Counts
  - 5.6 Calculation of the Mode
  - 5.7 Data Integration
- 6 Integrity Checks
  - 6.1 Introduction
  - 6.2 Comparing Datasets
  - 6.3 Dataset Schema Checks
    - 6.3.2 Variable Types
  - 6.4 Nominal Variables
  - 6.5 Continuous Variables
- 7 Exploratory Data Analysis
  - 7.1 Introduction
  - 7.2 Common EDA Procedures
  - 7.3 Univariate Statistics
  - 7.4 Varia
- © Morgan Kaufmann 2007
- 29th September 2006
- Morgan Kaufmann
It is easy to write books that address broad topics and ideas leaving the reader with the question “Yes, but how?” By combining a comprehensive guide to data preparation for data mining along with specific examples in SAS, Mamdouh's book is a rare find—a blend of theory and the practical at the same time. As anyone who has mined data will confess, 80% of the problem is in data preparation; Mamdouh addresses this difficult subject with strong practical techniques and methods. If you are working on an SAS data mining project, this book is a must! If you are working on any data mining project, the techniques and methods will be a guiding light! --Frank Byrum, Cormine Intelligent Data, LLC