Description

R and Data Mining introduces researchers, postgraduate students, and analysts to data mining using R, a free software environment for statistical computing and graphics. The book provides practical methods for using R in applications from academia to industry to extract knowledge from vast amounts of data. Readers will find this book a valuable guide to the use of R in tasks such as classification and prediction, clustering, outlier detection, association rules, sequence analysis, text mining, social network analysis, sentiment analysis, and more.

Data mining techniques are growing in popularity in a broad range of areas, from banking to insurance, retail, telecom, medicine, research, and government. This book focuses on the modeling phase of the data mining process, also addressing data exploration and model evaluation.

With three in-depth case studies, a quick reference guide, bibliography, and links to a wealth of online resources, R and Data Mining is a valuable, practical guide to a powerful method of analysis.

Key Features

  • Presents an introduction to using R for data mining applications, covering the most popular data mining techniques
  • Provides code examples and data so that readers can easily learn the techniques
  • Features case studies in real-world applications to help readers apply the techniques in their work

Readership

Researchers in academia and industry working in the field of data mining, postgraduate students interested in data mining, and data miners and analysts from industry. Because data mining techniques are widely used in government agencies, banking, insurance, retail, telecom, medicine, and research, the book will be of interest to readers in many fields.

Table of Contents

Dedication

List of Figures

List of Abbreviations

Chapter 1. Introduction

1.1 Data Mining

1.2 R

1.3 Datasets

References

Chapter 2. Data Import and Export

2.1 Save and Load R Data

2.2 Import from and Export to .CSV Files

2.3 Import Data from SAS

2.4 Import/Export via ODBC

References

Chapter 3. Data Exploration

3.1 Have a Look at Data

3.2 Explore Individual Variables

3.3 Explore Multiple Variables

3.4 More Explorations

3.5 Save Charts into Files

References

Chapter 4. Decision Trees and Random Forest

4.1 Decision Trees with Package party

4.2 Decision Trees with Package rpart

4.3 Random Forest

References

Chapter 5. Regression

5.1 Linear Regression

5.2 Logistic Regression

5.3 Generalized Linear Regression

5.4 Non-Linear Regression

Chapter 6. Clustering

6.1 The k-Means Clustering

6.2 The k-Medoids Clustering

6.3 Hierarchical Clustering

6.4 Density-Based Clustering

References

Chapter 7. Outlier Detection

7.1 Univariate Outlier Detection

7.2 Outlier Detection with LOF

7.3 Outlier Detection by Clustering

7.4 Outlier Detection from Time Series

7.5 Discussions

References

Chapter 8. Time Series Analysis and Mining

8.1 Time Series Data in R

8.2 Time Series Decomposition

8.3 Time Series Forecasting

8.4 Time Series Clustering

8.5 Time Series Classification

8.6 Discussions

8.7 Further Readings

References

Chapter 9. Association Rules

9.1 Basics of Association Rules

9.2 The Titanic Dataset

9.3 Association Rule Mining

9.4 Removing Redundancy

9.5 Interpreting Rules

9.6 Visualizing Association Rules

9.7 Discussion

Details

No. of pages: 256
Language: English
Copyright: © 2012
Published:
Imprint: Academic Press
eBook ISBN: 9780123972712
Print ISBN: 9780123969637

About the author

Yanchang Zhao

Yanchang Zhao has been a Senior Data Mining Analyst in the Australian Government since 2009. Before joining the public sector, he was an Australian Postdoctoral Fellow (Industry) in the Faculty of Engineering & Information Technology at the University of Technology, Sydney, Australia. His research interests include clustering, association rules, time series, outlier detection, and data mining applications, and he has published over forty papers in journals and conference proceedings. He is a member of the IEEE and of the Institute of Analytics Professionals of Australia, and has served as a program committee member for more than thirty international conferences.

Affiliations and Expertise

Senior Data Mining Specialist, Australia
