Data Mining Applications with R

1st Edition - November 26, 2013
  • Authors: Yanchang Zhao, Yonghua Cen
  • eBook ISBN: 9780124115200
  • Hardcover ISBN: 9780124115118

Description

Data Mining Applications with R shows researchers and professionals how R, a free software environment for statistical computing and graphics, is used to solve practical problems in industry. R is widely used to apply data mining techniques across many industries, including government, finance, insurance, medicine, and scientific research. The book presents 15 real-world case studies illustrating various techniques in rapidly growing areas. It is an ideal companion for data mining researchers in academia and industry looking for ways to turn this versatile software into a powerful analytic tool. R code, data, and color figures for the book are provided at the RDataMining.com website.

Key Features

  • Helps data miners learn to use R in their specific area of work and see how R can be applied in different industries
  • Presents case studies of real-world applications, which help readers apply the techniques in their own work
  • Provides code examples and sample data so that readers can learn the techniques by running the code themselves

Readership

Researchers in academia and industry working in the field of data mining, postgraduate students interested in data mining, and data miners and analysts from industry. Relevant sectors include government agencies, banking, insurance, retail, telecom, medicine, and scientific research.

Table of Contents

  • Preface

    Background

    Objectives and Significance

    Target Audience

    Acknowledgments

    Review Committee

    Additional Reviewers

    Foreword

    References

    Chapter 1. Power Grid Data Analysis with R and Hadoop

    Abstract

    1.1 Introduction

    1.2 A Brief Overview of the Power Grid

    1.3 Introduction to MapReduce, Hadoop, and RHIPE

    1.4 Power Grid Analytical Approach

    1.5 Discussion and Conclusions

    Appendix

    References

    Chapter 2. Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization

    Abstract

    Acknowledgments

    2.1 Introduction

    2.2 Related Works

    2.3 Motivations and Requirements

    2.4 Probabilistic Framework of NB Classifiers

    2.5 Two-Dimensional Visualization System

    2.6 A Case Study: Text Classification

    2.7 Conclusions

    References

    Chapter 3. Discovery of Emergent Issues and Controversies in Anthropology Using Text Mining, Topic Modeling, and Social Network Analysis of Microblog Content

    Abstract

    3.1 Introduction

    3.2 How Many Messages and How Many Twitter-Users in the Sample?

    3.3 Who Is Writing All These Twitter Messages?

    3.4 Who Are the Influential Twitter-Users in This Sample?

    3.5 What Is the Community Structure of These Twitter-Users?

    3.6 What Were Twitter-Users Writing About During the Meeting?

    3.7 What Do the Twitter Messages Reveal About the Opinions of Their Authors?

    3.8 What Can Be Discovered in the Less Frequently Used Words in the Sample?

    3.9 What Are the Topics That Can Be Algorithmically Discovered in This Sample?

    3.10 Conclusion

    References

    Chapter 4. Text Mining and Network Analysis of Digital Libraries in R

    Abstract

    4.1 Introduction

    4.2 Dataset Preparation

    4.3 Manipulating the Document-Term Matrix

    4.4 Clustering Content by Topics Using the LDA

    4.5 Using Similarity Between Documents to Explore Document Cohesion

    4.6 Social Network Analysis of Authors

    4.7 Conclusion

    References

    Chapter 5. Recommender Systems in R

    Abstract

    5.1 Introduction

    5.2 Business Case

    5.3 Evaluation

    5.4 Collaborative Filtering Methods

    5.5 Latent Factor Collaborative Filtering

    5.6 Simplified Approach

    5.7 Roll Your Own

    5.8 Final Thoughts

    References

    Chapter 6. Response Modeling in Direct Marketing: A Data Mining-Based Approach for Target Selection

    Abstract

    6.1 Introduction/Background

    6.2 Business Problem

    6.3 Proposed Response Model

    6.4 Modeling Detail

    6.5 Prediction Result

    6.6 Model Evaluation

    6.7 Conclusion

    References

    Chapter 7. Caravan Insurance Customer Profile Modeling with R

    Abstract

    7.1 Introduction

    7.2 Data Description and Initial Exploratory Data Analysis

    7.3 Classifier Models of Caravan Insurance Holders

    7.4 Discussion of Results and Conclusion

    Appendix A Details of the Full Data Set Variables

    Appendix B Customer Profile Data-Frequency of Binary Values

    Appendix C Proportion of Caravan Insurance Holders vis-à-vis other Customer Profile Variables

    Appendix D LR Model Details

    Appendix E R Commands for Computation of ROC Curves for Each Model Using Validation Dataset

    Appendix F Commands for Cross-Validation Analysis of Classifier Models

    References

    Chapter 8. Selecting Best Features for Predicting Bank Loan Default

    Abstract

    8.1 Introduction

    8.2 Business Problem

    8.3 Data Extraction

    8.4 Data Exploration and Preparation

    8.5 Missing Imputation

    8.6 Modeling

    8.7 Model Evaluation

    8.8 Finding and Model Deployment

    8.9 Lessons and Discussions

    Appendix Selecting Best Features for Predicting Bank Loan Default

    References

    Chapter 9. A Choquet Integral Toolbox and Its Application in Customer Preference Analysis

    Abstract

    9.1 Introduction

    9.2 Background

    9.3 Rfmtool Package

    9.4 Case Study

    9.5 Conclusions

    References

    Chapter 10. A Real-Time Property Value Index Based on Web Data

    Abstract

    Acknowledgments

    10.1 Introduction

    10.2 Housing Prices and Indices

    10.3 A Data Mining Approach

    10.4 Real Estate Pricing Models

    10.5 Conclusion

    References

    Chapter 11. Predicting Seabed Hardness Using Random Forest in R

    Abstract

    Acknowledgments

    11.1 Introduction

    11.2 Study Region and Data Processing

    11.3 Dataset Manipulation and Exploratory Analyses

    11.4 Application of RF for Predicting Seabed Hardness

    11.5 Model Validation Using rfcv

    11.6 Optimal Predictive Model

    11.7 Application of the Optimal Predictive Model

    11.8 Discussion and Conclusions

    Appendix A A Dataset of Seabed Hardness and 15 Predictors

    Appendix B A R Function, rf.cv, Shows the Cross-Validated Prediction Performance of a Predictive Model

    References

    Chapter 12. Supervised Classification of Images, Applied to Plankton Samples Using R and Zooimage

    Abstract

    Acknowledgments

    12.1 Background

    12.2 Challenges

    12.3 Data Extraction and Exploration

    12.4 Data Preprocessing

    12.5 Modeling

    12.6 Model Evaluation

    12.7 Model Deployment

    12.8 Lessons, Discussion, and Conclusions

    References

    Chapter 13. Crime Analyses Using R

    Abstract

    13.1 Introduction

    13.2 Problem Definition

    13.3 Data Extraction

    13.4 Data Exploration and Preprocessing

    13.5 Visualizations

    13.6 Modeling

    13.7 Model Evaluation

    13.8 Discussions and Improvements

    References

    Chapter 14. Football Mining with R

    Abstract

    Acknowledgments

    14.1 Introduction to the Case Study and Organization of the Analysis

    14.2 Background of the Analysis: The Italian Football Championship

    14.3 Data Extraction and Exploration

    14.4 Data Preprocessing

    14.5 Model Development: Building Classifiers

    14.6 Model Deployment

    14.7 Concluding Remarks

    References

    Chapter 15. Analyzing Internet DNS(SEC) Traffic with R for Resolving Platform Optimization

    Abstract

    15.1 Introduction

    15.2 Data Extraction from PCAP to CSV File

    15.3 Data Importation from CSV File to R

    15.4 Dimension Reduction Via PCA

    15.5 Initial Data Exploration Via Graphs

    15.6 Variables Scaling and Samples Selection

    15.7 Clustering for Segmenting the FQDN

    15.8 Building Routing Table Thanks to Clustering

    15.9 Building Routing Table Thanks to Mixed Integer Linear Programming

    15.10 Building Routing Table Via a Heuristic

    15.11 Final Evaluation

    15.12 Conclusion

    References

    Index

Product details

  • No. of pages: 514
  • Language: English
  • Copyright: © Academic Press 2013
  • Published: November 26, 2013
  • Imprint: Academic Press
  • eBook ISBN: 9780124115200
  • Hardcover ISBN: 9780124115118

About the Authors

    Yanchang Zhao

    A Senior Data Mining Analyst in the Australian Government since 2009.

    Before joining the public sector, he was an Australian Postdoctoral Fellow (Industry) in the Faculty of Engineering & Information Technology at the University of Technology, Sydney, Australia. His research interests include clustering, association rules, time series, outlier detection, and data mining applications, and he has published over forty papers in journals and conference proceedings. He is a member of the IEEE and of the Institute of Analytics Professionals of Australia, and has served as a program committee member for more than thirty international conferences.

    Affiliations and Expertise

    Senior Data Mining Specialist, Australia

    Yonghua Cen

    Latest reviews

    • CarlosRios M. Fri Apr 05 2019

      R is ok for data miners

      As a data mining beginner, I was looking for the right tool offering both variety and specificity. Data Mining Applications with R showed me how the various R techniques and methodologies can be used in a broad range of cases without losing efficiency and performance.