Java Data Mining: Strategy, Standard, and Practice

A Practical Guide for Architecture, Design, and Implementation

1st Edition - November 7, 2006
Authors: Mark F. Hornick, Erik Marcadé, Sunil Venkayala
Language: English
eBook ISBN: 978-0-08-049591-0

Whether you are a software developer, systems architect, data analyst, or business analyst, if you want to take advantage of data mining in the development of advanced analytic applications, Java Data Mining (JDM), the new standard now implemented in core DBMS and data mining/analysis software, is a key solution component. This book is the essential guide to using the JDM standard interface, written by contributors to the JDM standard.

Preface
Guide to Readers

Part I - Strategy

1. Overview of Data Mining

1.1. Why is data mining relevant today?

1.2. Introducing Data Mining

1.3. The Value of Data Mining

1.4. Summary

1.5. References

2. Solving Problems in Industry

2.1. Cross-industry data mining solutions

2.2. Data Mining in Industries

2.3. Summary

2.4. References

3. Data Mining Process

3.1. A standardized data mining process

3.2. Data Analysis and Preparation…a more detailed view

3.3. Data mining modeling, analysis, and scoring processes

3.4. The Role of databases and data warehouses in Data Mining

3.5. Data mining in enterprise software architectures

3.6. Advances in automated data mining

3.7. Summary

3.8. References

4. Mining Functions and Algorithms

4.1. Data mining functions

4.2. Classification

4.3. Regression

4.4. Attribute Importance

4.5. Association

4.6. Clustering

4.7. Summary

4.8. References

5. JDM Strategy

5.1. What is the JDM strategy?

5.2. Role of Standards

5.3. Summary

5.4. References

6. Getting Started

6.1. Business Understanding

6.2. Data Understanding

6.3. Data Preparation

6.4. Modeling

6.5. Evaluation

6.6. Deployment

6.7. Summary

6.8. References

Part II - Standard

7. Java Data Mining Concepts

7.1. Classification problem

7.2. Regression problem

7.3. Attribute importance

7.4. Association rules problem

7.5. Clustering problem

7.6. Summary

7.7. References

8. Design of the JDM API

8.1. Object Modeling of Data Mining Concepts

8.2. Modular Packages

8.3. Connection Architecture

8.4. Object Factories

8.5. URI for Datasets

8.6. Enumerated Types

8.7. Exceptions

8.8. Discovering DME Capabilities

8.9. Summary

8.10. References

9. Using the JDM API

9.1. Connection Interfaces

9.2. Using JDM Enumerations

9.3. Using data specification interfaces

9.4. Using classification interfaces

9.5. Using Regression interfaces

9.6. Using Attribute Importance interfaces

9.7. Using Association interfaces

9.8. Using Clustering interfaces

9.9. Summary

9.10. References

10. XML Schema

10.1. Overview

10.2. Schema Elements

10.3. Schema Types

10.4. Using PMML with the JDM Schema

10.5. Use cases for JDM XML Schema and Documents

10.6. Summary

10.7. References

11. Web Services

11.1. What is a Web Service?

11.2. Service Oriented Architecture (SOA)

11.3. JDM Web Service (JDMWS)

11.4. Enabling JDM Web Services using JAX-RPC

11.5. Summary

11.6. References

Part III - Practice

12. Practical Problem Solving

12.1. Business Scenario 1: Targeted Marketing Campaign

12.2. Business Scenario 2: Understanding Key Factors

12.3. Business Scenario 3: Using Customer Segmentation

12.4. Summary

12.5. Bibliography

13. Building Data Mining Tools using JDM

13.1. Data mining tools

13.2. Administrative Console

13.3. User Interface to build and save a model

13.4. User Interface to test model quality

13.5. Summary

14. Getting Started with JDM Web Services

14.1. A Web Service client in PHP

14.2. A Web Service client in Java

14.3. Summary

14.4. References

15. Impacts on IT Infrastructure

15.1. What does Data Mining require from IT?

15.2. Impacts on computing hardware

15.3. Impacts on data storage hardware

15.4. Data access

15.5. Backup and recovery

15.6. Scheduling

15.7. Workflow

15.8. Summary

15.9. References

16. Vendor implementations

16.1. Oracle Data Mining

16.2. KXEN (Knowledge eXtraction ENgines)

16.3. Process for new Vendors

16.4. Process for new JDM users

16.5. Summary

16.6. References

Part IV - Wrapping Up

17. Evolution of Data Mining Standards

17.1. Data Mining Standards

17.2. Java Community Process

17.3. Why so many standards?

17.4. Where data mining standards have been and where will they go?

17.5. Directions for data mining standards

17.6. Summary

17.7. References

18. Preview of Java Data Mining 2.0

18.1. Transformations

18.2. Time Series

18.3. Apply for Association

18.4. Feature Extraction

18.5. Statistics

18.6. Multi-target Models

18.7. Text Mining

18.8. Summary

18.9. References

19. Summary

App. A. Further Reading
App. B. Glossary

Mark F. Hornick

Mark Hornick has led the Java Data Mining (JSR-73) expert group since its inception in July of 2000, and now leads the JSR-247 expert group working towards JDM 2.0. Mr. Hornick brings nearly 20 years of experience in the design and implementation of advanced distributed systems, including in-database data mining, distributed object management, and Java APIs. Mr. Hornick is a senior manager in Oracle’s Data Mining Technologies group.

Mr. Hornick joined Oracle through Oracle’s acquisition of Thinking Machines Corporation in 1999. Prior to Thinking Machines, where he served as architect for TMC’s next generation data mining software, Mr. Hornick was a Principal Investigator at GTE Laboratories, involved in advanced telecommunications network management software, distributed transaction management research, and distributed object management research.

Mr. Hornick has contributed to several other data mining standards, including the Data Mining Group’s PMML, ISO SQL/MM for Data Mining, and the Object Management Group’s Common Warehouse Metadata. He has given talks at the International Conference on Knowledge Discovery and Databases, JavaOne, JavaPro Live!, and The ServerSide Symposium on data mining standards and JDM. He has also published various papers and articles over his career.

Mr. Hornick holds a bachelor's degree in computer science from Rutgers University and a master's degree, also in computer science, from Brown University, where he specialized in distributed object databases.

Affiliations and expertise

Sr. Manager, Data Mining Technologies, Oracle Corporation, Burlington, MA

Erik Marcadé

With over 17 years of experience in the neural network industry, Erik Marcade, founder and chief technical officer for KXEN, is responsible for software development and information technologies. Prior to founding KXEN, Mr. Marcade developed real-time software expertise at Cadence Design Systems, accountable for advancing real-time software systems as well as managing “system-on-a-chip” projects. Before joining Cadence, Mr. Marcade spearheaded a project to restructure the marketing database of the largest French automobile manufacturer for Atos, a leading European information technology services company.

In 1990, Mr. Marcade co-founded Mimetics, a French company that processes and sells development environments and optical character recognition (OCR) products and services using neural network technology.

Prior to Mimetics, Mr. Marcade joined Thomson-CSF Weapon System Division as a software engineer and project manager working on the application of artificial intelligence for projects in weapons allocation, target detection and tracking, geo-strategic assessment, and software quality control. He contributed to the creation of Thomson Research Laboratories in Palo Alto, CA (Pacific Rim Operation-PRO) as senior software engineer. There he collaborated with Stanford University on the automatic landing and flare system for Boeing, and Kestrel Institute, a non-profit computer science research organization. He returned to France to head Esprit projects on neural networks development.

Mr. Marcade holds an engineering degree from Ecole de l’Aeronautique et de l’Espace, specializing in process control, signal processing, computer science, and artificial intelligence.

Affiliations and expertise

Founder and Chief Technical Officer, KXEN, Paris, France

Sunil Venkayala

J2EE and XML group leader and Principal Member of Technical Staff at Oracle Data Mining Technologies. Expert group member of the Java Data Mining (JDM) standard developed under JSR-73. More than five years of experience in developing applications using predictive technologies available in the Oracle Database. More than seven years of experience in working with Java and Internet technologies. Authored a JDM article in Java Developer Journal. Holds a B.S. in Engineering and a Master's in Industrial Management from the Indian Institute of Technology, Kanpur.

Affiliations and expertise

Principal Member of Technical Staff, Oracle, Burlington, MA