Java Data Mining: Strategy, Standard, and Practice

A Practical Guide for Architecture, Design, and Implementation


  • Mark Hornick, Sr. Manager, Data Mining Technologies, Oracle Corporation, Burlington, MA
  • Erik Marcadé, Founder and Chief Technical Officer, KXEN, Paris, France
  • Sunil Venkayala, Principal Member of Technical Staff, Oracle, Burlington, MA

Whether you are a software developer, systems architect, data analyst, or business analyst, Java Data Mining (JDM) is a key solution component if you want to take advantage of data mining in the development of advanced analytic applications. JDM is the new standard now implemented in core DBMS and data mining/analysis software. This book is the essential guide to using the JDM standard interface, written by contributors to the JDM standard. It discusses and illustrates how to solve real problems using the JDM API. The authors provide you with:

  • Data mining introduction: an overview of data mining and the problems it can address across industries, and JDM's place in strategic solutions to data mining-related problems.
  • JDM essentials: concepts, design approach, and design issues, with detailed code examples in Java; a Web Services interface to enable JDM functionality in an SOA environment; and an illustration of the JDM XML Schema for JDM objects.
  • JDM in practice: the use of JDM from vendor implementations and approaches to customer applications, integration, and usage; the impact of data mining on IT infrastructure; and a how-to guide for building applications that use the JDM API.
  • Free, downloadable KJDM source code referenced in the book.


This book is for software developers and application architects who are interested in, or who need, data mining analysis as part of their applications. Both novice and advanced Java developers can use it as a reference for incorporating data mining into applications, leveraging the sample code provided. For example, a Java developer may know that they want to classify a customer's interest in a product but not know how to get started. This book provides a quick start for using data mining in a practical context. Experienced data miners who use Java will also benefit from seeing working code that shows how to use JDM to accomplish mining tasks.


Book information

  • Published: November 2006
  • ISBN: 978-0-12-370452-8


"This is not only a great introduction to JDM, but also a great introduction for a practitioner to data mining in general. This is a “must-have" for anyone developing large-scale data mining applications in Java." --Robert Grossman, Open Data Group and University of Illinois at Chicago "It pleases me that the Java Community ProcessSM(JCPSM) Program could host the development of the Data Mining standard, JSR 73, whose evolution and usability are presented so compellingly in Java Data Mining: Standard, Strategy, and Practice. The authors have taken a unique approach to describing a broad range of aspects from strategies to problem solving with data mining technology in a variety of industries. The book is a ”must-read” for those who want to introduce themselves to Java data mining (JDM) and fully realize the strategic importance of this technology in an ever competitive environment." —-Onno Kluyt, senior director, JCP Program at Sun Microsystems, Inc., and chair of the JCP "Java is now ubiquitous and over the past few years the Java world has shifted focus on--among other things--new frameworks, such as the Java Data Mining (JDM) framework. JDM addresses a clear need for standardization in data mining operations, yet to those approaching both Java and data mining the mountain seems as Everest. Hornick, Marcadé, and Venkayala could not have written this book at a better time. To the expert it is reference and map of the landscape, and to the novice it will be a constant guide and companion to each journey in JDM. This book is approachable, usable, practical, and necessary for any Java data mining software architect, developer, or analyst." –Frank Byrum, Chief Scientist, CorMine Intelligent Data, LLC

Table of Contents

Preface
Guide to Readers

Part I - Strategy

1. Overview of Data Mining
  1.1. Why is data mining relevant today?
  1.2. Introducing Data Mining
  1.3. The Value of Data Mining
  1.4. Summary
  1.5. References
2. Solving Problems in Industry
  2.1. Cross-industry data mining solutions
  2.2. Data Mining in Industries
  2.3. Summary
  2.4. References
3. Data Mining Process
  3.1. A standardized data mining process
  3.2. Data Analysis and Preparation…a more detailed view
  3.3. Data mining modeling, analysis, and scoring processes
  3.4. The Role of databases and data warehouses in Data Mining
  3.5. Data mining in enterprise software architectures
  3.6. Advances in automated data mining
  3.7. Summary
  3.8. References
4. Mining Functions and Algorithms
  4.1. Data mining functions
  4.2. Classification
  4.3. Regression
  4.4. Attribute Importance
  4.5. Association
  4.6. Clustering
  4.7. Summary
  4.8. References
5. JDM Strategy
  5.1. What is the JDM strategy?
  5.2. Role of Standards
  5.3. Summary
  5.4. References
6. Getting Started
  6.1. Business Understanding
  6.2. Data Understanding
  6.3. Data Preparation
  6.4. Modeling
  6.5. Evaluation
  6.6. Deployment
  6.7. Summary
  6.8. References

Part II - Standard

7. Java Data Mining Concepts
  7.1. Classification problem
  7.2. Regression problem
  7.3. Attribute importance
  7.4. Association rules problem
  7.5. Clustering problem
  7.6. Summary
  7.7. References
8. Design of the JDM API
  8.1. Object Modeling of Data Mining Concepts
  8.2. Modular Packages
  8.3. Connection Architecture
  8.4. Object Factories
  8.5. URI for Datasets
  8.6. Enumerated Types
  8.7. Exceptions
  8.8. Discovering DME Capabilities
  8.9. Summary
  8.10. References
9. Using the JDM API
  9.1. Connection Interfaces
  9.2. Using JDM Enumerations
  9.3. Using data specification interfaces
  9.4. Using classification interfaces
  9.5. Using Regression interfaces
  9.6. Using Attribute Importance interfaces
  9.7. Using Association interfaces
  9.8. Using Clustering interfaces
  9.9. Summary
  9.10. References
10. XML Schema
  10.1. Overview
  10.2. Schema Elements
  10.3. Schema Types
  10.4. Using PMML with the JDM Schema
  10.5. Use cases for JDM XML Schema and Documents
  10.6. Summary
  10.7. References
11. Web Services
  11.1. What is a Web Service?
  11.2. Service Oriented Architecture (SOA)
  11.3. JDM Web Service (JDMWS)
  11.4. Enabling JDM Web Services using JAX-RPC
  11.5. Summary
  11.6. References

Part III - Practice

12. Practical Problem Solving
  12.1. Business Scenario 1: Targeted Marketing Campaign
  12.2. Business Scenario 2: Understanding Key Factors
  12.3. Business Scenario 3: Using Customer Segmentation
  12.4. Summary
  12.5. Bibliography
13. Building Data Mining Tools using JDM
  13.1. Data mining tools
  13.2. Administrative Console
  13.3. User Interface to build and save a model
  13.4. User Interface to test model quality
  13.5. Summary
14. Getting Started with JDM Web Services
  14.1. A Web Service client in PHP
  14.2. A Web Service client in Java
  14.3. Summary
  14.4. References
15. Impacts on IT Infrastructure
  15.1. What does Data Mining require from IT?
  15.2. Impacts on computing hardware
  15.3. Impacts on data storage hardware
  15.4. Data access
  15.5. Backup and recovery
  15.6. Scheduling
  15.7. Workflow
  15.8. Summary
  15.9. References
16. Vendor implementations
  16.1. Oracle Data Mining
  16.2. KXEN (Knowledge eXtraction ENgines)
  16.3. Process for new Vendors
  16.4. Process for new JDM users
  16.5. Summary
  16.6. References

Part IV - Wrapping Up

17. Evolution of Data Mining Standards
  17.1. Data Mining Standards
  17.2. Java Community Process
  17.3. Why so many standards?
  17.4. Where data mining standards have been and where will they go?
  17.5. Directions for data mining standards
  17.6. Summary
  17.7. References
18. Preview of Java Data Mining 2.0
  18.1. Transformations
  18.2. Time Series
  18.3. Apply for Association
  18.4. Feature Extraction
  18.5. Statistics
  18.6. Multi-target Models
  18.7. Text Mining
  18.8. Summary
  18.9. References
19. Summary

App. A. Further Reading
App. B. Glossary