Entity Information Life Cycle for Big Data - 1st Edition - ISBN: 9780128005378, 9780128006658

Entity Information Life Cycle for Big Data

1st Edition

Master Data Management and Information Integration

Authors: John R. Talburt, Yinle Zhou
eBook ISBN: 9780128006658
Paperback ISBN: 9780128005378
Imprint: Morgan Kaufmann
Published Date: 23rd April 2015
Page Count: 254

Description

Entity Information Life Cycle for Big Data walks you through the ins and outs of managing entity information so you can successfully achieve master data management (MDM) in the era of big data. This book explains big data's impact on MDM and the critical role an entity information management system (EIMS) plays in successful MDM. Expert authors Dr. John R. Talburt and Dr. Yinle Zhou provide a thorough background in the principles of managing the entity information life cycle. They also offer practical tips and techniques for implementing an EIMS, strategies for using distributed processing to handle big data in an EIMS, and examples from real applications. Additional material on the theory of entity identity information management (EIIM) and on methods for assessing and evaluating EIMS performance also makes this book suitable as a textbook in courses on entity and identity management, data management, customer relationship management (CRM), and related topics.

Key Features

  • Explains the business value and impact of an entity information management system (EIMS) and directly addresses EIMS design and operation, a critical issue organizations face when implementing MDM systems
  • Offers practical guidance to help you design and build an EIM system that will successfully handle big data
  • Details how to measure and evaluate entity integrity in MDM systems and explains the principles and processes that comprise EIM
  • Describes the features and functions an EIM system should have, which helps when evaluating commercial EIM systems
  • Includes chapter review questions, exercises, tips, and free downloads of demonstrations that use the OYSTER open-source EIM system
  • Executable code (Java .jar files), control scripts, and synthetic input data illustrate various aspects of the CSRUD life cycle, such as identity capture, identity update, and assertions

Readership

IT managers and software developers as well as graduate and undergraduate students in computer science, information science, business computer information systems, and management information systems.

Table of Contents

  • Foreword
  • Preface
  • Acknowledgements
  • Chapter 1. The Value Proposition for MDM and Big Data
    • Definition and Components of MDM
    • The Business Case for MDM
    • Dimensions of MDM
    • The Challenge of Big Data
    • MDM and Big Data – The N-Squared Problem
    • Concluding Remarks
  • Chapter 2. Entity Identity Information and the CSRUD Life Cycle Model
    • Entities and Entity References
    • Managing Entity Identity Information
    • Entity Identity Information Life Cycle Management Models
    • Concluding Remarks
  • Chapter 3. A Deep Dive into the Capture Phase
    • An Overview of the Capture Phase
    • Building the Foundation
    • Understanding the Data
    • Data Preparation
    • Selecting Identity Attributes
    • Assessing ER Results
    • Data Matching Strategies
    • Concluding Remarks
  • Chapter 4. Store and Share – Entity Identity Structures
    • Entity Identity Information Management Strategies
    • Dedicated MDM Systems
    • The Identity Knowledge Base
    • MDM Architectures
    • Concluding Remarks
  • Chapter 5. Update and Dispose Phases – Ongoing Data Stewardship
    • Data Stewardship
    • The Automated Update Process
    • The Manual Update Process
    • Asserted Resolution
    • EIS Visualization Tools
    • Managing Entity Identifiers
    • Concluding Remarks
  • Chapter 6. Resolve and Retrieve Phase – Identity Resolution
    • Identity Resolution
    • Identity Resolution Access Modes
    • Confidence Scores
    • Concluding Remarks
  • Chapter 7. Theoretical Foundations
    • The Fellegi-Sunter Theory of Record Linkage
    • The Stanford Entity Resolution Framework
    • Entity Identity Information Management
    • Concluding Remarks
  • Chapter 8. The Nuts and Bolts of Entity Resolution
    • The ER Checklist
    • Cluster-to-Cluster Classification
    • Selecting an Appropriate Algorithm
    • Concluding Remarks
  • Chapter 9. Blocking
    • Blocking
    • Blocking by Match Key
    • Dynamic Blocking versus Preresolution Blocking
    • Blocking Precision and Recall
    • Match Key Blocking for Boolean Rules
    • Match Key Blocking for Scoring Rules
    • Concluding Remarks
  • Chapter 10. CSRUD for Big Data
    • Large-Scale ER for MDM
    • The Transitive Closure Problem
    • Distributed, Multiple-Index, Record-Based Resolution
    • An Iterative, Nonrecursive Algorithm for Transitive Closure
    • Iteration Phase: Successive Closure by Reference Identifier
    • Deduplication Phase: Final Output of Components
    • ER Using the Null Rule
    • The Capture Phase and IKB
    • The Identity Update Problem
    • Persistent Entity Identifiers
    • The Large Component and Big Entity Problems
    • Identity Capture and Update for Attribute-Based Resolution
    • Concluding Remarks
  • Chapter 11. ISO Data Quality Standards for Master Data
    • Background
    • Goals and Scope of the ISO 8000-110 Standard
    • Four Major Components of the ISO 8000-110 Standard
    • Simple and Strong Compliance with ISO 8000-110
    • ISO 22745 Industrial Systems and Integration
    • Beyond ISO 8000-110
    • Concluding Remarks
  • Appendix A. Some Commonly Used ER Comparators
  • References
  • Index

Details

No. of pages: 254
Language: English
Copyright: © Morgan Kaufmann 2015
Published: 23rd April 2015
Imprint: Morgan Kaufmann
eBook ISBN: 9780128006658
Paperback ISBN: 9780128005378

About the Author

John R. Talburt

Dr. John R. Talburt is Professor of Information Science at the University of Arkansas at Little Rock (UALR), where he is the Coordinator for the Information Quality Graduate Program and the Executive Director of the UALR Center for Advanced Research in Entity Resolution and Information Quality (ERIQ). He is also the Chief Scientist for Black Oak Partners, LLC, an information quality solutions company. Prior to his appointment at UALR, he led research and development and product innovation at Acxiom Corporation, a global leader in information management and customer data integration. Professor Talburt holds several patents related to customer data integration, has written numerous articles on information quality and entity resolution, and is the author of Entity Resolution and Information Quality (Morgan Kaufmann, 2011). He also holds the IAIDQ Information Quality Certified Professional (IQCP) credential.

Affiliations and Expertise

Professor of Information Science, University of Arkansas at Little Rock, AR, USA

Yinle Zhou

Dr. Yinle Zhou is an IBM software architect and data scientist in the InfoSphere MDM development group in Austin, Texas, and also serves as an Affiliate Member of the Graduate Faculty at the University of Arkansas at Little Rock (UALR). Dr. Zhou holds a PhD in Integrated Computing with Emphasis in Information Quality (IQ) from UALR, where her doctoral research focused on modeling the management of entity identity information in entity resolution systems. She also holds a Master of Science in Information Quality from UALR, a Bachelor of Business Administration in Electronic Commerce from Nanjing University in China, and the Information Quality Certified Professional (IQCP) credential issued by the International Association for Information and Data Quality (IAIDQ). Her research and publications are in the areas of information quality, identity management, entity and identity resolution, and social computing.

Affiliations and Expertise

Software Engineer, IBM InfoSphere Master Data Management, Austin, TX, USA

Reviews

"...good as a textbook and also as independent reading for professionals in the sector. The references where the solutions and methods treated were originally presented are included so that they can be found for further reading…a highly recommended book." --Computing Reviews, Entity Information Life Cycle for Big Data

"... covers MDM in a traditional manner, that is, maintaining current entity identity information, very well, and it can be recommended to advanced MDM practitioners (and advanced students of databases and data warehouses)." --Computing Reviews