Entity Resolution and Information Quality

By

  • John R. Talburt, Professor of Information Science, University of Arkansas at Little Rock, AR, USA

Entity Resolution and Information Quality presents the core topics and definitions of entity resolution (ER) and information quality (IQ) and clarifies the confusing terminology surrounding both fields. It takes a wide view of IQ, including the six-domain framework and skills developed by the International Association for Information and Data Quality (IAIDQ). The book covers the principles, concepts, and terminology of entity resolution and of information quality. It also discusses the major theoretical models that support entity resolution: the Fellegi-Sunter theory of record linkage, the Stanford Entity Resolution Framework, and the Algebraic Model for Entity Resolution. Building on these, the book briefly discusses entity-based data integration (EBDI) and its model, which extends the Algebraic Model for Entity Resolution. It explains how three commercial ER systems operate and describes the non-commercial open-source system known as OYSTER. The book concludes by discussing trends in entity resolution research and practice. Students taking IT courses and IT professionals will find this book invaluable.

Audience

Database administrators, data/information analysts, information and enterprise architects, data warehouse and systems engineers, and software developers working on an identity resolution engine or middleware stack.

 

Book information

  • Published: December 2010
  • Imprint: MORGAN KAUFMANN
  • ISBN: 978-0-12-381972-7

Reviews

"This book is comprehensive, timely, and on the leading edge of the topic. In addition to being comprehensive and systematic, the book has two distinct characteristics: (1) it addresses the issue of entity relationships, which go beyond entity matching. This novel approach generates much richer information about entities; (2) it discusses not only techniques, but also systems that implement the techniques. This system-oriented approach helps the reader to see how to apply the techniques for problem solving."--Dr. Hongwei (Harry) Zhu - Assistant Professor of Information Technology in the College of Business and Public Administration, Old Dominion University

"Talburt, the author of this book, is one of the organizers of the first graduate degree program in information quality, hosted by the University of Arkansas at Little Rock. The book contains seven easy-to-read chapters. A chapter on trends and research topics in entity resolution closes this short textbook. Some of the suggestions will undoubtedly encourage graduate students to pursue their research on data integration topics. The book offers interesting pointers and bibliographic references for exploring new avenues of research."--Computing Reviews

"Talburt (information science, U. of Arkansas-Little Rock) presents a textbook developed from a graduate course on the two emerging specialties within information science. Students tend to come from a number of disciplines, so no deep background in information science is assumed, and the material may even be suitable for upper-level undergraduate courses. He covers principles of entity resolution and information quality, entity resolution models and systems, entity-based data integration, the OYSTER open-source software development project, and trends in research and applications."--SciTech Book News




Table of Contents


Foreword

Preface

Acknowledgements

Chapter 1 Principles of Entity Resolution

    Entity Resolution

    Entity Resolution Activities

    Summary

    Review Questions

Chapter 2 Principles of Information Quality

    Information Quality

    IQ and the Quality of Information

    Two IP Examples

    IQ Management

    Information versus Process

    IQ and HPC

    The Evolution of Information Quality

    IQ as an Academic Discipline

    IQ and ER

    Summary

    Review Questions

Chapter 3 Entity Resolution Models

    Overview

    The Fellegi-Sunter Model

    SERF Model

    Algebraic Model

    ENRES Meta-Model

    Summary

    Review Questions

Chapter 4 Entity-Based Data Integration

    Introduction

    Formal Framework for Describing EBDI

    Optimizing Selection Operator Accuracy

    More Complex Selection Rules

    Summary

    Review Questions

Chapter 5 Entity Resolution Systems

    Introduction

    DataFlux dfPowerStudio

    Infoglide Identity Resolution Engine

    Acxiom AbiliTec

    Summary

    Review Questions

Chapter 6 The OYSTER Project

    Background

    OYSTER Logic

    Transitive Equivalence Example

    Asserted Equivalence Example

    Febrl: Open-Source Project

    Summary

    Review Questions

Chapter 7 Trends in Entity Resolution Research and Applications

    Introduction

    ER and Information Hubs

    Association Analysis and Social Networks

    HPC in ER

    Integration of ER and IQ

    Entity-Based Data Integration

    Fundamental ER Research

    Summary

    Review Questions

Bibliography

Glossary

Appendix

Index