Principles of Data Integration

1st Edition - June 25, 2012

  • Authors: AnHai Doan, Alon Halevy, Zachary Ives
  • eBook ISBN: 9780123914798
  • Hardcover ISBN: 9780124160446

Description

Principles of Data Integration is the first comprehensive textbook on data integration, covering theoretical principles and implementation issues as well as current challenges raised by the semantic web and cloud computing. The book offers a range of data integration solutions, enabling you to focus on what is most relevant to the problem at hand. Readers will also learn how to build their own algorithms and implement their own data integration applications. Written by three of the most respected experts in the field, this book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application and concrete examples throughout to explain the concepts. This text is an ideal resource for database practitioners in industry, including data warehouse engineers, database system designers, data architects/enterprise architects, database researchers, statisticians, and data analysts; students in data analytics and knowledge discovery; and other data professionals working at the R&D and implementation levels.

Key Features

  • Offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand
  • Enables you to build your own algorithms and implement your own data integration applications

Readership

Database practitioners in industry, including data warehouse engineers, database system designers, data architects/enterprise architects, database researchers, statisticians, data analysts, and other data professionals working at the R&D and implementation levels; students in data analytics and knowledge discovery.

Table of Contents

  • Dedication

    Preface

    1. Introduction

    1.1 What Is Data Integration?

    1.2 Why Is It Hard?

    1.3 Data Integration Architectures

    1.4 Outline of the Book

    Bibliographic Notes

    Part I: Foundational Data Integration Techniques

    2. Manipulating Query Expressions

    2.1 Review of Database Concepts

    2.2 Query Unfolding

    2.3 Query Containment and Equivalence

    2.4 Answering Queries Using Views

    Bibliographic Notes

    3. Describing Data Sources

    3.1 Overview and Desiderata

    3.2 Schema Mapping Languages

    3.3 Access-Pattern Limitations

    3.4 Integrity Constraints on the Mediated Schema

    3.5 Answer Completeness

    3.6 Data-Level Heterogeneity

    Bibliographic Notes

    4. String Matching

    4.1 Problem Description

    4.2 Similarity Measures

    4.3 Scaling Up String Matching

    Bibliographic Notes

    5. Schema Matching and Mapping

    5.1 Problem Definition

    5.2 Challenges of Schema Matching and Mapping

    5.3 Overview of Matching and Mapping Systems

    5.4 Matchers

    5.5 Combining Match Predictions

    5.6 Enforcing Domain Integrity Constraints

    5.7 Match Selector

    5.8 Reusing Previous Matches

    5.9 Many-to-Many Matches

    5.10 From Matches to Mappings

    Bibliographic Notes

    6. General Schema Manipulation Operators

    6.1 Model Management Operators

    6.2 Merge

    6.3 ModelGen

    6.4 Invert

    6.5 Toward Model Management Systems

    Bibliographic Notes

    7. Data Matching

    7.1 Problem Definition

    7.2 Rule-Based Matching

    7.3 Learning-Based Matching

    7.4 Matching by Clustering

    7.5 Probabilistic Approaches to Data Matching

    7.6 Collective Matching

    7.7 Scaling Up Data Matching

    Bibliographic Notes

    8. Query Processing

    8.1 Background: DBMS Query Processing

    8.2 Background: Distributed Query Processing

    8.3 Query Processing for Data Integration

    8.4 Generating Initial Query Plans

    8.5 Query Execution for Internet Data

    8.6 Overview of Adaptive Query Processing

    8.7 Event-Driven Adaptivity

    8.8 Performance-Driven Adaptivity

    Bibliographic Notes

    9. Wrappers

    9.1 Introduction

    9.2 Manual Wrapper Construction

    9.3 Learning-Based Wrapper Construction

    9.4 Wrapper Learning without Schema

    9.5 Interactive Wrapper Construction

    Bibliographic Notes

    10. Data Warehousing and Caching

    10.1 Data Warehousing

    10.2 Data Exchange: Declarative Warehousing

    10.3 Caching and Partial Materialization

    10.4 Direct Analysis of Local, External Data

    Bibliographic Notes

    Part II: Integration with Extended Data Representations

    11. XML

    11.1 Data Model

    11.2 XML Structural and Schema Definitions

    11.3 Query Language

    11.4 Query Processing for XML

    11.5 Schema Mapping for XML

    Bibliographic Notes

    12. Ontologies and Knowledge Representation

    12.1 Example: Using KR in Data Integration

    12.2 Description Logics

    12.3 The Semantic Web

    Bibliographic Notes

    13. Incorporating Uncertainty into Data Integration

    13.1 Representing Uncertainty

    13.2 Modeling Uncertain Schema Mappings

    13.3 Uncertainty and Data Provenance

    Bibliographic Notes

    14. Data Provenance

    14.1 The Two Views of Provenance

    14.2 Applications of Data Provenance

    14.3 Provenance Semirings

    14.4 Storing Provenance

    Bibliographic Notes

    Part III: Novel Integration Architectures

    15. Data Integration on the Web

    15.1 What Can We Do with Web Data?

    15.2 The Deep Web

    15.3 Topical Portals

    15.4 Lightweight Combination of Web Data

    15.5 Pay-as-You-Go Data Management

    Bibliographic Notes

    16. Keyword Search

    16.1 Keyword Search over Structured Data

    16.2 Computing Ranked Results

    16.3 Keyword Search for Data Integration

    Bibliographic Notes

    17. Peer-to-Peer Integration

    17.1 Peers and Mappings

    17.2 Semantics of Mappings

    17.3 Complexity of Query Answering in PDMS

    17.4 Query Reformulation Algorithm

    17.5 Composing Mappings

    17.6 Peer Data Management with Looser Mappings

    Bibliographic Notes

    18. Integration in Support of Collaboration

    18.1 What Makes Collaboration Different

    18.2 Processing Corrections and Feedback

    18.3 Collaborative Annotation and Presentation

    18.4 Dynamic Data: Collaborative Data Sharing

    Bibliographic Notes

    19. The Future of Data Integration

    19.1 Uncertainty, Provenance, and Cleaning

    19.2 Crowdsourcing and “Human Computing”

    19.3 Building Large-Scale Structured Web Databases

    19.4 Lightweight Integration

    19.5 Visualizing Integrated Data

    19.6 Integrating Social Media

    19.7 Cluster- and Cloud-Based Parallel Processing and Caching

    Bibliography

    Index

Product details

  • No. of pages: 520
  • Language: English
  • Copyright: © Morgan Kaufmann 2012
  • Published: June 25, 2012
  • Imprint: Morgan Kaufmann
  • eBook ISBN: 9780123914798
  • Hardcover ISBN: 9780124160446

About the Authors

AnHai Doan

Associate Professor in Computer Science at the University of Wisconsin-Madison, with consulting work for Microsoft AdCenter Lab and Yahoo Research Lab.

Alon Halevy

Head of the Structured Data Group, Google Research, Mountain View, California. He joined Google in 2005 with the acquisition of his company, Transformic.

Zachary Ives

Associate Professor at the University of Pennsylvania and a Faculty Member of the Penn Center for Bioinformatics. He received his PhD from the University of Washington. His research interests include data integration, data sharing among autonomous and heterogeneous systems, heterogeneous sensor networks, and information provenance and authoritativeness.
