Principles of Data Integration
By
- AnHai Doan, Associate Professor in Computer Science at the University of Wisconsin-Madison. Consulting work with Microsoft AdCenter Lab and Yahoo Research Lab.
- Alon Halevy, Head of the Structured Data Group, Google Research, Mountain View, California.
- Zachary Ives, Associate Professor at the University of Pennsylvania, and a Faculty Member of the Penn Center for Bioinformatics.
How do you approach answering queries when your data is stored in multiple databases that were designed independently by different people? This is the first comprehensive book on data integration, and it is written by three of the most respected experts in the field.
This book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application and concrete examples throughout to explain the concepts. Data integration is the problem of answering queries that span multiple data sources (e.g., databases, web pages). Data integration problems surface in multiple contexts, including enterprise information integration, query processing on the Web, coordination between government agencies, and collaboration between scientists. In some cases, data integration is the key bottleneck to making progress in a field.
The authors provide a working knowledge of data integration concepts and techniques, giving you the tools you need to develop a complete and concise package of algorithms and applications.
Audience
Database practitioners in industry, including data warehouse engineers, database system designers, data architects/enterprise architects, database researchers, statisticians, data analysts, and other data professionals working at the R&D and implementation levels. Students in data analytics and knowledge discovery.
Hardbound, 520 Pages
Published: June 2012
Imprint: Morgan Kaufmann
ISBN: 978-0-12-416044-6
Reviews
- "This is the definitive book on data integration technology, written by experts who invented much of the technology they write about. It's comprehensive, with lots of technical detail very clearly explained. It's a must-read for anyone involved in the development of data integration solutions."--Philip A. Bernstein, Distinguished Scientist, Microsoft Corporation
- "Despite having been with us for decades, data integration remains a challenging, multi-faceted problem. This book does an excellent job of bringing together and explaining its many facets along with the technical solutions that have been developed to date. The authors are three of the field's leading contributors, with a mix of both academic and industrial experience, and their presentation includes examples and manages to make even the more theoretical material accessible to readers. All aspects of modern data integration are covered, including different styles of integration, data and schema matching, query processing and wrappers, as well as challenges posed by the Web and the wide variety of data types and formats that must be integrated today. This book should be a great resource for graduate courses on data integration."--Michael Carey, Bren Professor of Information and Computer Sciences, UC Irvine
- "The days of enterprises/organizations depending on a single, closed database have given way to a Web-dominated world in which multiple databases must interoperate and integrate. Doan (computer science, U. of Wisconsin, Madison) and colleagues at Google and the University of Pennsylvania address how database ideas have broadened to accommodate external sources of structured information, distributed aspects of the Web, and issues of data-sharing. Part I treats topics and techniques for data queries, integration, and warehousing covered in a database course. Part II discusses extended data representations that capture properties not present in the standard relational data model. Then they present novel architectures for, and trends in, addressing specific integration problems, e.g., of Web sources. Includes an extensive bibliography."--Reference and Research Book News, October 2012
Contents
CH 1: Introduction
Part I: Foundational Data Integration Techniques
CH 2: Manipulating Query Expressions
CH 3: Describing Data Sources
CH 4: String Matching
CH 5: Schema Matching and Mapping
CH 6: General Schema Manipulation Operators
CH 7: Data Matching
CH 8: Query Processing
CH 9: Wrappers
CH 10: Data Warehousing and Caching
Part II: Integration with Extended Data Representations
CH 11: XML
CH 12: Ontologies and Knowledge Representation
CH 13: Incorporating Uncertainty into Data Integration
CH 14: Data Provenance
Part III: Novel Integration Architectures
CH 15: Data Integration on the Web
CH 16: Keyword Search: Integration on Demand
CH 17: Peer-to-Peer Integration
CH 18: Integration in Support of Collaboration
CH 19: The Future of Data Integration
