Measuring Data Quality for Ongoing Improvement - 1st Edition - ISBN: 9780123970336, 9780123977540

Measuring Data Quality for Ongoing Improvement

1st Edition

A Data Quality Assessment Framework

Author: Laura Sebastian-Coleman
eBook ISBN: 9780123977540
Paperback ISBN: 9780123970336
Imprint: Morgan Kaufmann
Published Date: 31st December 2012
Page Count: 376

Table of Contents




Author Biography

Introduction: Measuring Data Quality for Ongoing Improvement

Data Quality Measurement: the Problem we are Trying to Solve

Recurring Challenges in the Context of Data Quality

DQAF: the Data Quality Assessment Framework

Overview of Measuring Data Quality for Ongoing Improvement

Intended Audience

What Measuring Data Quality for Ongoing Improvement Does Not Do

Why I Wrote Measuring Data Quality for Ongoing Improvement

Section 1. Concepts and Definitions

Chapter 1. Data



Data as Representation

Data as Facts

Data as a Product

Data as Input to Analyses

Data and Expectations


Concluding Thoughts

Chapter 2. Data, People, and Systems


Enterprise or Organization

IT and the Business

Data Producers

Data Consumers

Data Brokers

Data Stewards and Data Stewardship

Data Owners

Data Ownership and Data Governance

IT, the Business, and Data Owners, Redux

Data Quality Program Team


Systems and System Design

Concluding Thoughts

Chapter 3. Data Management, Models, and Metadata


Data Management

Database, Data Warehouse, Data Asset, Dataset

Source System, Target System, System of Record

Data Models

Types of Data Models

Physical Characteristics of Data


Metadata as Explicit Knowledge

Data Chain and Information Life Cycle

Data Lineage and Data Provenance

Concluding Thoughts

Chapter 4. Data Quality and Measurement


Data Quality

Data Quality Dimensions


Measurement as Data

Data Quality Measurement and the Business/IT Divide

Characteristics of Effective Measurements

Data Quality Assessment

Data Quality Dimensions, DQAF Measurement Types, Specific Data Quality Metrics

Data Profiling

Data Quality Issues and Data Issue Management

Reasonability Checks

Data Quality Thresholds

Process Controls

In-line Data Quality Measurement and Monitoring

Concluding Thoughts

Section 2. DQAF Concepts and Measurement Types

Chapter 5. DQAF Concepts


The Problem the DQAF Addresses

Data Quality Expectations and Data Management

The Scope of the DQAF

DQAF Quality Dimensions

Defining DQAF Measurement Types

Metadata Requirements

Objects of Measurement and Assessment Categories

Functions in Measurement: Collect, Calculate, Compare

Concluding Thoughts

Chapter 6. DQAF Measurement Types


Consistency of the Data Model

Ensuring the Correct Receipt of Data for Processing

Inspecting the Condition of Data upon Receipt

Assessing the Results of Data Processing

Assessing the Validity of Data Content

Assessing the Consistency of Data Content

Comments on the Placement of In-line Measurements

Periodic Measurement of Cross-table Content Integrity

Assessing Overall Database Content

Assessing Controls and Measurements

The Measurement Types: Consolidated Listing

Concluding Thoughts

Section 3. Data Assessment Scenarios


Assessment Scenarios

Metadata: Knowledge before Assessment

Chapter 7. Initial Data Assessment


Initial Assessment

Input to Initial Assessments

Data Expectations

Data Profiling

Column Property Profiling

Structure Profiling

Profiling an Existing Data Asset

From Profiling to Assessment

Deliverables from Initial Assessment

Concluding Thoughts

Chapter 8. Assessment in Data Quality Improvement Projects


Data Quality Improvement Efforts

Measurement in Improvement Projects

Chapter 9. Ongoing Measurement


The Case for Ongoing Measurement

Example: Health Care Data

Inputs for Ongoing Measurement

Criticality and Risk



Periodic Measurement

Deliverables from Ongoing Measurement

In-Line versus Periodic Measurement

Concluding Thoughts

Section 4. Applying the DQAF to Data Requirements


Chapter 10. Requirements, Risk, Criticality


Business Requirements

Data Quality Requirements and Expected Data Characteristics

Data Quality Requirements and Risks to Data

Factors Influencing Data Criticality

Specifying Data Quality Metrics

Concluding Thoughts

Chapter 11. Asking Questions


Asking Questions

Understanding the Project

Learning about Source Systems

Your Data Consumers’ Requirements

The Condition of the Data

The Data Model, Transformation Rules, and System Design

Measurement Specification Process

Concluding Thoughts

Section 5. A Strategic Approach to Data Quality

Chapter 12. Data Quality Strategy


The Concept of Strategy

Systems Strategy, Data Strategy, and Data Quality Strategy

Data Quality Strategy and Data Governance

Decision Points in the Information Life Cycle

General Considerations for Data Quality Strategy

Concluding Thoughts

Chapter 13. Directives for Data Quality Strategy


Directive 1: Obtain Management Commitment to Data Quality

Directive 2: Treat Data as an Asset

Directive 3: Apply Resources to Focus on Quality

Directive 4: Build Explicit Knowledge of Data

Directive 5: Treat Data as a Product of Processes that can be Measured and Improved

Directive 6: Recognize Quality is Defined by Data Consumers

Directive 7: Address the Root Causes of Data Problems

Directive 8: Measure Data Quality, Monitor Critical Data

Directive 9: Hold Data Producers Accountable for the Quality of their Data (and Knowledge about that Data)

Directive 10: Provide Data Consumers with the Knowledge they Require for Data Use

Directive 11: Data Needs and Uses will Evolve—Plan for Evolution

Directive 12: Data Quality Goes beyond the Data—Build a Culture Focused on Quality

Concluding Thoughts: Using the Current State Assessment

Section 6. The DQAF in Depth

Functions for Measurement: Collect, Calculate, Compare

Features of the DQAF Measurement Logical Data Model

Facets of the DQAF Measurement Types

Chapter 14. Functions of Measurement: Collection, Calculation, Comparison


Functions in Measurement: Collect, Calculate, Compare

Collecting Raw Measurement Data

Calculating Measurement Data

Comparing Measurements to Past History


The Control Chart: A Primary Tool for Statistical Process Control

The DQAF and Statistical Process Control

Concluding Thoughts

Chapter 15. Features of the DQAF Measurement Logical Model


Metric Definition and Measurement Result Tables

Optional Fields

Denominator Fields

Automated Thresholds

Manual Thresholds

Emergency Thresholds

Manual or Emergency Thresholds and Results Tables

Additional System Requirements

Support Requirements

Concluding Thoughts

Chapter 16. Facets of the DQAF Measurement Types


Facets of the DQAF

Organization of the Chapter

Measurement Type #1: Dataset Completeness—Sufficiency of Metadata and Reference Data

Measurement Type #2: Consistent Formatting in One Field

Measurement Type #3: Consistent Formatting, Cross-table

Measurement Type #4: Consistent Use of Default Value in One Field

Measurement Type #5: Consistent Use of Default Values, Cross-table

Measurement Type #6: Timely Delivery of Data for Processing

Measurement Type #7: Dataset Completeness—Availability for Processing

Measurement Type #8: Dataset Completeness—Record Counts to Control Records

Measurement Type #9: Dataset Completeness—Summarized Amount Field Data

Measurement Type #10: Dataset Completeness—Size Compared to Past Sizes

Measurement Type #11: Record Completeness—Length

Measurement Type #12: Field Completeness—Non-Nullable Fields

Measurement Type #13: Dataset Integrity—De-Duplication

Measurement Type #14: Dataset Integrity—Duplicate Record Reasonability Check

Measurement Type #15: Field Content Completeness—Defaults from Source

Measurement Type #16: Dataset Completeness Based on Date Criteria

Measurement Type #17: Dataset Reasonability Based on Date Criteria

Measurement Type #18: Field Content Completeness—Received Data is Missing Fields Critical to Processing

Measurement Type #19: Dataset Completeness—Balance Record Counts Through a Process

Measurement Type #20: Dataset Completeness—Reasons for Rejecting Records

Measurement Type #21: Dataset Completeness Through a Process—Ratio of Input to Output

Measurement Type #22: Dataset Completeness Through a Process—Balance Amount Fields

Measurement Type #23: Field Content Completeness—Ratio of Summed Amount Fields

Measurement Type #24: Field Content Completeness—Defaults from Derivation

Measurement Type #25: Data Processing Duration

Measurement Type #26: Timely Availability of Data for Access

Measurement Type #27: Validity Check, Single Field, Detailed Results

Measurement Type #28: Validity Check, Roll-up

Measurement Logical Data Model

Measurement Type #29: Validity Check, Multiple Columns within a Table, Detailed Results

Measurement Type #30: Consistent Column Profile

Measurement Type #31: Consistent Dataset Content, Distinct Count of Represented Entity, with Ratios to Record Counts

Measurement Type #32: Consistent Dataset Content, Ratio of Distinct Counts of Two Represented Entities

Measurement Type #33: Consistent Multicolumn Profile

Measurement Type #34: Chronology Consistent with Business Rules within a Table

Measurement Type #35: Consistent Time Elapsed (hours, days, months, etc.)

Measurement Type #36: Consistent Amount Field Calculations Across Secondary Fields

Measurement Type #37: Consistent Record Counts by Aggregated Date

Measurement Type #38: Consistent Amount Field Data by Aggregated Date

Measurement Type #39: Parent/Child Referential Integrity

Measurement Type #40: Child/Parent Referential Integrity

Measurement Type #41: Validity Check, Cross Table, Detailed Results

Measurement Type #42: Consistent Cross-table Multicolumn Profile

Measurement Type #43: Chronology Consistent with Business Rules Across Tables

Measurement Type #44: Consistent Cross-table Amount Column Calculations

Measurement Type #45: Consistent Cross-Table Amount Columns by Aggregated Dates

Measurement Type #46: Consistency Compared to External Benchmarks

Measurement Type #47: Dataset Completeness—Overall Sufficiency for Defined Purposes

Measurement Type #48: Dataset Completeness—Overall Sufficiency of Measures and Controls

Concluding Thoughts: Know Your Data




Online Materials

Appendix A. Measuring the Value of Data

Appendix B. Data Quality Dimensions


Richard Wang’s and Diane Strong’s Data Quality Framework, 1996

Thomas Redman’s Dimensions of Data Quality, 1996

Larry English’s Information Quality Characteristics and Measures, 1999

Appendix C. Completeness, Consistency, and Integrity of the Data Model


Process Input and Output

High-Level Assessment

Detailed Assessment

Quality of Definitions


Appendix D. Prediction, Error, and Shewhart’s Lost Disciple, Kristo Ivanov


Limitations of the Communications Model of Information Quality

Error, Prediction, and Scientific Measurement

What Do We Learn from Ivanov?

Ivanov’s Concept of the System as Model

Appendix E. Quality Improvement and Data Quality


A Brief History of Quality Improvement

Process Improvement Tools

Implications for Data Quality

Limitations of the Data as Product Metaphor

Concluding Thoughts: Building Quality in Means Building Knowledge in


The Data Quality Assessment Framework (DQAF) shows you how to measure and monitor data quality so that quality is sustained over time. You’ll start with general concepts of measurement and work your way through a detailed framework of more than three dozen measurement types related to five objective dimensions of quality: completeness, timeliness, consistency, validity, and integrity. Ongoing measurement, rather than one-time activities, will help your organization reach a new level of data quality. This plain-language approach to measuring data can be understood by both business and IT. It provides practical guidance on how to apply the DQAF within any organization, enabling you to prioritize measurements and report effectively on results. The book includes strategies for using data measurement to govern and improve the quality of data, along with guidelines for applying the framework within a data asset. You’ll come away able to prioritize which measurement types to implement, knowing where to place them in a data flow and how frequently to measure. Also included are common conceptual models for defining and storing data quality results for trend analysis, as well as generic business requirements for ongoing measurement and monitoring. These requirements cover the calculations and comparisons that make the measurements meaningful, help you understand trends, and detect anomalies.

Key Features

  • Demonstrates how to leverage a technology-independent data quality measurement framework for your specific business priorities and data quality challenges
  • Enables discussions between business and IT with a non-technical vocabulary for data quality measurement
  • Describes how to measure data quality on an ongoing basis with generic measurement types that can be applied to any situation


Data quality engineers, managers, and analysts; application program managers and developers; data stewards; data managers and analysts; compliance analysts; business intelligence professionals; database designers and administrators; business and IT managers


© Morgan Kaufmann 2013


"This book provides a very well-structured introduction to the fundamental issue of data quality, making it a very useful tool for managers, practitioners, analysts, software developers, and systems engineers. It also helps explain what data quality management entails and provides practical approaches aimed at actual implementation. I positively recommend reading it…", January 2014

"The framework she describes is a set of 48 generic measurement types based on five dimensions of data quality: completeness, timeliness, validity, consistency, and integrity. The material is for people who are charged with improving, monitoring, or ensuring data quality." --Reference and Research Book News, August 2013

"If you are intent on improving the quality of the data at your organization you would do well to read Measuring Data Quality for Ongoing Improvement and adopt the DQAF offered up in this fine book." --Data and Technology Today blog, July 2013

About the Author

Laura Sebastian-Coleman

Laura Sebastian-Coleman, Data Quality Lead at Aetna/CVS Health, has worked on data quality in large health care data warehouses since 2003. Laura has implemented data quality metrics and reporting, launched and facilitated working stewardship groups, contributed to data consumer training programs, and led efforts to establish data standards and manage metadata. In 2009, she led a group of analysts in developing the Data Quality Assessment Framework (DQAF), which is the basis for her 2013 book, Measuring Data Quality for Ongoing Improvement. An active professional, Laura has delivered papers, tutorials, and keynotes at data-focused conferences (MIT’s Information Quality Program, DGIQ-Data Governance and Information Quality, EDW-Enterprise Data World, Data Modeling Zone, and DAMA-Data Management Association sponsored events). From 2009 to 2010, she served as IAIDQ’s (now Information Quality International) Director of Member Services. In 2015, she received the IAIDQ Distinguished Member Award. DAMA Publications Officer (2015-18) and production editor for the DAMA-DMBOK2 (2017), she is also the author of Navigating the Labyrinth: An Executive Guide to Data Management (2018). In 2018, she received the DAMA award for excellence in the data management profession. She holds the CDMP (Certified Data Management Professional) from DAMA, the IQCP (Information Quality Certified Professional) from IAIDQ, a Certificate in Information Quality from MIT, a B.A. in English and History from Franklin & Marshall College, and a Ph.D. in English Literature from the University of Rochester (NY).

Affiliations and Expertise

Data Quality Lead at Aetna/CVS Health