Enterprise Knowledge Management

The Data Quality Approach


  • David Loshin, President, Knowledge Integrity Incorporated, Silver Spring, MD, USA

Today, companies capture and store tremendous amounts of information about every aspect of their business: their customers, partners, vendors, markets, and more. But with the rise in the quantity of information has come a corresponding decrease in its quality--a problem businesses recognize and are working feverishly to solve. Enterprise Knowledge Management: The Data Quality Approach presents an easily adaptable methodology for defining, measuring, and improving data quality. Author David Loshin begins by presenting an economic framework for understanding the value of data quality, then proceeds to outline data quality rules and domain- and mapping-based approaches to consolidating enterprise knowledge. Written for both a managerial and a technical audience, this book will be indispensable to the growing number of companies committed to wresting every possible advantage from their vast stores of business information.


IT, Database, and Business Managers


Book information

  • Published: January 2001
  • ISBN: 978-0-12-455840-3

Table of Contents

Preface

Chapter 1 - Introduction
  • Data Quality Horror Stories
  • Knowledge Management and Data Quality
  • Reasons for Caring about Data Quality
  • Knowledge Management and Business Rules
  • Structure of this Book

Chapter 2 - Who Owns Information?
  • The Information Factory
  • Complicating Notions
  • Responsibilities of Ownership
  • Ownership Paradigms
  • Centralizing, Decentralization and Data Ownership Policies
  • Ownership and Data Quality
  • Summary

Chapter 3 - Data Quality in Practice
  • Data Quality Defined: Fitness for Use
  • The Quality Improvement Program
  • Data Quality and Operations
  • Data Quality and Databases
  • Data Quality and the Data Warehouse
  • Data Mining
  • Data Quality and Electronic Data Interchange
  • Data Quality and the World Wide Web
  • Summary

Chapter 4 - Economic Framework of Data Quality and the Value Proposition
  • Evidence of Economic Impact
  • Data Flows and Information Chains
  • Examples of Information Chains
  • Impacts
  • Economic Measures
  • Impact Domains
  • Operational Impacts
  • Tactical and Strategic Impacts
  • Putting It All Together - the Data Quality Scorecard
  • Adjusting the Model for Solution Costs
  • Example
  • Summary

Chapter 5 - Dimensions of Data Quality
  • Sample Data Application
  • Data Quality of Data Models
  • Data Quality of Data Values
  • Data Quality of Data Domains
  • Data Quality of Data Presentation
  • Data Quality of Information Policy
  • Summary: Importance of the Dimensions of Data Quality

Chapter 6 - Statistical Process Control and the Improvement Cycle
  • Variation and Control
  • Control Chart
  • The Pareto Principle
  • Building a Control Chart
  • Kinds of Control Charts
  • Example: Invalid Records
  • The Goal of Statistical Process Control
  • Interpreting a Control Chart
  • Finding Special Causes
  • Maintaining Control
  • Summary

Chapter 7 - Domains, Mappings, and Enterprise Reference Data
  • Data Types
  • Operations
  • Domains
  • Mappings
  • Example: Social Security Numbers
  • Domains, Mappings, and Metadata
  • The Publish/Subscribe Model of Reference Data Provision
  • Summary

Chapter 8 - Data Quality Assertions and Business Rules
  • Data Quality Assertions as Business Rules
  • The 9 Classes of Data Quality Rules
  • "Null Value" Rules
  • Value Manipulation Operators and Functions
  • Value Rules
  • Domain Membership Rules
  • Domain Mappings and Relations on Finite Defined Domains
  • Relation Rules
  • Table, Cross-Table, and Cross-Message Assertions
  • In-Process Rules
  • Operational Rules
  • Other Rules
  • Rule Management, Compilation, and Validation
  • Rule Ordering
  • Summary

Chapter 9 - Measurement and Current State Assessment
  • Identify Each Data Customer
  • Mapping the Information Chain
  • Choose Locations in the Information Chain
  • Choose a Subset of the DQ Dimensions
  • Identify Sentinel Rules
  • Measuring Data Quality
  • Measuring Data Quality of Data Models
  • Measuring Data Quality of Data Values
  • Measuring Data Quality of Data Domains
  • Measuring Data Quality of Data Presentation
  • Measuring Data Quality of Information Policy
  • Static vs. Dynamic Measurement
  • Compiling Results
  • Summary

Chapter 10 - Data Quality Requirements
  • The Assessment Process, Reviewed
  • Reviewing the Assessment
  • Determining Expectations
  • Use Case Analysis
  • Assignments of Responsibility
  • Creating Requirements
  • The Data Quality Requirements
  • Summary

Chapter 11 - Metadata, Guidelines, and Policy
  • Generic Elements
  • Data Types and Domains
  • Schema Metadata
  • Use and Summarization
  • Historical
  • Managing Data Domains
  • Managing Domain Mappings
  • Managing Rules
  • Metadata Browsing
  • Metadata as a Driver of Policy
  • Summary

Chapter 12 - Rule-Based Data Quality
  • Rule Basics
  • What is a Business Rule?
  • Data Quality Rules are Business Rules (and Vice-Versa)
  • Advantages of the Rule-Based Approach
  • Integrating a Rule-Based System
  • Rule Execution
  • Deduction vs. Goal-Orientation
  • Evaluation of a Rules System
  • Limitations of the Rule-based Approach
  • Rule Based Data Quality
  • Summary

Chapter 13 - Metadata and Rule Discovery
  • Domain Discovery
  • Mapping Discovery
  • Clustering for Rule Discovery
  • Key Discovery
  • Decision and Classification Trees
  • Association Rules and Data Quality Rules
  • Summary

Chapter 14 - Data Cleansing
  • Standardization
  • Common Error Paradigms
  • Record Parsing
  • Metadata Cleansing
  • Data Correction and Enhancement
  • Approximate Matching and Similarity
  • Consolidation
  • Updating Missing Fields
  • Address Standardization
  • Summary

Chapter 15 - Root Cause Analysis and Supplier Management
  • What is Root Cause Analysis?
  • Debugging the Process
  • Debugging the Problem
  • Corrective Measures - Resolve or Not?
  • Supplier Management
  • Summary

Chapter 16 - Data Enrichment/Enhancement
  • What is Data Enrichment?
  • Examples of Data Enhancement
  • Enhancement through Standardization
  • Enhancement through Provenance
  • Enhancement through Context
  • Enhancement through Data Mining
  • Data Matching, Merging, and Record Linkage
  • Large Scale Data Aggregation and Linkage
  • Improving Linkage with Approximate Matching
  • Enhancement through Inference
  • Data Quality Rules for Enhancement
  • Business Rules for Enhancement
  • Summary

Chapter 17 - Data Quality and Business Rules in Practice
  • Turning Rules into Implementation
  • Operational Directives
  • Data Quality and the Transaction Factory
  • Data Quality and the Data Warehouse
  • Rules and EDI
  • Data Quality Rules and Automated UIs
  • Summary

Chapter 18 - Building the Data Quality Practice
  • Recognize the Problem
  • Management Support and the Data Ownership Policy
  • Spread the Word
  • Mapping the Information Chain
  • Data Quality Scorecard
  • Current State Assessment
  • Requirements Assessment
  • Choose a Project
  • Build Your Team
  • Build Your Arsenal
  • Metadata Model
  • Define Data Quality Rules
  • Archaeology/Data Mining
  • Manage Your Suppliers
  • Execute the Improvement
  • Measure Improvement
  • Build on Each Success
  • Conclusion