Building a Scalable Data Warehouse with Data Vault 2.0

Building a Scalable Data Warehouse with Data Vault 2.0

1st Edition - September 15, 2015

Write a review

  • Authors: Dan Linstedt, Michael Olschimke
  • Paperback ISBN: 9780128025109
  • eBook ISBN: 9780128026489

Purchase options

Purchase options
DRM-free (EPub, Mobi, PDF)
Sales tax will be calculated at check-out

Institutional Subscription

Free Global Shipping
No minimum order


The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large-size corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to create a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy to understand framework, Dan Linstedt and Michael Olschimke discuss: How to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes. Important data warehouse technologies and practices. Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture.

Key Features

  • Provides a complete introduction to data warehousing, applications, and the business context so readers can get-up and running fast
  • Explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse
  • Demystifies data vault modeling with beginning, intermediate, and advanced techniques
  • Discusses the advantages of the data vault approach over other techniques, also including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0


Data Analysts, Business Intelligence and Data Warehousing Professionals, and Business Analysts

Table of Contents

    • Authors Biography
    • Foreword
    • Preface
    • Acknowledgments
    • Chapter 1: Introduction to Data Warehousing
      • Abstract
      • 1.1. History of Data Warehousing
      • 1.2. The Enterprise Data Warehouse Environment
      • 1.3. Introduction to Data Vault 2.0
      • 1.4. Data Warehouse Architecture
    • Chapter 2: Scalable Data Warehouse Architecture
      • Abstract
      • 2.1. Dimensions of Scalable Data Warehouse Architectures
      • 2.2. Data Vault 2.0 Architecture
    • Chapter 3: The Data Vault 2.0 Methodology
      • Abstract
      • 3.1. Project Planning
      • 3.2. Project Execution
      • 3.3. Review and Improvement
    • Chapter 4: Data Vault 2.0 Modeling
      • Abstract
      • 4.1. Introduction to Data Vault Modeling
      • 4.2. Data Vault Modeling Vocabulary
      • 4.3. Hub Definition
      • 4.4. Link Definition
      • 4.5. Satellite Definition
    • Chapter 5: Intermediate Data Vault Modeling
      • Abstract
      • 5.1. Hub Applications
      • 5.2. Link Applications
      • 5.3. Satellite Applications
    • Chapter 6: Advanced Data Vault Modeling
      • Abstract
      • 6.1. Point-in-Time Tables
      • 6.2. Bridge Tables
      • 6.3. Reference Tables
    • Chapter 7: Dimensional Modeling
      • Abstract
      • 7.1. Introduction
      • 7.2. Star Schemas
      • 7.3. Multiple Stars
      • 7.4. Dimension Design
    • Chapter 8: Physical Data Warehouse Design
      • Abstract
      • 8.1. Database Workloads
      • 8.2. Separate Environments for Development, Testing, and Production
      • 8.3. Microsoft Azure Cloud Computing Platform
      • 8.4. Physical Data Warehouse Architecture on Premise
      • 8.5. Database Options
      • 8.6. Setting up the Data Warehouse
    • Chapter 9: Master Data Management
      • Abstract
      • 9.1. Definitions
      • 9.2. Master Data Management Goals
      • 9.3. Drivers for Managing Master Data
      • 9.4. Operational vs. Analytical Master Data Management
      • 9.5. Master Data Management as an Enabler for Managed Self-Service BI
      • 9.6. Master Data Management as an Enabler for Total Quality Management
      • 9.7. Creating a Model
      • 9.8. Importing a Model
      • 9.9. Integrating MDS with the Data Vault and Operational Systems
    • Chapter 10: Metadata Management
      • Abstract
      • 10.1. What is Metadata?
      • 10.2. Implementing the Meta Mart
      • 10.3. Implementing the Metrics Vault
      • 10.4. Implementing the Metrics Mart
      • 10.5. Implementing the Error Mart
    • Chapter 11: Data Extraction
      • Abstract
      • 11.1. Purpose of Staging Area
      • 11.2. Hashing in the Data Warehouse
      • 11.3. Purpose of the Load Date
      • 11.4. Purpose of the Record Source
      • 11.5. Types of Data Sources
      • 11.6. Sourcing Flat Files
      • 11.7. Sourcing Historical Data
      • 11.8. Sourcing the Sample Airline Data
      • 11.9. Sourcing Denormalized Data Sources
      • 11.10. Sourcing Master Data from MDS
    • Chapter 12: Loading the Data Vault
      • Abstract
      • 12.1. Loading Raw Data Vault Entities
      • 12.2. Loading Reference Tables
      • 12.3. Truncating the Staging Area
    • Chapter 13: Implementing Data Quality
      • Abstract
      • 13.1. Business Expectations Regarding Data Quality
      • 13.2. The Costs of Low Data Quality
      • 13.3. The Value of Bad Data
      • 13.4. Data Quality in the Architecture
      • 13.5. Correcting Errors in the Data Warehouse
      • 13.6. Transform, Enhance and Calculate Derived Data
      • 13.7. Standardization of Data
      • 13.8. Correct and Complete Data
      • 13.9. Match and Consolidate Data
      • 13.10. Creating Dimensions from Same-As Links
    • Chapter 14: Loading the Dimensional Information Mart
      • Abstract
      • 14.1. Using the Business Vault as an Intermediate to the Information Mart
      • 14.2. Materializing the Information Mart
      • 14.3. Leveraging PIT and Bridge Tables for Virtualization
      • 14.4. Implementing Temporal Dimensions
      • 14.5. Implementing Data Quality Using PIT Tables
      • 14.6. Dealing with Reference Data
      • 14.7. About Hash Keys in the Information Mart
    • Chapter 15: Multidimensional Database
      • Abstract
      • 15.1. Accessing the Information Mart
      • 15.2. Creating Dimensions
      • 15.3. Creating Cubes
      • 15.4. Accessing the Cube
    • Subject Index

Product details

  • No. of pages: 688
  • Language: English
  • Copyright: © Morgan Kaufmann 2015
  • Published: September 15, 2015
  • Imprint: Morgan Kaufmann
  • Paperback ISBN: 9780128025109
  • eBook ISBN: 9780128026489

About the Authors

Dan Linstedt

Dan Linstedt has more than 25 years of experience in the Data Warehousing and Business Intelligence field and is internationally known for inventing the Data Vault 1.0 model and the Data Vault 2.0 System of Business Intelligence. He helps business and government organizations around the world to achieve BI excellence by applying his proven knowledge in Big Data, unstructured information management, agile methodologies and product development. He has held training classes and presented at TDWI, Teradata Partners, DAMA, Informatica, Oracle user groups and Data Modeling Zone conference. He has a background in SEI/CMMI Level 5, and has contributed architecture efforts to petabyte scale data warehouses and offers high quality on-line training and consulting services for Data Vault.

Affiliations and Expertise

Founder and Principal of Empowered Holdings, LLC, St. Albans, VT, USA

Michael Olschimke

Michael has more than 15 years of experience in IT and has been working on business intelligence topics for the past eight years. He has consulted for a number of clients in the automotive industry, insurance industry and non-profits. In addition, he has consulted for government organizations in Germany on business intelligence topics. Michael is responsible for the Data Vault training program at Dörffler + Partner GmbH, a German consulting firm specialized in data warehousing and business intelligence. He is also a lecturer at the University of Applied Sciences and Arts in Hannover, Germany. In addition, he maintains, a community site on Data Vault topics.

Affiliations and Expertise

BI Consultant, Dörffler + Partner GmbH, Hannover, Germany

Ratings and Reviews

Write a review

Latest reviews

(Total rating for all reviews)

  • JerryManni Sat Feb 29 2020

    Very interesting concepts explained in-depth.

    Very interesting concepts explained in-depth.

  • AntonTrapp Thu May 30 2019

    The good and the bad things...

    Good: About 50% of the book are really a good introduction to the DWH 2.0 concept. 100% worth reading. Bad: 50% are screenshots from M$ SqlServer server screens, pages full of comma separated attributes that are completely irrelevant. So: If you should be working with M$ SqlServer, but never seen how the UI looks like and love aircrafts and unnecessary text -> buy it. If you are a M$ SqlServer guy and want to learn something there is about 1/4 junk text (select a, b, c, ... zzz224 - but WITHOUT the ... part). Consider this when buying. If you don't care about M$ Sql Sever and junk text consider that ~ 50% of the book is just there to make the book look bigger and to slow you down during learning. Consider that when buying. However, the partly hidden information (between screenshots and pages of attributes) is good. Just check the price/value. Ideally the author would split the book into 2 parts (this book + a "apply the concept with M$ Sql server for dummies) and remove the ~ 20% text that is just a list of unneeded columns). Then I would rate the 1st part with 5 stars and won't care about the second.

  • ChristianDesilets Tue Dec 11 2018

    Building a Scalable Data Warehouse with Data Vault 2.0

    Super !