A data vault is a database modeling methodology and architecture, created by Dan Linstedt in the 1990s. In his own words:
‘The data vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.’
Linstedt created the model in an attempt to ease issues surrounding scalability, flexibility, and data mining in data warehouses.
It’s considered a hybrid of the ‘third normal form’ and ‘dimensional modeling’ (or star schema) techniques, and it stores all of your data as it was loaded, including its history. It’s often the model of choice when source systems are subject to change over time.
But this is only skimming the surface of data vaults. Let’s explore how this technique works and how it can benefit your business.
The data vault technique identifies each business function or entity by a ‘business key’. These keys form the overarching structure of the data vault, and the data warehouse takes shape around them. Because business keys rarely change, the structure offers some stability against organizational transformations.
In a typical data vault structure, the framework consists of three key aspects: hubs, which hold the business keys; links, which record the relationships between hubs; and satellites, which hold the descriptive attributes of a hub or link and how they change over time.
Every row within a data vault structure is essentially ‘time stamped’ and tagged with its source. This means you can determine when each record was loaded and where it came from.
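To make the structure more concrete, here is a minimal sketch of hubs, a link, and a satellite using Python's built-in `sqlite3` module. The table and column names (`hub_customer`, `hub_order`, `link_customer_order`, `sat_customer_details`) are hypothetical and chosen only for illustration. A production data vault would likely use different keys and a different database.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
SCHEMA = """
CREATE TABLE hub_customer (
    customer_id   TEXT PRIMARY KEY,   -- business key
    load_date     TEXT NOT NULL,      -- when the key was first loaded
    record_source TEXT NOT NULL       -- where the key came from
);
CREATE TABLE hub_order (
    order_id      TEXT PRIMARY KEY,   -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_order (    -- relationship between two hubs
    customer_id   TEXT NOT NULL REFERENCES hub_customer(customer_id),
    order_id      TEXT NOT NULL REFERENCES hub_order(order_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_id, order_id)
);
CREATE TABLE sat_customer_details (   -- descriptive attributes over time
    customer_id   TEXT NOT NULL REFERENCES hub_customer(customer_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    email         TEXT,
    PRIMARY KEY (customer_id, load_date)
);
"""


def create_vault(conn):
    """Create the example hub, link, and satellite tables."""
    conn.executescript(SCHEMA)


def table_columns(conn, table):
    """Return the column names of a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
create_vault(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
for table in tables:
    print(table, table_columns(conn, table))
```

Note that every table, whatever its role, carries the same `load_date` and `record_source` columns. That shared load metadata is what makes each row traceable.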
‘The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of enterprise data warehouses.’ - Dan Linstedt
The data vault methodology addresses many of the complexities around data warehousing and is resilient to change in source systems and business requirements.
The top four benefits of the model include:
As the volume of data increases, data warehouses can become increasingly inflexible. It then becomes difficult to add functionalities to your current datasets without breaking (and subsequently remodeling) anything. This presents a tricky blocker to businesses that anticipate change over time.
The data vault technique allows you to build upon your structure over time without putting your historical data at risk. You can continually add or delete links and satellites as you go.
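As a rough illustration of building on the structure over time, the sketch below adds a new satellite to an existing hub without altering the tables that already hold history. It uses Python's `sqlite3` module, and all names (`hub_customer`, `sat_customer_details`, `sat_customer_loyalty`, the `loyalty_app` source) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing structure: one hub and one satellite, already holding history.
conn.executescript("""
CREATE TABLE hub_customer (
    customer_id   TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_id   TEXT NOT NULL REFERENCES hub_customer(customer_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    PRIMARY KEY (customer_id, load_date)
);
INSERT INTO hub_customer VALUES ('C1', '2024-01-01', 'crm');
INSERT INTO sat_customer_details VALUES ('C1', '2024-01-01', 'crm', 'Ada');
""")

history_before = conn.execute("SELECT * FROM sat_customer_details").fetchall()

# New requirement: track loyalty tiers from a new (hypothetical) source.
# The change is purely additive: no ALTER TABLE on existing tables.
conn.executescript("""
CREATE TABLE sat_customer_loyalty (
    customer_id   TEXT NOT NULL REFERENCES hub_customer(customer_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    tier          TEXT,
    PRIMARY KEY (customer_id, load_date)
);
INSERT INTO sat_customer_loyalty VALUES ('C1', '2024-06-01', 'loyalty_app', 'gold');
""")

history_after = conn.execute("SELECT * FROM sat_customer_details").fetchall()
loyalty_rows = conn.execute("SELECT * FROM sat_customer_loyalty").fetchall()
```

Because the new attributes live in their own satellite, the existing rows in `sat_customer_details` are untouched by the change.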
Because every record in a data vault carries load metadata, the model is straightforward to audit. You load the data into the structure without any cleansing, so you can trace the source and lineage of each dataset, including when it was loaded and from where.
This is a crucial benefit for heavily regulated businesses, such as those within the financial sector.
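The audit trail described above can be sketched with a single satellite table. This example assumes a hypothetical `sat_customer_details` table and made-up source names (`crm`, `web_form`). Values are stored exactly as they arrived, and each version keeps its load date and origin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sat_customer_details (
    customer_id   TEXT NOT NULL,
    load_date     TEXT NOT NULL,   -- ISO dates sort correctly as text
    record_source TEXT NOT NULL,
    email         TEXT,
    PRIMARY KEY (customer_id, load_date)
)
""")

# Rows are inserted as received; the second value is deliberately uncleansed.
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?)",
    [
        ("C1", "2024-01-01", "crm", "ada@example.com"),
        ("C1", "2024-03-15", "web_form", " ADA@Example.com "),
        ("C2", "2024-02-01", "crm", "grace@example.com"),
    ],
)


def history(conn, customer_id):
    """Every version of a customer's details, oldest first."""
    return conn.execute(
        "SELECT load_date, record_source, email FROM sat_customer_details "
        "WHERE customer_id = ? ORDER BY load_date",
        (customer_id,),
    ).fetchall()


def current(conn, customer_id):
    """The most recently loaded version, or None if the key is unknown."""
    return conn.execute(
        "SELECT load_date, record_source, email FROM sat_customer_details "
        "WHERE customer_id = ? ORDER BY load_date DESC LIMIT 1",
        (customer_id,),
    ).fetchone()


for load_date, source, email in history(conn, "C1"):
    print(load_date, source, repr(email))
```

Any cleansing, such as trimming and lower-casing the email, would happen downstream of the vault, so the raw value and its origin remain available to an auditor.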
Because you don’t need to cleanse data before placing it in your data vault, you can load large volumes of data at speed.
What’s more, you can load this data in parallel, as there are very few dependencies between your hubs, links, and satellites during the loading process.
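One way to see why parallel loading is possible is to derive every key directly from the business key, so no loader has to wait for another to hand out identifiers. The sketch below assumes hash-based keys, a convention this article does not itself describe, and uses hypothetical names throughout (`staged`, `hash_key`, `build_hub_rows`, and so on). It only prepares rows in memory and does not write to a database.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

LOAD_DATE = "2024-01-01T00:00:00"  # hypothetical batch timestamp
RECORD_SOURCE = "crm_export"       # hypothetical source system name

# Raw staged records, loaded without cleansing.
staged = [
    {"customer_id": "C1", "order_id": "O1", "name": "Ada"},
    {"customer_id": "C2", "order_id": "O2", "name": "Grace"},
]


def hash_key(*business_keys):
    """Derive a deterministic key from one or more business keys."""
    joined = "|".join(business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_hub_rows(rows):
    """Customer hub rows: hash key, business key, load metadata."""
    return [
        (hash_key(r["customer_id"]), r["customer_id"], LOAD_DATE, RECORD_SOURCE)
        for r in rows
    ]


def build_link_rows(rows):
    """Customer-order link rows, keyed without reading the hub table."""
    return [
        (
            hash_key(r["customer_id"], r["order_id"]),
            hash_key(r["customer_id"]),
            hash_key(r["order_id"]),
            LOAD_DATE,
            RECORD_SOURCE,
        )
        for r in rows
    ]


def build_sat_rows(rows):
    """Customer satellite rows: parent hash key, metadata, attributes."""
    return [
        (hash_key(r["customer_id"]), LOAD_DATE, RECORD_SOURCE, r["name"])
        for r in rows
    ]


# Each builder reads only the staged data, so all three can run at once.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(build, staged)
        for build in (build_hub_rows, build_link_rows, build_sat_rows)
    ]
    hub_rows, link_rows, sat_rows = [f.result() for f in futures]
```

Since the link and satellite builders compute the same hash the hub builder does, their rows line up with the hub rows even though no builder waited on another.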
It’s straightforward to get a bird’s eye view of your organization’s data, particularly as it’s organized by business functions.
This makes it simple for various departments to dip into the data vault and access any relevant information.
More often than not, you’ll want to use a data vault when you need to audit your data. As each row in every table requires load metadata, you can easily access current and historical data. This allows you to trace data origins, loading dates, and changes.
But the model’s use cases go beyond this. Data vaults can also be useful when:
On the other hand, a data vault may not be the right choice if you:
For organizations that handle large amounts of data and regularly experience change, the data vault is a compelling framework.
Its flexibility, scalability, and speed of loading allow companies to build an incremental, evolving, and auditable data structure as they grow. As you organize the vault by business entities, it’s simple to dive in and use the data for analysis. That said, this methodology might not be for everyone, particularly if your purse strings are tight.
If you’d like to learn more about your data architecture options, we recommend reading our Guide to Enterprise Data Architecture.