What is a data vault?

Written by CloverDX | April 19, 2022

A data vault is a database modeling methodology and architecture, created by Dan Linstedt in the 1990s. In his own words:

‘The data vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.’

Linstedt created the model in an attempt to ease issues surrounding scalability, flexibility, and data mining in data warehouses.

It’s considered a hybrid of the ‘third normal form’ and ‘dimensional modeling’ (or star schema) techniques, and it stores all of your data as loaded, rather than only a cleansed subset. It’s often the model of choice when source systems are subject to change over time.

But this is only skimming the surface of data vaults. Let’s explore how this technique works and how it can benefit your business.

How does it work?

The data vault technique organizes data around ‘business keys’, which identify each core business function or entity. These keys form the overarching structure of the data vault, and the data warehouse takes shape around them. Because the keys rarely change, they offer some stability against organizational transformations.

In a typical data vault structure, the framework consists of three key aspects:

Hubs. Each hub contains a list of stable business keys for one entity. For instance, employees or license types.
Links. These represent relationships and transactions between hubs. As an example, you might create a link between EMPLOYEE_HUB and SOFTWARE_LICENSE_HUB when an employee has a license to a certain platform.
Satellites. Finally, satellites contain temporal and descriptive data about hubs and links. This extra information aids businesses with historical tracking and auditing. For instance, a satellite might contain contact details for a customer.

Every dataset within a data vault structure is essentially ‘time stamped’. This means you can determine load date and origin information.
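
To make the three table types more concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table and column names (employee_hub, employee_license_link, employee_contact_sat, load_date, record_source and so on) are assumptions chosen for illustration, not a prescribed standard. Note how every table carries a load date and a record source.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE employee_hub (
    employee_key  TEXT PRIMARY KEY,   -- business key, e.g. an employee number
    load_date     TEXT NOT NULL,      -- when the key was first loaded
    record_source TEXT NOT NULL       -- which system it came from
);
CREATE TABLE software_license_hub (
    license_key   TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE employee_license_link (
    employee_key  TEXT NOT NULL REFERENCES employee_hub (employee_key),
    license_key   TEXT NOT NULL REFERENCES software_license_hub (license_key),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (employee_key, license_key)
);
CREATE TABLE employee_contact_sat (
    employee_key  TEXT NOT NULL REFERENCES employee_hub (employee_key),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    email         TEXT,
    phone         TEXT,
    PRIMARY KEY (employee_key, load_date)
);
"""


def create_vault() -> sqlite3.Connection:
    """Create the example tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = create_vault()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The satellite's primary key combines the business key with the load date, so each change to an employee's contact details becomes a new row rather than an update.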

4 data vault benefits

‘The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of enterprise data warehouses.’ - Dan Linstedt

The data vault methodology addresses complexities around data warehousing and is resilient to change.

The top four benefits of the model include:

1. Scalability

As the volume of data increases, data warehouses can become increasingly inflexible. It then becomes difficult to add functionalities to your current datasets without breaking (and subsequently remodeling) anything. This presents a tricky blocker to businesses that anticipate change over time.

The data vault technique allows you to build upon your structure over time without putting your historical data at risk. You can continually add or delete links and satellites as you go.
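
As a rough illustration of building on the structure, the sketch below adds a new satellite to a hub that already holds data. The names (employee_hub, employee_job_sat, job_title) are hypothetical, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing structure (hypothetical names): a hub that already holds data.
conn.execute("""CREATE TABLE employee_hub (
    employee_key  TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL)""")
conn.execute(
    "INSERT INTO employee_hub VALUES ('E-001', '2022-01-10', 'hr_system')")

hub_before = conn.execute("SELECT * FROM employee_hub").fetchall()

# A new requirement arrives: track job titles. This becomes a new satellite
# table; the hub is not altered and rows already loaded are not rewritten.
conn.execute("""CREATE TABLE employee_job_sat (
    employee_key  TEXT NOT NULL REFERENCES employee_hub (employee_key),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    job_title     TEXT,
    PRIMARY KEY (employee_key, load_date))""")
conn.execute(
    "INSERT INTO employee_job_sat "
    "VALUES ('E-001', '2022-04-19', 'hr_system', 'Data Engineer')")

hub_after = conn.execute("SELECT * FROM employee_hub").fetchall()
print(hub_before == hub_after)
```

Because the new attributes live in their own table, anything that already reads from the hub keeps working unchanged.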

2. Auditability and historical tracking

As a data vault hinges on historical metadata, you can also benefit from easy auditability. You load the data into the structure without any cleansing. This means you can trace the source and lineage of each data set, including when it was loaded and from where.
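
A minimal sketch of what this looks like in practice, assuming a hypothetical customer_contact_sat table and made-up source names (crm, web_signup): new versions are only ever appended, so each row keeps its own load date and origin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer_contact_sat (
    customer_key  TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    email         TEXT,
    PRIMARY KEY (customer_key, load_date))""")


def load_contact(customer_key, load_date, record_source, email):
    """Append a new version; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO customer_contact_sat VALUES (?, ?, ?, ?)",
        (customer_key, load_date, record_source, email))


def history(customer_key):
    """Return every loaded version with its load date and source system."""
    return conn.execute(
        "SELECT load_date, record_source, email FROM customer_contact_sat "
        "WHERE customer_key = ? ORDER BY load_date",
        (customer_key,)).fetchall()


# Sample data is invented for the example and loaded as-is, without cleansing.
load_contact("C-42", "2022-01-05", "crm", "ann@old.example")
load_contact("C-42", "2022-03-17", "web_signup", "ann@new.example")

for row in history("C-42"):
    print(row)
```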

This is a crucial benefit for heavily regulated businesses, such as those within the financial sector.

3. Fast loading

Because you don’t need to cleanse data before placing it in your data vault, you can load large volumes of data at speed.

What’s more, you can load this data in parallel, as there are very few dependencies between your hubs, links, and satellites during the loading process.
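
One common way to keep those dependencies low, used notably in Data Vault 2.0 and not described in this article, is to derive table keys by hashing the business key. The sketch below assumes that approach; the hash_key helper, the record layout and the choice of SHA-256 are all illustrative assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys."""
    joined = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical staged records from a source system.
staged = [
    {"employee": "E-001", "license": "L-9", "email": "ann@example.com"},
    {"employee": "E-002", "license": "L-9", "email": "bob@example.com"},
]


def build_hub_rows(rows):
    return [(hash_key(r["employee"]), r["employee"]) for r in rows]


def build_link_rows(rows):
    return [(hash_key(r["employee"], r["license"]),
             hash_key(r["employee"]),
             hash_key(r["license"])) for r in rows]


def build_sat_rows(rows):
    return [(hash_key(r["employee"]), r["email"]) for r in rows]


# Each builder needs only the staged data, not the output of another builder,
# so the three can run at the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(build, staged)
               for build in (build_hub_rows, build_link_rows, build_sat_rows)]
    hub_rows, link_rows, sat_rows = (f.result() for f in futures)

# Link and satellite rows reference the same hub keys without any lookup.
print(link_rows[0][1] == hub_rows[0][0], sat_rows[1][0] == hub_rows[1][0])
```

Since the key is computed from the source data itself, a link or satellite loader does not have to wait for the hub load to finish in order to find out which key to reference.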

4. Enterprise-wide oversight

It’s straightforward to get a bird’s eye view of your organization’s data, particularly as it’s organized by business functions.

This makes it simple for various departments to dip into the data vault and access any relevant information.

When to use the data vault model

More often than not, you’ll want to use a data vault when you need to audit your data. As each row in every table requires load metadata, you can easily access current and historical data. This allows you to trace data origins, loading dates, and changes.
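
For example, the load date lets you ask what a record looked like on a given day. The sketch below assumes a hypothetical customer_contact_sat table with ISO-formatted date strings; the contact_as_of helper is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_contact_sat ("
    "customer_key TEXT, load_date TEXT, record_source TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customer_contact_sat VALUES (?, ?, ?, ?)",
    [("C-42", "2022-01-05", "crm", "ann@old.example"),
     ("C-42", "2022-03-17", "web_signup", "ann@new.example")])


def contact_as_of(customer_key, as_of_date):
    """Return the latest version loaded on or before as_of_date, or None."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return conn.execute(
        "SELECT load_date, record_source, email FROM customer_contact_sat "
        "WHERE customer_key = ? AND load_date <= ? "
        "ORDER BY load_date DESC LIMIT 1",
        (customer_key, as_of_date)).fetchone()


print(contact_as_of("C-42", "2022-02-01"))  # historical view
print(contact_as_of("C-42", "2022-04-19"))  # current view
```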

But the model’s use cases go beyond this. Data vaults can also be useful when:

You need to get the hard facts. Data experts often describe vaults as ‘one single source of the facts’.
You have multiple source systems. As the data vault structure is very flexible, it works particularly well for organizations with multiple data sources or changing data sources and relationships. You can add hubs, satellites, and links as you go. Similarly, you can delete any links if relationships are no longer relevant.
You want to load data quickly. As we already stated, data vault models enable fast and simultaneous data loading at scale. This is particularly beneficial if your business handles a large amount of data, such as transactional data.

On the other hand, a data vault may not be the right choice if you:

Only have one source system or unchanging data relationships. This is because making changes should be relatively easy and wouldn’t warrant a vault structure.
Need to load data directly into a data reporting tool. This would require an added step of data manipulation in order to join up your various hub, satellite, and link tables. As well as taking time, this process is prone to errors.
Are on a tight budget. Of the data warehousing methodologies available, a data vault is generally among the most expensive to build.
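
To give a sense of the extra joining step that reporting requires, here is a rough sketch that flattens a hub, a link and a satellite into one report row per employee. The table names and sample data are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_hub (employee_key TEXT PRIMARY KEY);
CREATE TABLE software_license_hub (license_key TEXT PRIMARY KEY);
CREATE TABLE employee_license_link (employee_key TEXT, license_key TEXT);
CREATE TABLE employee_contact_sat (
    employee_key TEXT, load_date TEXT, email TEXT);

INSERT INTO employee_hub VALUES ('E-001');
INSERT INTO software_license_hub VALUES ('L-9');
INSERT INTO employee_license_link VALUES ('E-001', 'L-9');
INSERT INTO employee_contact_sat
    VALUES ('E-001', '2022-01-05', 'ann@old.example');
INSERT INTO employee_contact_sat
    VALUES ('E-001', '2022-03-17', 'ann@new.example');
""")

# Three joins plus a subquery that picks the most recent satellite row,
# just to produce a single flat row a reporting tool can consume.
REPORT_QUERY = """
SELECT h.employee_key, s.email, lh.license_key
FROM employee_hub AS h
JOIN employee_license_link AS l ON l.employee_key = h.employee_key
JOIN software_license_hub AS lh ON lh.license_key = l.license_key
JOIN employee_contact_sat AS s ON s.employee_key = h.employee_key
WHERE s.load_date = (
    SELECT MAX(load_date)
    FROM employee_contact_sat
    WHERE employee_key = h.employee_key
)
"""

report = conn.execute(REPORT_QUERY).fetchall()
print(report)
```

Even in this tiny case the query has to know which satellite row is current, which is the kind of logic that is easy to get wrong when written by hand for many tables.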

Is the data vault right for you?

For organizations that handle large amounts of data and regularly experience change, the data vault is a compelling framework.

Its flexibility, scalability and speed of loading allow companies to build an incremental, evolving, and auditable data structure as they grow. As you organize the vault by business entities, it’s simple to dive in and use the data for analysis. That said, this methodology might not be for everyone, particularly if your purse strings are tight.

If you’d like to learn more about your data architecture options, we recommend reading our Guide to Enterprise Data Architecture.
