The concept of data quality emerged back in 1996, when Richard Y. Wang and Diane M. Strong recognized 15 dimensions. Since then, there have been many different interpretations of what defines it, with the Data Management Association (DAMA) identifying as many as 65 dimensions in 2020.
While there is clearly no established standard for these measurements, until recently most interpretations shared the same six core dimensions. Lately this too has changed, and organizations benchmarking the quality of their data now have two additional core dimensions to address.
With that in mind, let’s take a look at the eight core dimensions of data quality.
Accurate data reflects the real world. It’s factual and up to date, and serves as a source of reliable information that you can trust. When data is inaccurate, there are real-world implications.
Accuracy in data is crucial in healthcare, for instance, where the wrong information can lead to an incorrect diagnosis and treatment plan. A misplaced zero in a patient’s dosage, for example, could result in treatment that is either too potent or not potent enough. In finance, inaccurate data could result in a violation of standards. The more accurate the data, the better equipped organizations are to make decisions that will have a positive impact on their customers and business.
Consistent data emerges when all instances are the same across multiple data sets. It’s important because it improves your ability to link data from multiple sources and thus increases the usability of the data.
Inconsistent data is common when there is duplicate data and/or a lack of standardized processes for data entry. For example, multiple instances of a customer’s details might contain an old phone number as well as the new one, resulting in inconsistent data that you can’t rely on. And if you fail to standardize the entry of dates to a single format such as mm/dd/yyyy, you can end up with conflicting data for crucial information like dates of birth or contract end dates.
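As a rough illustration of standardizing date entry, the sketch below converts dates written in several formats to mm/dd/yyyy. The list of accepted input formats and the function name `normalize_date` are assumptions made for this example; in practice you would list the formats your own source systems actually produce, in priority order, since a value like 03/07/1990 is ambiguous on its own.

```python
from datetime import datetime

# Assumed input formats for this example, tried in order of priority.
KNOWN_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d.%m.%Y")


def normalize_date(raw: str) -> str:
    """Return the date as mm/dd/yyyy, or raise ValueError if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {raw!r}")


if __name__ == "__main__":
    for value in ("3/7/1990", "1990-03-07", "07.03.1990"):
        print(value, "->", normalize_date(value))
```

Rejecting unknown formats outright, rather than guessing, keeps a bad entry from silently turning into a wrong date of birth or contract end date.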
Data relevancy means different things for different industries. The data that is relevant to a financial institute will have little or no value to a healthcare provider. Similarly, data that a retailer collects to fuel its marketing efforts will have little import for a law enforcement agency.
It’s important for organizations to establish what data is relevant so that no time gets wasted on processing irrelevant information. Dealing with relevant data helps businesses gain better insights into customer behavior, and enables better decision-making as a result.
Auditing your databases allows you to track how data gets used, as well as any changes made so important information doesn’t get permanently overwritten. And if data gets misused, auditing will enable you to see this too.
Having transparency across your databases will allow you to see which records get accessed, as well as by whom. This will help you identify any risks of data breaches, which will in turn help you improve data compliance within your business.
Data auditing also helps reduce the time it takes to access information. The easier your database is to navigate, the faster you can find relevant data, which in turn improves your service.
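To make the idea of an audit trail concrete, here is a minimal in-memory sketch. The class `AuditedStore` and its field names are hypothetical and exist only for this example; a real database would normally rely on its own audit or change-tracking features. Every read and write is appended to a log with the user, the time, and the old and new values, so an overwritten value can still be recovered.

```python
from datetime import datetime, timezone


class AuditedStore:
    """Hypothetical key-value store that records every read and write."""

    def __init__(self):
        self._data = {}
        self.audit_log = []  # append-only list of audit entries

    def _log(self, user, action, key, old=None, new=None):
        self.audit_log.append({
            "at": datetime.now(timezone.utc),
            "user": user,
            "action": action,
            "key": key,
            "old": old,
            "new": new,
        })

    def read(self, user, key):
        self._log(user, "read", key)
        return self._data.get(key)

    def write(self, user, key, value):
        old = self._data.get(key)  # keep the previous value in the log
        self._data[key] = value
        self._log(user, "write", key, old=old, new=value)


if __name__ == "__main__":
    store = AuditedStore()
    store.write("alice", "phone", "555-0100")
    store.write("bob", "phone", "555-0199")
    store.read("carol", "phone")
    for entry in store.audit_log:
        print(entry["user"], entry["action"], entry["old"], "->", entry["new"])
```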
Data is complete when all the necessary data is available. This doesn’t necessarily mean that all the data fields must get filled out—only that those critical to you do.
In healthcare, this could mean a patient listing their full range of allergies. One missing field could result in an unsuitable, potentially dangerous treatment plan.
Incomplete data is not the same as inaccurate data, as you can have a complete data set and the information still be incorrect. Data completeness should get measured across entire records, and not just at the item level. Normally, it’s assessed in percentages, with each organization needing to establish what number is an acceptable deficit.
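One way to express completeness as a percentage is sketched below, both per field and across whole records. The field names and the rule for what counts as "filled" are assumptions for illustration; each organization would substitute its own critical fields and acceptable threshold.

```python
def is_filled(value) -> bool:
    """Treat None and blank strings as missing (an assumption of this sketch)."""
    return value is not None and str(value).strip() != ""


def completeness(records, critical_fields):
    """Return completeness percentages per critical field and per whole record."""
    total = len(records)
    if total == 0:
        return {"by_field": {f: 0.0 for f in critical_fields}, "by_record": 0.0}
    by_field = {
        f: 100.0 * sum(is_filled(r.get(f)) for r in records) / total
        for f in critical_fields
    }
    # A record counts as complete only if every critical field is filled.
    complete_records = sum(
        all(is_filled(r.get(f)) for f in critical_fields) for r in records
    )
    return {"by_field": by_field, "by_record": 100.0 * complete_records / total}


if __name__ == "__main__":
    patients = [
        {"name": "Ann", "allergies": "penicillin"},
        {"name": "Bob", "allergies": ""},
        {"name": "", "allergies": "none"},
        {"name": "Dee", "allergies": "latex"},
    ]
    print(completeness(patients, ["name", "allergies"]))
```

In this sample each field is 75% complete, yet only 50% of records are complete, which is why measuring at the record level as well as the item level matters.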
Data timeliness is about minimizing latency, so the data is with the right people at the right time. Depending on the industry, timely and late data can have very different implications. Take air traffic control as an example. The safety of the skies relies on a continual flow of real-time data. But not every field requires such frequent updates to ensure the quality of its data.
What’s important is that each organization is using the data that is correct in that specific moment in time. Insights based on old data can result in poor decision-making. The newer the data, the more likely it is to be accurate.
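A simple freshness check can be sketched as follows. The function name `is_stale` and the example age limits are assumptions; the acceptable age depends entirely on the industry and the use case.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated: datetime, max_age: timedelta, now: datetime = None) -> bool:
    """Return True if the record is older than the allowed maximum age."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_age


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    reading = datetime(2024, 1, 1, 11, 59, 30, tzinfo=timezone.utc)
    # A 30-second-old reading is too old for a 5-second limit,
    # but perfectly fresh for a daily report.
    print(is_stale(reading, timedelta(seconds=5), now=now))
    print(is_stale(reading, timedelta(days=1), now=now))
```

Passing `now` explicitly is a design choice that makes the check easy to test; in production you would normally let it default to the current time.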
This dimension measures how well data conforms to business rules such as format, type, and range. One example is the rule that every email address must contain an @ symbol. Another is an employee ID badge whose letters denote security clearance, where an incorrect entry could result in authorized personnel being denied access.
Ensuring data validity means each organization establishes the parameters the data must meet. This will mean it can get used with other data sources, and will contribute to the more efficient running of automated data processes.
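The sketch below shows what format, type, and range rules might look like in code. The specific rules are assumptions for illustration only: a deliberately loose email check, a hypothetical badge format of one clearance letter (A to C) followed by four digits, and an age range of 16 to 100.

```python
import re

# Loose check: exactly one @ with non-space text on both sides.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+")
# Hypothetical badge format: clearance letter A-C followed by four digits.
BADGE_PATTERN = re.compile(r"[A-C]\d{4}")


def validate(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record is valid."""
    errors = []

    email = record.get("email")
    if not isinstance(email, str) or not EMAIL_PATTERN.fullmatch(email):
        errors.append("email: must contain a single @ with text on both sides")

    badge = record.get("badge_id")
    if not isinstance(badge, str) or not BADGE_PATTERN.fullmatch(badge):
        errors.append("badge_id: expected a letter A-C followed by four digits")

    age = record.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 16 <= age <= 100:
        errors.append("age: must be a whole number between 16 and 100")

    return errors


if __name__ == "__main__":
    print(validate({"email": "ann@example.com", "badge_id": "B1234", "age": 30}))
    print(validate({"email": "ann.example.com", "badge_id": "Z1", "age": 150}))
```

Returning every violation, rather than stopping at the first, makes it easier to report all problems with a record in one pass.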
Data duplication causes a whole array of problems. When the same data gets stored in multiple locations, it results in the use of unnecessary storage space. But worse than this is the confusion it can cause.
Imagine there are two sets of records, but only one gets updated when a customer submits their new phone number. The new version can get confused with the old version, and this results in data that is unreliable.
Unique data is therefore necessary to ensure that you are viewing the latest, most relevant data set. Merging duplicate data so that you erase irrelevant versions while tracking changes within your database will enable this.
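Merging duplicates while keeping track of what was replaced could look like the sketch below. It assumes each record carries a shared key (here `customer_id`) and a sortable last-updated value (here `updated_at` as an ISO date string); both field names are hypothetical. The most recent version of each record is kept, and older versions are returned separately rather than discarded.

```python
def merge_duplicates(records, key="customer_id", updated="updated_at"):
    """Keep the newest record per key; return (kept, superseded)."""
    latest = {}
    superseded = []
    for record in records:
        current = latest.get(record[key])
        if current is None:
            latest[record[key]] = record
        elif record[updated] > current[updated]:
            superseded.append(current)  # older version, retained for tracking
            latest[record[key]] = record
        else:
            superseded.append(record)
    return list(latest.values()), superseded


if __name__ == "__main__":
    customers = [
        {"customer_id": 1, "phone": "555-0100", "updated_at": "2023-01-01"},
        {"customer_id": 1, "phone": "555-0199", "updated_at": "2024-06-01"},
        {"customer_id": 2, "phone": "555-0142", "updated_at": "2023-05-05"},
    ]
    kept, old = merge_duplicates(customers)
    print(kept)
    print(old)
```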
“Effective decision making requires business leaders to reframe what is essential, who or what is involved — and rethink how to leverage data and analytics to improve decision making. The result will be a new core competency, driving better business outcomes.” - Gartner
Poor quality data costs organizations an average of $12.9 million each year. But while this figure might seem huge, it doesn’t begin to speak to the impact poor quality data has over time.
Indeed, poor quality data can mean an inaccurate understanding of customers. As such, it can lead to organizations making poor business decisions.
Monitoring the eight dimensions of data quality will help you better measure the quality of your data. That said, data can change with time, so it’s vital to your business that you assess your data quality regularly. The right data tool will help you ensure data quality across your entire organization, giving you faster, more accurate insights for better business decisions as a result.