(This is part of our ongoing Series of Unfortunate Data Warehousing and Business Intelligence Events. Click for the complete series, so far.)
If data warehouses could talk, they might say “Don’t shoot me, I am just the messenger!”
People sometimes think the data warehouse (DW) has caused a problem with data quality or inconsistency, when in fact, the problem started in the enterprise’s source systems. But because the DW can be the only place where business people can view enterprise data, it becomes the messenger delivering the bad news. It’s an easy target to blame.
Although the DW program can’t be blamed for creating the data quality problems or inconsistency, you can blame it for not identifying and addressing the problems. Too many people in DW programs believe the following myths:
- The DW does not create or alter data, but passes on what is in the systems-of-record (SORs). A corollary is that any data quality or inconsistency problems need to be resolved by the SORs.
- The data is fine and does not have any significant quality issues. (The SOR owners, IT or business people may state this.)
Don’t fall into these traps. Don’t assume anything about the state of the data. These are the areas where data quality and inconsistency problems lurk:
- Data quality within SOR applications may be “masked” by corrections made within reports or spreadsheets created from this data. The people who told you the data is fine might not even be aware of these “adjustments.”
- Data does not age well. Although data quality may be fine now, the historical data may still contain problems or inconsistencies. These problems often surface when applications like predictive analytics need to use historical data.
- Data quality may be fine within each SOR application, but may be very inconsistent across applications. Many companies have master data inconsistency problems with product, customer and other dimensions that will not be apparent until the data is loaded into the enterprise DW.
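To illustrate the last point, here is a minimal sketch of a cross-application consistency check. It assumes two hypothetical customer extracts (`crm_customers` and `billing_customers`) that share a `customer_id` key; the names, fields and sample records are invented for illustration and do not come from any particular system.

```python
# Hypothetical customer extracts from two SORs, keyed by a shared customer_id.
crm_customers = {
    "C001": {"name": "Acme Corp", "country": "US"},
    "C002": {"name": "Globex", "country": "DE"},
    "C003": {"name": "Initech", "country": "US"},
}
billing_customers = {
    "C001": {"name": "ACME Corporation", "country": "US"},
    "C002": {"name": "Globex", "country": "DE"},
    "C004": {"name": "Umbrella", "country": "GB"},
}


def compare_master_data(left, right, attributes):
    """Return keys missing from either side and attribute-level mismatches."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    mismatches = []
    # Compare only the records that exist in both systems.
    for key in sorted(set(left) & set(right)):
        for attr in attributes:
            left_value = left[key].get(attr)
            right_value = right[key].get(attr)
            if left_value != right_value:
                mismatches.append((key, attr, left_value, right_value))
    return {
        "only_left": only_left,
        "only_right": only_right,
        "mismatches": mismatches,
    }


report = compare_master_data(crm_customers, billing_customers, ["name", "country"])
print(report)
```

Each extract looks clean on its own. The missing keys and the conflicting customer name only show up once the two are compared side by side, which is effectively what happens when the data is loaded into the enterprise DW.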
The unfortunate event is that data quality and inconsistency problems will become evident in the enterprise DW, and the DW will be blamed. Even if the DW program can prove the problems reside in the SORs, it will still be blamed for being surprised and for not dealing with them proactively. In the worst case, the DW program’s credibility will be dealt a blow from which it cannot recover.
What should be done?
Never assume the data quality or inconsistency problems don’t exist or that the DW program can ignore them. The steps you should undertake:
- Obtain data quality and consistency estimates and assumed metrics as part of a Service Level Agreement (SLA) when gathering the business and data requirements from the business and SOR application owners.
- Perform a data profiling and source systems analysis to determine the current state of data quality and consistency within and across SORs.
- Create a gap analysis between the current state and the desired state, i.e., the data quality metrics in the SLA.
- Propose the data architecture and data integration tasks that are needed to bridge that gap. This should include the timeline, tasks, resources and costs to implement and maintain an ongoing set of data quality processes.
- If the effort or costs are too high, negotiate with the business and SOR application owners to lower the metrics within the SLA.
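The profiling and gap analysis steps can be sketched in a few lines. This is a simplified illustration, not a full profiling tool: it assumes completeness (the share of non-empty values per column) is the only metric in the SLA, and the sample rows, column names and `sla_targets` values are hypothetical.

```python
def profile_completeness(rows, columns):
    """Return the fraction of non-empty values for each column."""
    total = len(rows)
    result = {}
    for col in columns:
        filled = sum(1 for row in rows if row.get(col) not in (None, ""))
        result[col] = filled / total if total else 0.0
    return result


def gap_analysis(current, targets):
    """Return the shortfall (target minus current) for columns below target."""
    return {
        col: round(targets[col] - current.get(col, 0.0), 4)
        for col in targets
        if current.get(col, 0.0) < targets[col]
    }


# Hypothetical sample extracted from one SOR.
rows = [
    {"customer_id": "C001", "email": "a@example.com", "postal_code": "10001"},
    {"customer_id": "C002", "email": "", "postal_code": "80331"},
    {"customer_id": "C003", "email": None, "postal_code": ""},
    {"customer_id": "C004", "email": "d@example.com", "postal_code": "SW1A"},
]

# Hypothetical completeness targets agreed in the SLA.
sla_targets = {"customer_id": 1.0, "email": 0.95, "postal_code": 0.75}

current = profile_completeness(rows, list(sla_targets))
gaps = gap_analysis(current, sla_targets)
print(current)
print(gaps)
```

Only the columns that fall short of their target appear in `gaps`, so the result doubles as a starting list for the remediation proposal or for the SLA negotiation.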
You can’t fix a problem unless you can identify and admit that one exists. Data quality and inconsistency problems are fairly common, so don’t be surprised that they exist, and don’t lose the DW’s credibility by being blindsided.