TriTuns Innovation
Improving effective use of systems to increase your technology ROI!

Data Cleansing Does Not Create Lasting Quality


This entry was posted on 8 Nov 2006 and is filed under Data Quality.

I have seen it many, many times. Organizations in the middle of implementing a new system find they are unable to load, migrate or integrate data from their existing systems. After a little research they find the system is working as designed; the problem is that their data is “dirty” and is causing the system to fail. Suddenly the organization realizes it has no choice but to “fix” the data. Now, in a panic to maintain the project schedule and minimize costs, they scramble to find a tool or method to clean the data. A variety of options exist here: using de-duplication tools, developing algorithms and scripts to “fix” the data, manually cleaning the data, or changing functionality or business rules to eliminate the need for some of the problem data. So the organization pushes through with one or more approaches and eventually massages the data into a usable form. But does this really fix the data quality problem? Or does this approach solve the immediate crisis while planting a time bomb set to go off during the next system implementation?

Sure, cleansing the data to address the immediate problem does offer some relief, but it does not actually solve the underlying problem. Merely cleansing the data does nothing to correct the organizational and user behavior issues that caused the data problems in the first place. What happens to our data quality after the new system has been live for a few months? Sure, we cleaned the data for the initial load, but does that mean our data stays clean? NO WAY!
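To make the point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the company names, the `normalize`, `cleanse` and `count_duplicates` helpers, and the idea that a "duplicate" is simply a name that matches after ignoring case and extra spaces. It is an illustration only, not how any particular de-duplication tool works.

```python
def normalize(name: str) -> str:
    """Collapse case and whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def cleanse(records: list[str]) -> list[str]:
    """One-off de-duplication: keep the first record for each normalized name."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


def count_duplicates(records: list[str]) -> int:
    """Number of records that repeat an already-present normalized name."""
    return len(records) - len({normalize(r) for r in records})


# Hypothetical legacy data with the usual inconsistent entries.
legacy = ["Acme Corp", "ACME corp", "Globex", "acme  corp"]

# The one-time cleanse gets the data through the initial load.
loaded = cleanse(legacy)

# Months later: users enter data exactly the way they always did.
live = loaded + ["Globex ", "ACME CORP"]

print(count_duplicates(legacy))  # duplicates before the cleanse
print(count_duplicates(loaded))  # duplicates right after the cleanse
print(count_duplicates(live))    # duplicates once users resume old habits
```

The cleanse itself works, yet nothing in it touches how new records get entered, so the duplicate count climbs again as soon as the system goes live.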

The problem is that our data cleansing efforts merely treated the symptom and left the underlying problem untouched. Data cleansing did nothing to change the user behavior that caused the data problems, so users will continue to pollute the data going forward. We need to solve the root problem. We need to adjust the organizational forces so that users act in a manner that results in high quality data.

While there is indeed a time and a place to clean dirty data to meet an immediate need, we should not fool ourselves into thinking that this is a lasting solution for our data quality problems. Whenever we undertake a data cleansing effort, we need to make sure we also adopt a data quality program that focuses on changing user behavior. Otherwise we will find ourselves in an infinite loop of cleaning, corrupting and re-cleaning our data.

 
