Even after reading about data scrubbing on Wikipedia, I am still not clear on what data scrubbing really means when the term is applied to databases.

Is it a formal engineering principle, with a pre-defined way to perform data scrubbing? If so, what keywords should I research?

-- or --

Is it a general, loose term for simply cleaning up inconsistent data in a database?

What IS Data Scrubbing?

masegaloeh
dance2die

2 Answers

In a database context, it's the correction of data that is consistent with the schema but erroneous at a higher level, e.g. invalid credit card numbers and SSNs, duplicate records, format mismatches, and so on.
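As a rough illustration of "valid for the schema but wrong at a higher level", here is a minimal Python sketch. It assumes card numbers and SSNs are stored as free-text strings; the function names `luhn_valid` and `normalize_ssn` are made up for this example. The Luhn checksum is the standard check-digit test for payment card numbers, while the SSN function only fixes formatting and does not verify that the number was ever issued.

```python
import re
from typing import Optional


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, dashes) are ignored.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def normalize_ssn(raw: str) -> Optional[str]:
    """Return the value as NNN-NN-NNNN, or None if it lacks exactly nine digits.

    This is a format fix only (an assumption for this sketch), not a
    check that the SSN is real.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        return None
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
```

A column typed as `VARCHAR(19)` would accept both `4111 1111 1111 1111` and `4111 1111 1111 1112`, but only the first passes `luhn_valid`; catching the second is the kind of job a scrubbing pass does.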

It is a general, loose term that only acquires a specific meaning in the context of a particular case.

chaos

I have created "data scrubbing" routines to periodically check and fix database problems that may not be practical to check in real time (i.e. checking for errors, inconsistencies, or duplicates as the data is entered). A scrubbing routine can target specific types of errors, such as verifying that a ZIP code entry matches the city/state, or looking for variations of a customer's name (a duplicate customer) at the same address.
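The two checks mentioned above could be sketched roughly as follows. This is not the routine described in the answer, only a small Python illustration under stated assumptions: customers are plain dicts with hypothetical keys `id`, `name`, `address`, `city`, `state`, and `zip`; `zip_reference` is a hypothetical lookup table you would have to supply; and name similarity uses `difflib.SequenceMatcher` with an arbitrary 0.8 threshold.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def find_zip_mismatches(customers, zip_reference):
    """Return ids whose ZIP is unknown or disagrees with the stored city/state.

    `zip_reference` maps a ZIP string to a (city, state) tuple; it is a
    hypothetical lookup table, not something a library provides.
    """
    bad_ids = []
    for c in customers:
        expected = zip_reference.get(c["zip"])
        actual = (normalize(c["city"]), c["state"].upper())
        if expected is None:
            bad_ids.append(c["id"])
        elif (normalize(expected[0]), expected[1].upper()) != actual:
            bad_ids.append(c["id"])
    return bad_ids


def find_possible_duplicates(customers, threshold=0.8):
    """Return (id, id) pairs with similar names at the same address and ZIP."""
    by_location = defaultdict(list)
    for c in customers:
        by_location[(normalize(c["address"]), c["zip"])].append(c)

    pairs = []
    for group in by_location.values():
        # Only compare names within one address, never across the whole table.
        for a, b in combinations(group, 2):
            score = SequenceMatcher(
                None, normalize(a["name"]), normalize(b["name"])
            ).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs
```

Grouping by address first keeps the pairwise name comparison small, which matters when the job runs over a full customer table. A routine like this would normally only flag candidates for review; merging customers automatically on a fuzzy match is risky.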

Sometimes, when a database is denormalized for performance reasons, a scrubbing routine can check the database during off-peak times to make sure the redundant copies of the data remain consistent.
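To make the denormalized case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema is purely hypothetical: an `orders` table caches a `total_cents` column that should always equal the sum of `quantity * unit_price_cents` over the matching `order_items` rows. The function `scrub_order_totals` is an invented name for a job you might schedule off-peak.

```python
import sqlite3


def scrub_order_totals(conn):
    """Recompute cached order totals; return the ids of orders that were fixed.

    Assumes hypothetical tables:
      orders(id, total_cents)
      order_items(order_id, quantity, unit_price_cents)
    """
    # Find orders whose cached total disagrees with the sum of their items.
    stale = conn.execute(
        """
        SELECT o.id,
               COALESCE(SUM(i.quantity * i.unit_price_cents), 0)
        FROM orders AS o
        LEFT JOIN order_items AS i ON i.order_id = o.id
        GROUP BY o.id, o.total_cents
        HAVING o.total_cents <> COALESCE(SUM(i.quantity * i.unit_price_cents), 0)
        ORDER BY o.id
        """
    ).fetchall()

    # Apply all corrections in one transaction (committed on success).
    with conn:
        conn.executemany(
            "UPDATE orders SET total_cents = ? WHERE id = ?",
            [(actual, order_id) for order_id, actual in stale],
        )
    return [order_id for order_id, _ in stale]
```

Returning the corrected ids lets the caller log what changed, which is useful for tracking down whatever code path let the cached value drift in the first place.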

Robert Cartaino