Overview

We’ve all heard the phrase “garbage in, garbage out” when it comes to data quality. Yet it’s amazing how many companies still base important decisions on bad data. It’s a problem across industries, functional groups, and business sizes. With the growing reliance on artificial intelligence (AI) and machine learning (ML), data quality is only becoming more important. After all, AI and ML depend on reliable data to work.

Everyone plays a part in data quality, including those who develop data-driven applications and interfaces, those who enter data, and those who report on and analyze data. In this article, we’ll discuss what makes up bad data, why it’s bad for your business, how you can minimize it, and how you can apply data validation best practices, using Excel as an example.

What is bad data?

Bad data includes these types:

  • Inaccurate data – data points that are slightly or blatantly wrong, or outdated
  • Incomplete data – data points that are missing completely or partially, possibly due to being truncated during transmission
  • Duplicate data – data points that are incorrectly entered multiple times
  • Inconsistent data – data that does not match across systems or has different formats across systems, records, or fields
  • Other bad data – includes irrelevant data, altered data, or unvalidated data

Bad data is caused by three common points of failure:

  • Human error – humans are prone to distraction, biases, fatigue, and apathy, all contributing to bad data
  • Insufficient data validation – despite abundant tools, applying data validation still takes effort that is often pushed off or poorly implemented
  • Weak data security – without adequate controls, data can be added, deleted, or altered without authorization

Why is bad data so bad?

Bad data often flows through to downstream processes, multiplying its negative effects. Throughout the data lifecycle, there are several reasons why bad data is concerning:

  • Bad decisions – decisions made using bad data vary in magnitude, but they can make or break a business.  Unwarranted mergers and acquisitions, product launches, and marketing campaigns are all examples of decisions that have often been made using bad data.
  • Lost productivity – the time spent identifying, reconciling, and correcting bad data can be significant, and at a minimum, it’s distracting and frustrating.
  • Negative reputation – a track record of bad data can tarnish a company’s reputation, resulting in lost customers, vendors, and employees (present and future).
  • Financial impact – bad financial data can cause a variety of financial losses, write-offs, and even fines.  At a minimum, the time spent identifying, reconciling, and correcting bad data translates to expensive labor.

How you can minimize bad data

There are several best practices for minimizing bad data. The steps to employ depend on the type of data and its importance to your business. As with most processes, weigh the cost against the benefit. Here are some common examples of how to ensure data quality and minimize bad data:

  1. Minimize user input – if data is already available digitally, users should not have to enter it manually. This eliminates a major source of bad data and inefficiency, too!
  2. Define data quality – specify what good quality looks like and why it’s important throughout your different processes.  This should be a continuous message that starts with onboarding and continues throughout the team member journey.
  3. Prioritize data quality – it’s a lot more efficient to start with quality data than to fix it later.  If you identify data quality issues, address them before they multiply and send a message of unimportance.
  4. Use data validation – put sufficient checks in every process possible (even in spreadsheets!).  Put validation on the front end (user interface) and the back end (where data is processed).  Often the front end is ignored, but if you can guide users toward quality data as they go, they are less likely to repeat mistakes and will have a better overall experience.
  5. Perform data audits – for mission-critical data, perform routine data quality audits.  Leverage AI and other automation tools to perform this as efficiently as possible.
  6. Promote accountability – communicate data quality expectations and train team members on them.  Setting expectations with examples of good and bad data is the clearest way to communicate their importance.
  7. Maintain backups – run routine backups of mission-critical data, and store them off-site.  Even better – test recovery steps regularly.  The last thing you want during an incident is to discover that the recovery steps are out of date!
  8. Reconcile data – data transmissions and integrations should be sufficiently reconciled and tested for completeness, accuracy, and timeliness (cut-off); see the sketch after this list.  Minimizing the number of systems and integrations reduces how much reconciliation is needed and keeps you closer to a single source of truth.
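For point 8, here’s a minimal sketch of what a reconciliation check could look like in VBA, since Excel is this article’s running example. The sheet names, ranges, and tolerance are hypothetical placeholders, not a prescribed layout:

    ' Minimal reconciliation sketch – sheet names, ranges, and the tolerance
    ' are hypothetical placeholders; adapt them to your own workbook.
    Sub ReconcileLoad()
        Dim srcCount As Long, dstCount As Long
        Dim srcTotal As Double, dstTotal As Double

        ' Record counts: non-blank IDs in column A of each sheet
        srcCount = Application.WorksheetFunction.CountA(Worksheets("SourceExtract").Range("A2:A10000"))
        dstCount = Application.WorksheetFunction.CountA(Worksheets("LoadedData").Range("A2:A10000"))

        ' Control totals: sum of the amount column
        srcTotal = Application.WorksheetFunction.Sum(Worksheets("SourceExtract").Range("C2:C10000"))
        dstTotal = Application.WorksheetFunction.Sum(Worksheets("LoadedData").Range("C2:C10000"))

        If srcCount <> dstCount Or Abs(srcTotal - dstTotal) > 0.01 Then
            MsgBox "Reconciliation failed: counts " & srcCount & " vs. " & dstCount & _
                   ", totals " & srcTotal & " vs. " & dstTotal, vbExclamation
        Else
            MsgBox "Source and loaded data reconcile.", vbInformation
        End If
    End Sub

A check like this can run after each import so that count and total differences surface immediately instead of downstream.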

Excel data validation best practices

Because of its flexibility, Excel is the source of a lot of bad data.  Data validation is a great Excel feature that isn’t used nearly enough. While it’s not bullet-proof, it’s a solid first line of defense.  Just realize that without sheet/cell protection, users can easily copy and paste over data validation rules or remove/change them.
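One simple safeguard is to lock everything except the designated input cells so that validation rules can’t be pasted over or deleted. A minimal sketch – the sheet name, input range, and password are hypothetical examples:

    ' Lock the sheet but leave the designated input cells editable.
    ' "Inputs", the range, and the password are hypothetical examples.
    Sub ProtectInputSheet()
        With ThisWorkbook.Worksheets("Inputs")
            .Cells.Locked = True                 ' lock every cell by default
            .Range("B2:B100").Locked = False     ' unlock only the input cells
            .Protect Password:="example", AllowFiltering:=True
        End With
    End Sub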

  1. Minimize inputs – most data in Excel should be formula-driven and linked to source data.  Clearly define any steps for updating inputs or source data.
  2. Clarify inputs – when users must enter data (like dates, locations, names, etc.), make sure the inputs stand out with distinct formatting (I like highlighting them) and keep them apart from the rest of the data, either on a dedicated sheet or at the top of a sheet.
  3. Clarify valid data – use instructions, cell comments, and input messages to specify what data is acceptable.  Examples can help with complex requirements.
  4. Use data validation – leverage the different data validation options discussed below to guide users on the front end.  Just realize that they aren’t bullet-proof, so separately verify inputs on the back-end if relevant.
  5. Use check figures – compare related metrics to confirm completeness (the sum of the parts equals the whole); see the sketch after this list.  I like to hide these within each sheet and reference them all on one central check figures sheet for efficient reviews.
  6. Include instructions – include an instructions sheet with the purpose of the file, how it works, the source of information, and how to maintain it.  This is a gift for your future self as much as it is for others!
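To illustrate item 5, a check figure can be as simple as a formula that compares a detail total to the summary value it should tie to and flags any difference. Here’s a minimal sketch that writes one such check to a central sheet – the sheet names, ranges, and tolerance are placeholders:

    ' Hypothetical check figure: compare a detail total to the summary value it
    ' should tie to, and flag any difference on a central "Checks" sheet.
    Sub AddCheckFigure()
        With ThisWorkbook.Worksheets("Checks")
            .Range("A2").Value = "Detail total vs. summary"
            .Range("B2").Formula = "=IF(ABS(SUM(Detail!C2:C100)-Summary!B10)<0.01,""OK"",""CHECK"")"
        End With
    End Sub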

Excel data validation types include:

  • Number, date, time, and text length ranges, minimums, and maximums
  • Pre-defined lists
  • Custom formulas
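These options can be applied through the Data Validation dialog or programmatically. Here’s a minimal sketch of each type in VBA – the sheet name, ranges, limits, and list values are placeholder examples:

    ' Hypothetical examples of applying each native validation type in code.
    Sub ApplyValidationExamples()
        ' Whole numbers between 1 and 100
        With Worksheets("Inputs").Range("B2:B100").Validation
            .Delete
            .Add Type:=xlValidateWholeNumber, AlertStyle:=xlValidAlertStop, _
                 Operator:=xlBetween, Formula1:="1", Formula2:="100"
        End With

        ' Pre-defined list of allowed values
        With Worksheets("Inputs").Range("C2:C100").Validation
            .Delete
            .Add Type:=xlValidateList, AlertStyle:=xlValidAlertStop, Formula1:="North,South,East,West"
        End With

        ' Custom formula: positive numbers only
        With Worksheets("Inputs").Range("D2:D100").Validation
            .Delete
            .Add Type:=xlValidateCustom, AlertStyle:=xlValidAlertStop, Formula1:="=AND(ISNUMBER(D2),D2>0)"
        End With
    End Sub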

Excel allows you to specify input messages and error messages to guide users. Input messages show up when cells are selected, and error messages show up when users enter invalid data.
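Both messages can be set in the Data Validation dialog or in code alongside the rule itself. A minimal sketch – the range, dates, and wording are placeholder examples:

    ' Hypothetical example: a date rule with a guiding input message and a
    ' clear error message.
    Sub AddDateValidationWithMessages()
        With Worksheets("Inputs").Range("E2:E100").Validation
            .Delete
            .Add Type:=xlValidateDate, AlertStyle:=xlValidAlertStop, _
                 Operator:=xlBetween, Formula1:="1/1/2024", Formula2:="12/31/2024"
            .InputTitle = "Invoice date"
            .InputMessage = "Enter a date in 2024, e.g. 3/15/2024."
            .ErrorTitle = "Invalid date"
            .ErrorMessage = "The date must fall within 2024."
            .ShowInput = True
            .ShowError = True
        End With
    End Sub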

While Excel’s data validation is helpful, it does not address one of the biggest needs of quality data – required fields. Blank fields do not trigger native data validation errors.

Excel data is often loaded into other systems.  Separately validating data that will be used in other systems is highly recommended. Take a look at the video walkthrough below, which covers the different native data validation options, as well as some additional, stronger options that leverage macros.  Look for a user-defined function called “Validate_Fields” that allows you to define data validation – including required fields – across multiple columns of data.
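The actual Validate_Fields function is covered in the video and workbook. To give a sense of the general approach, here’s a hypothetical sketch (not the implementation from the video) of a function that treats blanks as failures and checks a few columns against simple rules:

    ' Hypothetical sketch only – not the implementation from the video.
    ' Checks each row of a data range: column 1 is required, column 2 must be a
    ' positive number, and column 3 must be one of an allowed list of values.
    Function Validate_Fields_Sketch(data As Range) As String
        Dim r As Long
        Dim errs As String
        For r = 1 To data.Rows.Count
            If Trim(CStr(data.Cells(r, 1).Value)) = "" Then
                errs = errs & "Row " & r & ": ID is required. "
            End If
            If Not IsNumeric(data.Cells(r, 2).Value) Then
                errs = errs & "Row " & r & ": amount must be a number. "
            ElseIf data.Cells(r, 2).Value <= 0 Then
                errs = errs & "Row " & r & ": amount must be positive. "
            End If
            Select Case CStr(data.Cells(r, 3).Value)
                Case "North", "South", "East", "West"
                    ' valid region
                Case Else
                    errs = errs & "Row " & r & ": unknown region. "
            End Select
        Next r
        If errs = "" Then
            Validate_Fields_Sketch = "All rows valid"
        Else
            Validate_Fields_Sketch = errs
        End If
    End Function

A function like this can be called from a cell (for example, =Validate_Fields_Sketch(A2:C100)) or from a macro right before the data is exported to another system.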

To download the workbook containing the various examples covered in the video below, click here.

Summary

Data validation is the key to moving from garbage to gold.  The quality of data coming into the system dictates the quality of the insights coming out of it.  We’re living in a world where quality, relevant data is a powerful and crucial part of running a successful business.  Making decisions based on bad data is often worse than having no data at all.  Make sure not to ignore spreadsheets – so much data is housed in them and reporting processes rely on them.

What are your favorite approaches to data validation?  Share your thoughts in the comments below!
