The old adage goes, “What you put into things is what you get out of them.” It turns out this is not just guidance for personal effort; it holds true for technology too. More data is being collected and stored than ever before, so the quality of that data as an input must be the highest priority. No business can function properly when its data is bad.
The bad news is that bad data is a reality. A 2017 Harvard Business Review study found that “on average, 47% of newly-created data records have at least one critical (e.g., work-impacting) error.” But what does that reality look like in practice?
In a best-case scenario, bad data out means a redo, which is still not necessarily a quick and easy fix. Opportunities, time, money, resources, and potentially customers are lost when the process must be repeated to correct the mistake. Clarifying bad information that was entered, troubleshooting the problems that result from acting on it, and responding to dissatisfied clients who did not get the outcome they were looking for are all inefficient uses of time.
In a worst-case scenario, bad data out could mean an incorrect medical diagnosis, assets titled in the wrong name, money missing from a bank account, and the list goes on. Imagine any undesirable outcome and know that it could result from bad data. In addition to the harm done to the end user, it is hard for any company to come back from a data blunder. A company’s reputation is at risk if it becomes known for getting it wrong.
Do you recall NASA’s Mars Climate Orbiter, launched in 1998? It was designed to study the Martian climate and atmosphere. Communication was lost when the probe drifted off course and disintegrated in the atmosphere of Mars. The navigation error came down to a measurement mismatch: the software failed to convert data from English units to metric, a failure of standardization that cost $193 million.
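To see how such a mismatch slips through, here is a minimal Python sketch; the impulse value, field names, and unit labels are invented for illustration, not taken from the mission software. The idea is simply that values carry their units, conversion happens explicitly at the boundary between systems, and anything unrecognized fails loudly instead of being silently misread.

```python
# Hypothetical illustration of the unit-mismatch failure mode:
# one system reports thruster impulse in pound-force seconds (lbf*s),
# while the consumer expects newton-seconds (N*s).

LBF_S_TO_N_S = 4.448222  # 1 pound-force second expressed in newton-seconds

def to_newton_seconds(value: float, unit: str) -> float:
    """Convert an impulse reading to newton-seconds, rejecting unknown units."""
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * LBF_S_TO_N_S
    raise ValueError(f"unknown impulse unit: {unit!r}")

# A reading tagged with its unit is converted explicitly rather than assumed.
reading = {"impulse": 12.5, "unit": "lbf*s"}  # made-up value for illustration
print(to_newton_seconds(reading["impulse"], reading["unit"]))  # about 55.6 N*s
```

The design choice, not the numbers, is the point: a value without a unit tag is exactly the kind of input that lets English units be consumed as metric without anyone noticing.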
Another example of bad data input involved the Citicorp Center in Manhattan. In 1978, an engineering student writing her senior thesis on the 59-story skyscraper, which was unique because of its raised base and diagonal bracing, uncovered a structural flaw: the potential wind loads on the building had been incorrectly computed by its structural engineer. Her calculation of the building’s stresses led to a welding repair program that secured the building and saved it from toppling.
We can agree that we need good data. There is no question that it has to be accurate, reliable, relevant, consistent, and complete. Even the most innovative technology will not revolutionize the world if it operates on bad data.
We can ensure the quality of data in a number of ways. It starts with improving data collection. How is it collected? Who is collecting it? What are the sources? The next step is improving data organization. Once you have it, what is the method for storing and managing it? Then the data needs to be standardized. How can multiple sources be made consistent? What is the standard for “good”? After that, it is about data entry. If it is done by machines, are there broken pipelines or integrations? If it is done by humans, are there bad actors, or is attention to detail lacking? Does training need to improve?
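As a concrete sketch of what standardizing and checking data at the point of entry can look like, here is a small, hypothetical Python example; the field names, rules, and sample records are invented for illustration. Records with at least one critical problem are held back for correction instead of being written alongside clean data.

```python
# Hypothetical sketch of validating records at the point of entry.
# Field names, rules, and sample data are invented for illustration only.

import re

REQUIRED_FIELDS = {"customer_id", "email", "country"}
ISO_COUNTRY = re.compile(r"^[A-Z]{2}$")  # standardize on two-letter country codes

def validate_record(record: dict) -> list[str]:
    """Return a list of data-quality problems; an empty list means the record passes."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    email = record.get("email", "")
    if email and "@" not in email:
        problems.append(f"malformed email: {email!r}")
    country = record.get("country", "")
    if country and not ISO_COUNTRY.match(country):
        problems.append(f"non-standard country code: {country!r}")
    return problems

records = [
    {"customer_id": "C001", "email": "ada@example.com", "country": "GB"},
    {"customer_id": "C002", "email": "no-at-sign", "country": "United Kingdom"},
]
for r in records:
    issues = validate_record(r)
    print(r["customer_id"], "OK" if not issues else issues)
```

Simple checks like these are not a substitute for better collection or training, but they catch incomplete and inconsistent records before they spread downstream.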
According to some projections, 74 zettabytes (that is, 74 trillion gigabytes!) of data will be created in 2021. The quantity of data is enormous, but its quality matters just as much. The good news is that companies that master good data will have a competitive edge and be poised for success in the future.