The Problem with Big Data

…is the same problem that has always existed with data: it is often wrong, outdated, or incomplete.

Whenever I ask supply chain professionals, “What was the most difficult and time-consuming part of your software implementation project?” the answer I get is almost always the same: collecting and cleaning the data. This was true fourteen years ago when I first started asking the question, and it’s still true today.

Several years ago, I interviewed ten CIOs from leading third-party logistics companies (3PLs), and they told me that their IT teams spend, on average, half of their time — half of their time! — fixing and scrubbing data. They spend the other half of their time mostly on maintaining current systems, leaving little or no time for innovation.

I’ve long said that poor data quality is the Achilles’ heel of supply chain management, and yet supply chains continue to function — but not without a steep cost in wasted time and resources.

Back in 2004, the National Institute of Standards and Technology (NIST) published a report — Economic Impact of Inadequate Infrastructure for Supply Chain Integration — that looked at the state of supply chain integration at the time and estimated the economic impact of inadequate integration (a leading cause of data quality problems). The researchers found (among other issues) that “manual data entry is widespread, even when machine sources are available; critical information is often manually reentered at many points in the chain” and that “interventions from purchasing clerks, order processors, and expediters are required to maintain supply-chain information flows.”

The researchers developed a model to quantify the cost of these integration issues, focusing on the automotive and electronics industries because their supply chains are highly global and fragmented, and they concluded the following:

“We estimate the total annual costs of inadequacies in supply chain infrastructures to be in excess of $5 billion in the automotive industry and almost $3.9 billion in the electronics industry.”

That was more than eight years ago. Do you believe the situation has improved since 2004 or gotten worse? At a minimum, with the explosion of new data sources and data volumes (what we generally call Big Data today), managing data quality has become far more difficult.

So, what’s the solution?

There are many root causes of the data quality problem, including the fact that most data “standards” like ANSI X12 or EDIFACT are not standard in practice; most companies modify them, making each customer-supplier link a custom integration using non-standard syntax, unique variables, and reordered transmission. The net result is a “Tower of Babel” situation where every party has to translate every other party’s messages. If companies all along the supply chain actually adhered to common standards, many data quality issues would go away.
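To make the translation burden concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two partners, their field names, and the "canonical" field names are hypothetical, and none of it is real ANSI X12 or EDIFACT syntax. It only shows the shape of the problem, which is that every trading partner with its own variant needs its own mapping.

```python
# Hypothetical per-partner field mappings. These names are illustrative only;
# they are not real ANSI X12 or EDIFACT segment or element identifiers.
PARTNER_FIELD_MAPS = {
    "supplier_a": {"PO_NUM": "po_number", "ITEM": "sku", "QTY": "quantity"},
    "supplier_b": {"OrderRef": "po_number", "PartNo": "sku", "Units": "quantity"},
}


def to_canonical(partner, message):
    """Translate one partner-specific message into the shared field names."""
    field_map = PARTNER_FIELD_MAPS[partner]
    # Refuse fields we have no mapping for, rather than silently dropping them.
    unknown = set(message) - set(field_map)
    if unknown:
        raise ValueError(f"unmapped fields from {partner}: {sorted(unknown)}")
    return {field_map[key]: value for key, value in message.items()}


# The same order line, sent with different names and in a different order.
a = to_canonical("supplier_a", {"PO_NUM": "4501", "ITEM": "X-100", "QTY": 12})
b = to_canonical("supplier_b", {"Units": 12, "PartNo": "X-100", "OrderRef": "4501"})
print(a == b)
```

Each new partner variant adds another entry to maintain, which is where much of the custom integration cost comes from. If every partner sent the same fields in the same way, the mapping table would not be needed at all.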

But at a higher level, solving the data quality problem requires answering these two basic questions:

  • Who owns data quality management?
  • Do we really need all of this data and complexity?

Many operations people believe that IT is responsible for data quality, while IT points the finger back to operations and the countless external entities (suppliers, customers, logistics service providers, and so on) that send them data. Simply put, the responsibility for data quality management is not clearly defined at most companies, or it’s assumed that data quality is everybody’s responsibility, but the required governance and accountability structures don’t exist.

Companies need to clearly define roles and responsibilities in this area. But before that, they need to view data as a corporate asset and assign value to it, just as they do with other assets such as buildings, equipment, and intellectual property. Some frameworks already exist, such as the emerging field of Infonomics, which Wikipedia defines as “the emergent discipline of quantifying, managing and leveraging information as a formal business asset. Infonomics endeavors to apply both economic and asset management principles and practices to the valuation and handling of information assets.”

At the same time, companies need to take a step back and question why they are collecting certain data, and why they continue to add complexity to their supply chains. When I started my career as an engineer, for example, I remember asking my boss why we collected certain types of manufacturing data. In many cases, she didn’t know; it seemed like we just always collected that data. I later learned that in many cases, the data collection started to better understand and fix a problem, but once the problem was solved, nobody hit the stop button on the data collection, and years of unused and unnecessary data continued to accumulate.

The question of supply chain complexity was addressed in the 2008 book The Complexity Crisis: Why Too Many Products, Markets, and Customers Are Crippling Your Company–And What To Do About It by John L. Mariotti, who used to be the president of Rubbermaid’s Office Products Group. The book description sums up the problem nicely:

“In companies’ quest to attain double-digit growth in single digit markets, companies have frantically created more products, customers, markets, suppliers, services and locations. Yet all this complexity adds cost causing topline revenue to go up, and bottomline profits to go down. This title shows readers how to track complexity and reduce or eliminate it. Filled with examples of companies who have successfully conquered complexity, this guide demonstrates how to institute new metrics and modifications to existing cost and management control systems so they can choose when to cut complexity out and when to capitalise on it.”

The bottom line: Big Data is creating new opportunities for companies to innovate their supply chain processes and achieve higher levels of financial success. But the benefits of Big Data come with a price: the ever-growing cost of dealing with poor data quality. Companies that take action to minimize or eliminate this cost — by viewing data as a corporate asset, assigning value to it, and clearly defining roles, responsibilities, and a governance structure, as well as removing complexity from their supply chains — will gain the most from Big Data.


  1. Big data definitely has pros and cons. I was interested to hear that IBM is helping Northwestern University put together a Master’s Degree for data analytics. Big data certainly requires a new set of skills to pore through the rows and columns and bring meaning to all that collecting.

    I think some of the blame for bad data lies in how hard it is to change settings once data has been entered. Some suppliers must submit exact product dimensions long before the products are completely developed. Then, when the actual product arrives, it’s too difficult to communicate the actual dimensions. The result is inaccuracies, either because the data is too hard to change or because the supplier fears being penalized for not being accurate up front.

    Kevin Boudreau said it best in an HBR IdeaCast about big data: we are in the Wild West of Big Data.
