My earlier blog post (What is Hadoop and Why Should a Supply Chain Professional Care?) generated a lot of attention, and I was asked to write a follow-up. Hadoop has a future in supply chain management because of the massive amounts of data that supply chains generate, but so do plenty of other big data technologies.
Think about big data in terms of the classic three V’s – volume, velocity and variety. Each of these facets challenges the conventional relational database management system (RDBMS) technologies that lie at the heart of supply chain management applications:
- Volume: Relational databases were designed in the day when a server was a processor, some memory and a disk or two, all in a single box. If you needed to support more users, a bigger workload, or more data, that was easy, up to a point: just upgrade the CPU or add memory or disk as appropriate, but only within the box (known as vertical scaling in the trade). Scaling performance beyond the physical boundaries of an individual computer (horizontal scaling) was much harder, because relational databases just weren’t designed to work that way. Although most vendors do provide a horizontal scaling solution, these solutions are often complex and typically do not provide anything close to linear scalability in practice. Managing extreme volumes of data with relational technologies remains a challenge.
- Velocity: In a supply chain that is increasingly connected by the Industrial Internet of Things, sensors are generating updates at ever-increasing rates. To take full advantage of this data, organizations need to be able to act on these updates faster when necessary. While relational databases can ingest data rapidly, traditional business intelligence solutions aren’t architected to rapidly turn that data into information and flow it through to business managers. In situations where a rapid response to emerging business events is required, a different solution is needed.
- Variety: Relational databases deal with highly structured, mostly numerical information – credits/debits to financial accounts, sales orders, inventory levels, fleet capacity, and so on. They were never intended to handle text, images, or other data types that have little or no structure to provide context. Unstructured and semi-structured data (such as XML) is on the rise, though. Although relational database systems have been extended to contain large unstructured data objects, they often don’t handle them particularly well.
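To make the difference between vertical and horizontal scaling a little more concrete, here is a minimal sketch of one common approach to horizontal scaling: hash partitioning (sharding), where each record key is mapped to one of several nodes. Everything in it is an assumption for illustration; `ShardedStore`, `shard_for`, and the SKU keys are hypothetical names, and plain in-memory dictionaries stand in for what would be separate servers in a real deployment.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and so is not stable across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store spread over several 'nodes'.

    Each node is just a dict here; in practice each would be a
    separate machine with its own CPU, memory and disk.
    """

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, record: dict) -> None:
        self.shards[shard_for(key, self.num_shards)][key] = record

    def get(self, key: str):
        return self.shards[shard_for(key, self.num_shards)].get(key)

    def shard_sizes(self) -> list:
        return [len(shard) for shard in self.shards]


# Illustrative usage: spread 1,000 made-up inventory records over 4 nodes.
store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"SKU-{i}", {"on_hand": i % 50})
print(store.shard_sizes())
```

Note that with this simple modulo scheme, changing the number of shards moves most keys to a different node. Distributed databases generally use more careful schemes, such as consistent hashing, to limit that data movement, which is part of why retrofitting horizontal scaling onto a single-box design is hard.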
With the three V’s stretching conventional data management approaches to the limit, a number of alternatives have emerged, including Hadoop. Two other categories that are likely to find their way into supply chain applications are:
- NoSQL (“Not only SQL”) databases use a more flexible approach to data structures than the fixed row-and-column schema used by relational databases. That in itself helps to address the problem posed by managing an increasing variety of data types. In addition, NoSQL databases are typically designed to be distributed out of the box. In practice this means they are both horizontally scalable and highly available, two very important attributes when data volumes become truly massive. This class of database has already started to appear in supply chain management. For example, SCM startup vendor Elementum has adopted MongoDB, a NoSQL database that I reviewed here. Likewise, Apex Supply Chain Technologies uses a commercial version of Apache Cassandra provided by DataStax.
- Complex event processing (CEP) technologies can help to tame data velocity. (Sal Spada and I have written about the role of this technology in manufacturing.) CEP (also called stream processing) engines enable organizations to make decisions based on real-time analysis of many incoming data streams. Crucially, decisions may be made on values sampled from across streams, not just a single stream. Decisions are based on rules that are either configured or programmed, depending on the particular solution used. During operation, the defined rules are applied to the events that stream into the engine, and the resulting actions are taken automatically, without human intervention. CEP reacts so quickly to complex events that its early markets have been in financial services, both facilitating high-speed trading and detecting fraudulent behavior. Now, though, early applications are being seen in the supply chain. For example, Royal Dirkzwager uses Software AG’s Apama to track shipping.
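As a rough illustration of what "a rule applied across streams" means, here is a minimal sketch of a toy CEP-style engine. It is an assumption-laden example, not how Apama or any other product works: the `TinyCepEngine` class, the "temperature" and "door" stream names, the 8.0 degree threshold, and the container IDs are all hypothetical. The single rule fires only when recent events from two different streams agree that a refrigerated container is both warm and open.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    stream: str       # hypothetical stream name: "temperature" or "door"
    container: str    # which container the sensor belongs to
    value: float      # degrees C for temperature; 1.0 = open, 0.0 = closed for door
    timestamp: float  # seconds since some reference point


class TinyCepEngine:
    """Toy engine: keeps the latest event per (stream, container) and
    applies one hard-coded rule every time a new event arrives."""

    def __init__(self, window_seconds: float, max_temp: float = 8.0) -> None:
        self.window_seconds = window_seconds
        self.max_temp = max_temp  # assumed threshold, for illustration only
        self.latest: Dict[Tuple[str, str], Event] = {}
        self.actions: List[str] = []

    def on_event(self, event: Event) -> None:
        self.latest[(event.stream, event.container)] = event
        self._apply_rule(event.container, event.timestamp)

    def _apply_rule(self, container: str, now: float) -> None:
        temp = self.latest.get(("temperature", container))
        door = self.latest.get(("door", container))
        if temp is None or door is None:
            return  # the rule needs a value from both streams
        # Ignore readings older than the time window.
        fresh = all(now - e.timestamp <= self.window_seconds for e in (temp, door))
        if fresh and temp.value > self.max_temp and door.value == 1.0:
            # The "action" is just recorded here; a real engine might
            # raise an alert or call another system automatically.
            self.actions.append(f"ALERT {container}: door open while too warm")


# Illustrative usage with made-up events.
engine = TinyCepEngine(window_seconds=60)
engine.on_event(Event("temperature", "C1", 9.5, timestamp=0.0))
engine.on_event(Event("door", "C1", 1.0, timestamp=10.0))
print(engine.actions)
```

Neither stream alone would justify the alert: a warm reading with the door closed, or an open door at a safe temperature, does not match the rule. This sketch also re-fires on every later event while the condition holds; a production engine would typically offer ways to suppress such duplicates.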
Many supply chain applications based on relational databases will chug along just fine for some time to come. However, as supply chains get bigger and more complex, and demand faster response times, expect to see NoSQL databases and complex event processing gain more than a foothold too.