Big data projects are, well, big in size and scope, often very ambitious, and, all too often, complete failures. In 2016, Gartner estimated that 60 percent of big data projects failed. A year later, Gartner analyst Nick Heudecker said his company was “too conservative” with its 60 percent estimate and put the failure rate at closer to 85 percent. Today, he says nothing has changed.
Gartner isn’t alone in that assessment. Long-time Microsoft executive and (until recently) Snowflake Computing CEO Bob Muglia told the analytics site Datanami, “I can’t find a happy Hadoop customer. It’s sort of as simple as that. … The number of customers who have actually successfully tamed Hadoop is probably fewer than 20 and it might be fewer than ten. That’s just nuts given how long that product, that technology has been in the market, and how much general industry energy has gone into it.” Hadoop, of course, is the engine that launched the big data mania.
Other people familiar with big data also say the problem remains real, severe, and not entirely one of technology. In fact, technology is a minor cause of failure relative to the real culprits. Here are the four key reasons that big data projects fail—and four key ways in which you can succeed.
Big data problem No. 1: Poor integration
Heudecker said there is one major technological problem behind big data failures, and that is integrating siloed data from multiple sources to get the insights companies want. Building connections to siloed, legacy systems is simply not easy. Integration costs are five to ten times the cost of software, he said. “The biggest problem is simple integration: How do you link multiple data sources together to get some sort of outcome? A lot go the data lake route and think if I link everything to something magic will happen. That’s not the case,” he said.
Siloed data is part of the problem. Clients have told him they pulled data from systems of record into a common environment like a data lake and couldn’t figure out what the values meant. “When you pull data into a data lake, how do you know what that number 3 means?” Heudecker asked.
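To make Heudecker's "number 3" question concrete, here is a minimal Python sketch of one way a value can keep its meaning after it lands in a shared environment: a data dictionary keyed by source system and field. The system names (`crm`, `billing`, `erp`), the `status` field, and every code meaning are invented for illustration; none of them come from Gartner or from any real product.

```python
# Hypothetical data dictionary: (source system, field) -> code -> meaning.
# All names and meanings here are made up for illustration.
DATA_DICTIONARY = {
    ("crm", "status"): {1: "lead", 2: "prospect", 3: "customer"},
    ("billing", "status"): {1: "paid", 2: "overdue", 3: "written off"},
}


def decode(source, field, value):
    """Return the documented meaning of a coded value, or flag it as unknown."""
    meanings = DATA_DICTIONARY.get((source, field))
    if meanings is None or value not in meanings:
        return f"UNKNOWN ({source}.{field}={value})"
    return meanings[value]


# Records as they might land in a data lake, each tagged with its origin.
records = [
    {"source": "crm", "field": "status", "value": 3},
    {"source": "billing", "field": "status", "value": 3},
    {"source": "erp", "field": "status", "value": 3},  # no dictionary entry
]

decoded = [decode(r["source"], r["field"], r["value"]) for r in records]
for record, meaning in zip(records, decoded):
    print(record["source"], record["value"], "->", meaning)
```

The same 3 decodes to a different meaning for each documented source, and the record from the undocumented system is flagged rather than guessed at. Without the source tag and the dictionary, all three records would look identical in the lake.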