HPE plus MapR: Too much Hadoop, not enough cloud

Posted on 12-08-2019 by admin

Cloud killed the fortunes of the Hadoop trinity—Cloudera, Hortonworks, and MapR—and that same cloud likely won’t rain success down on HPE, which recently acquired the business assets of MapR. While the deal promises to marry “MapR’s technology, intellectual property, and domain expertise in artificial intelligence and machine learning (AI/ML) and analytics data management” with HPE’s “Intelligent Data Platform capabilities,” it is devoid of the one ingredient that both companies need most: cloud.

The problem, in other words, isn’t that MapR lacked smart folks and great technology; Wikibon analyst James Kobielus insists it had plenty of both. No, the problem is that MapR is still way too Hadoop-y and not nearly cloudy enough in a world filled with “fully integrated [cloud-first] offerings that have a lower cost of acquisition and are cheaper to scale,” as Diffblue CEO Mathew Lodge has said. In short, MapR may expand HPE’s data assets, but it doesn’t make HPE a cloud contender.

Why cloud matters

Yes, hybrid cloud is still a thing, and will remain so for many years to come. As much as enterprises may want to steer workloads into a cloudy future, 95 percent of IT remains firmly planted in private data centers. New workloads tend to go cloud, but decades’ worth of existing workloads still run on-premises.

But this hybrid world, which HPE pitches so loudly (“innovation with hybrid cloud,” “from edge to cloud,” “harness the power of data wherever it lives,” etc.), hasn’t been as big a deal for big data workloads. Part of the reason comes down to a reliance on old-school models like Hadoop, “built to be a giant single source of data,” as noted by Amalgam Insights CEO Hyoun Park. That’s a cumbersome model, especially in a world where big data is born in the cloud and wants to stay there, rather than being shipped to on-premises servers. Can you run Hadoop in the cloud? Of course. Companies like AWS do just that (Elastic MapReduce, anyone?). But arguably even Hadoop in the cloud is a losing strategy for most big data workloads, because it simply doesn’t fit the streaming data world in which we live.
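To make the contrast concrete, here is a minimal sketch of the transient, elastic model the cloud providers push: spin up a Hadoop/Spark cluster on EMR, run a job, and let the cluster terminate when the work is done. The cluster name, release label, instance sizes, and S3 paths below are illustrative assumptions, not details from the HPE or MapR announcements.

```python
# Sketch: a transient, elastic Hadoop/Spark cluster via boto3's EMR API.
# The name, release label, instance types, and S3 paths are hypothetical
# placeholders used only to illustrate the "spin up, run, tear down" pattern.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="transient-analytics-cluster",        # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",                 # any current EMR release works
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        # Shut the cluster down once the submitted steps finish.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "nightly-aggregation",          # hypothetical job
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/aggregate.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

print("Launched cluster:", response["JobFlowId"])
```

The point isn’t the specific API; it’s that the infrastructure exists only as long as the question does, which is exactly the elasticity Matt Wood describes below.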

And then there’s the on-premises problem. As AWS data science chief Matt Wood told me, cloud elasticity is crucial to doing data science right:

Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements. Your resource mix is continually evolving—if you buy infrastructure, it’s almost immediately irrelevant to your business because it’s frozen in time. It’s solving a problem you may not have or care about any more.

MapR had made efforts to move beyond its on-premises Hadoop past, but they were arguably too little, too late.