

DIY vs. fully integrated Hadoop – What’s best for your organisation?

By Russ Weeks, System Architect, PHEMI | Jan. 4, 2017
The trade-offs of building it yourself vs. going with a pre-integrated, out-of-the-box platform


This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.

You don’t have to look far to see the amazing things that organizations are doing with big data technology: pulling information from past transactions, social media and other sources to develop 360-degree views of their customers. Analyzing thousands of processes to identify causes of breakdowns and inefficiencies. Bringing together disparate data sources to uncover connections that were never recognized before. 

All of these innovations, and many more, are possible when you can collect information from across your organization and apply data science to it. But if you’re ready to make the jump to big data, you face a stark choice: should you use a pre-integrated “out-of-the-box” platform? Or should you download open-source Hadoop software and build your own?

Which path is right for your organization? Let’s take a closer look.   

Assembling puzzle pieces

First, know that if you go DIY, there are many different components you’ll need to integrate with stock Hadoop: Hive, YARN, MapReduce, and many more. (One of the leading Hadoop distributions includes 23 different software packages.) You’ll need to figure out which components, and which software versions, make sense for your deployment, and how to make them work together and with your environment.

That’s not a one-time job; all of those tools are constantly updated, so you’ll need to figure out how to support and maintain your solution on an ongoing basis. For these reasons, most organizations building their own platforms use third-party professional services to handle much of the heavy lifting.

So why choose the DIY path? You do end up with a solution that’s precisely tuned for what you want to do with it. Your IT department retains total control over the platform’s processes and capabilities. If you’re looking at a relatively small project (designed for a specific purpose, with specific data choices and interfaces) this can be a great choice. However, there can also be a downside to extensive customization: if you want to expand your platform in the future, it may be less flexible than a ready-made solution designed for multiple use cases.

Weighing costs

It can be tempting to assume that building your own platform, using off-the-shelf hardware and open-source software, is inherently less expensive than a pre-integrated solution. The numbers, however, don’t necessarily bear that out.

The sticker price of an integrated platform may be higher, but its total cost of ownership over the life of the solution is likely to be comparable to, or even lower than, that of a DIY cluster. Consider: Any big data platform will require the same compute power, storage, and infrastructure, so hardware costs are likely comparable. But if you’re going DIY, you should expect to spend several hundred thousand dollars on software, as well as on installation and ongoing support from third-party professional services, all of which are included in a pre-integrated solution.
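
To make the comparison concrete, here is a minimal sketch of a total-cost-of-ownership calculation. Every dollar figure in it is a made-up placeholder rather than a number from any vendor or from this article, and the `PlatformCosts` class and its fields are hypothetical names chosen for illustration. Substitute your own quotes before drawing any conclusions.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """Hypothetical cost model; all amounts are in dollars."""
    hardware: int        # one-time: compute, storage, infrastructure
    upfront: int         # one-time: software or platform price, plus installation
    annual_support: int  # recurring: support and maintenance per year

    def total_cost_of_ownership(self, years: int) -> int:
        # One-time costs plus recurring support over the life of the solution.
        return self.hardware + self.upfront + self.annual_support * years


# Placeholder figures only. Hardware is assumed identical for both options,
# and the integrated platform's support is assumed to be bundled in its price.
diy = PlatformCosts(hardware=500_000, upfront=300_000, annual_support=100_000)
integrated = PlatformCosts(hardware=500_000, upfront=700_000, annual_support=0)

for years in (1, 3, 5):
    print(years, diy.total_cost_of_ownership(years),
          integrated.total_cost_of_ownership(years))
```

With these placeholder inputs the DIY option is cheaper in the first years and more expensive by year five, which is the pattern the sticker-price comparison hides. Different inputs can just as easily reverse the result, so the useful part is the structure of the calculation, not the numbers.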

 
