During the past few years Neustar, an $830 million publicly traded data analytics company, has undergone a dramatic business transformation, and it's been powered almost entirely by Hadoop.
The company has a history of providing real-time information to telecommunications and Internet providers - everything from number porting and domain name registries to supplying short codes, the basis of text messaging for many mobile providers.
In 2011 Neustar could track about 60 days' worth of historical data, but new executive leadership challenged the company to offer more comprehensive services for customers. "We weren't just going to throw money at the problem," remembers Michael Peterson, Neustar's vice president of platforms and data architecture. The natural choice was to go open source.
Instead of scaling proprietary Oracle and IBM Netezza platforms, Peterson and his team turned to Hadoop. Originally Neustar techies worked with Cloudera, which offers a packaged distribution of the open source Apache Hadoop project. But then the developers really got into working in the open source world. "One thing we were trying to get away from are prepackaged vendors with proprietary stuff," he says. Hortonworks, which was founded just months after Neustar embarked on its Hadoop journey, turned out to be what he calls the "perfect fit."
Hortonworks was born out of Yahoo in 2011 when some of the original engineers who built the search website's distributed architecture platform left to spin out a company to support the open source Hadoop project. Hortonworks stays close to the open source Apache Hadoop code base, and to Yahoo. Each new code set from the Apache project is tested by Hortonworks on Yahoo's massive 40,000-node cluster before it is released as a Hortonworks distribution. And it's garnering some attention in the tech market. Recently Hortonworks has signed on some big-name partners, including Microsoft, Rackspace and Teradata, and it even joined the OpenStack Foundation. The moves have legitimized not only this company, but the broader open source Hadoop movement, industry watchers say.
For Neustar, Hortonworks turned out to be a good fit. The team got prepackaged open source Hadoop code, but because it was true to the trunk, they could iterate on top of it and contribute back to the open source community. Today, Neustar has a 120-node Hadoop cluster managing more than 2 petabytes of data, including the past 18 months' worth of data it has collected, not just the 60 days it had previously. With the new platform, Neustar now offers customers longer-term data sets, trending visualizations and historical analytics, all powered by Hadoop.
It's not just the business offerings that have transformed at Neustar - the entire IT team's culture has shifted to an open source mindset, Peterson says. Engineers are now experimenting with an OpenStack private cloud deployment. "The whole process has fit directly into the agile way we want to do things, it's allowed us to take calculated risks and do things quickly in a way where we can see the results," he says.