
Big data and business intelligence trends 2017: machine learning, data lakes and Hadoop vs Spark

Scott Carey | Jan. 3, 2017
Looking ahead to the big trends set to shape the way organisations manage their data in 2017.

As we sit on the cusp of 2017 we are still talking about organisations finally "operationalising" their data, namely: putting useful, actionable data into the hands of business users where and when they need it.

As the cost of data storage continues to fall and pre-wrapped SaaS analytics solutions proliferate, putting insights into the hands of employees has never been cheaper or easier.

Here are some of the trends we are seeing on the horizon for 2017 in big data, analytics and business intelligence (BI).

Embracing machine learning

Analyst firm Ovum says machine learning will be the "biggest disruptor for big data analytics in 2017."

Tony Baer's Big Data trends report states: "Machine learning, which has garnered its share of hype, will continue to grow; but in most cases, machine learning will be embedded in applications and services rather than custom-developed because few organisations outside the Global 2000 (or digital online businesses) will have data scientists on their staff."

Vendors now sell pre-packaged tools that make it easier than ever for organisations to apply machine learning to data sets, so we can expect businesses to continue to take advantage of predictive analytics, customer insight and personalisation, recommendation engines, and fraud and threat detection.

Moving beyond Hadoop

Apache Hadoop, the open source framework for distributed data storage and processing, has been the talk of the BI industry for the past few years, but viable alternatives to the popular framework are starting to emerge, in particular Apache Spark.

The in-memory data processing engine has been hyped for some years now but, as Baer notes in his report, the ability to deploy Spark in the cloud is driving adoption. He states: "The availability of cloud-based Spark and related machine learning and IoT services will provide alternatives for enterprises considering Hadoop."

Although closely related, Spark and Hadoop are different products, and Baer notes there are pros and cons to both: "The debate rages because, if you eliminate the overhead of a general purpose data-processing and storage engine (and in Hadoop's case, YARN), Spark should run far more efficiently. The drawback, however, is that standalone Spark clusters lack the security or data governance features of Hadoop."

Data visualisation specialist Tableau added that late Hadoop adopters can take advantage of self-service data preparation tools to get in on the action in 2017. The vendor said: "Self-service data prep tools not only allow Hadoop data to be prepped at source but also present the data available as snapshots for faster and easier exploration. We've seen a host of innovation in this space from companies focused on end-user data prep for big data such as Alteryx, Trifacta, and Paxata."

 
