Both Deutsche Bank and HMRC are struggling to find a way to unravel data from legacy systems to allow integration with newly created big data systems based on Hadoop technology.
Zhiwei Jiang, global head of accounting and finance IT at Deutsche Bank, was speaking this week at a Cloudera roundtable discussion on big data. He said that the bank has embarked on a project to analyse large amounts of unstructured data, but has yet to work out how to make the Hadoop system work with its legacy IBM mainframes and Oracle databases.
"We have been working with Cloudera since the beginning of last year, where for the next two years I am on a mission to collect as much data as possible into a data reservoir," said Jiang.
Deutsche Bank is collecting data from the front end (trading data), the middle (operations data) and the back end (finance data). However, Jiang was keen to highlight the challenges faced by a traditional banking IT system.
"At the end of the day we still have a huge installation of IBM mainframes and hundreds of millions of pounds of investment with Oracle. What do we do with that? We have 46 data warehouses, which all have terabytes and petabytes of storage, where there is 90 percent overlap of data. What do we do with that?" he said.
"Nobody has the skills to unravel the old technology. I've dedicated my career to making this Cloudera project work, but if it doesn't work I'll probably be out of a job."
He added: "It's very hard to unravel all these data warehouses that have been built over the last 20 to 30 years. We need to extract the data out, streamline it, build the traceability and lineage - it's very expensive to do."
Richard Brown, BIM GSL programme leader at Capgemini, also at the event, said that he was aware of similar difficulties facing HM Revenue and Customs, where the government department is looking to use big data to fight tax avoidance and detect fraud. Capgemini is the lead on HMRC's ASPIRE IT services contract, which covers a significant amount of the department's IT operations.
"The problem isn't solved at HMRC. The analytics at the moment is running on the older technology. I think in most instances we are seeing companies sitting the Hadoop technology alongside existing systems," said Brown.
"With a new environment organisations can explore some new subject areas that they haven't looked at before. People haven't really got to the next phase of understanding how to migrate the old environments across."
He added: "Virtually all of the Hadoop installations we are seeing are organisations with new business problems, or new opportunities they have identified - using new datasets they can play with. The challenge is linking it back into the existing information sets."