"They do not necessarily have to be experts in the subject, or know how to create, run and maintain the warehouse independently, but they need to know how to inspect them and query efficiently to get their results," says Thusoo.
Data collection is an enormous undertaking, especially considering that companies tend to collect far more data than they can actually use or need. Before you can hire the right employees to help with data collection, you actually need to know what data you want to collect, says Mustafa.
But the biggest problems in data collection arise when businesses face the "four V's of big data: volume, variety, velocity and veracity," says Polich. And one person can't deal with all four. For example, figuring out a strategy to deal with the velocity and volume of data is typically a job for data engineers, rather than data scientists or data analysts, says Mustafa.
And before you can even determine what skills you need for data collection, it's important to first consider your audience and customer base. Polich gives the example of a bank, which can't withstand any downtime or lag in data retrieval, so companies like that need to hire accordingly. That might mean hiring people who have worked in similar high-stress environments, where certain aspects of data matter more than in other industries.
By contrast, he gives the example of a social media network, which can probably withstand a minimal amount of lag or inconsistency in data retrieval, especially if it results in cost savings. That might mean you can hire someone with other skills that are important to your business or someone more accustomed to working in agile and innovative environments. Taking time to consider how your business can use data and what data you actually need to collect will help you hire the right person for the job.
Thusoo says he looks for workers who understand the intricacies of data collection, and everything that can go wrong with or taint data. "There is an old saying in computing, 'Garbage in, garbage out'. More than anything else, this applies to data. Your resume should not only show that you have worked with systems that are involved in this process, but also that you are adept at finding data quality issues and resolving them."
Having data is great, but if you can't understand what it means for your company, then it's ultimately a waste of resources. Thusoo says that in the past it was important to find data analysts with skills in SQL and in statistical and modeling tools like SAS and SPSS. But now, he says, as programming becomes more ubiquitous in the industry and easier to learn, companies will want to look for other skills.