Like children, artificial intelligence needs proper parenting to achieve its full potential, and proper parenting starts with a healthy diet — of good data.
Businesses increasingly acknowledge the potential of A.I. to accelerate decision making, but many have serious concerns about what is happening inside the black box. The quality of any A.I. can only be as good as the data it processes. Of course, “garbage in, garbage out” has long been an analytics refrain, but it’s even more important for A.I.
Why? Consider the difference between the two. An analytics solution typically provides a graph prioritizing the results. Ask an analytics program why sales are down in the Northeast region, and you’ll essentially get a list of possible factors: supply chain hiccups, demographic changes, social media trends, etc. A human then has to evaluate the results to determine which factors to base the ultimate decision on. A cognitive A.I. approach is less transparent. Ask an A.I. why sales are down in the Northeast region and you get a single, definitive answer. That’s it. Done deal.
The A.I. approach would be a business user’s dream come true. Ask a question, get a definitive answer, and confidently take an action. It would save time and result in faster, better business decisions.
But what if the A.I. is wrong? More important, how would a business user ever know an A.I. is wrong? Because of this, relying on A.I. requires a level of trust significantly higher than for an analytics solution. From the perspective of a chief data officer or a data scientist, parenting an A.I. is a humbling responsibility.
Those in charge of feeding A.I. must ensure a complete and healthy diet: clean, relevant and reliable data with traceable provenance. Instead of the five food groups, a healthy A.I. diet depends on curating the data that goes into it:
A.I. shouldn’t be allowed to drink wildly from a data lake where data has not been cleansed, packaged and structured for easy consumption. According to the Compliance, Governance and Oversight Council (CGOC), nearly 70% of the data that companies produce and collect has no business, legal or compliance value, so you must develop a way to understand and specify the scope and criteria of the data to be fed to A.I. Which data stores and what file types? What connections exist between the data? Who is responsible for making the determination and for final approval?
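To make the scoping questions concrete, here is a minimal sketch of how a team might write down which data stores and file types are in scope and who signs off. Everything in it is hypothetical and for illustration only: the IngestionScope class, the store names, the file extensions and the approver address are assumptions, not part of any particular product.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class IngestionScope:
    """Hypothetical scope definition: what may be fed to the A.I., and who approved it."""
    allowed_stores: frozenset      # assumed names of approved data stores
    allowed_extensions: frozenset  # assumed approved file types, lowercase with dot
    approver: str                  # person accountable for final approval

    def admits(self, store: str, path: str) -> bool:
        """Return True only if both the store and the file type are in scope."""
        extension = PurePosixPath(path).suffix.lower()
        return store in self.allowed_stores and extension in self.allowed_extensions


# Illustrative values only.
scope = IngestionScope(
    allowed_stores=frozenset({"crm", "erp"}),
    allowed_extensions=frozenset({".csv", ".parquet"}),
    approver="chief.data.officer@example.com",
)

candidates = [
    ("crm", "accounts/2017.csv"),
    ("shared-drive", "misc/notes.txt"),
    ("erp", "orders/q1.PARQUET"),
]

# Keep only the files that fall inside the declared scope.
in_scope = [(store, path) for store, path in candidates if scope.admits(store, path)]
```

Writing the scope down as data rather than leaving it implicit means the criteria can be reviewed, versioned and traced back to the person who approved them.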
Reviewing and managing sources
Once you have specified your sources, you need to ensure the quality of the data. To increase confidence in the A.I.’s responses, and to be able to defend them, you must be able to assess the authenticity (via audit trails), accuracy and value of the content contributed to your data collection. Heat maps and other visualizations can support this assessment.
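As a rough illustration of that kind of review, the sketch below scores each source on the three dimensions named above and produces rows that a heat map could be drawn from. The SourceReview class, its fields and the scoring rules are assumptions made for this example; a real assessment would define its own checks and scales.

```python
from dataclasses import dataclass


@dataclass
class SourceReview:
    """Hypothetical review record for one data source."""
    name: str
    has_audit_trail: bool        # stands in for authenticity
    sampled_records: int         # records pulled for spot checks
    records_passing_checks: int  # sampled records that passed validation
    business_value: float        # 0.0 to 1.0, assigned by a reviewer

    @property
    def accuracy(self) -> float:
        """Share of sampled records that passed; 0.0 if nothing was sampled."""
        if self.sampled_records == 0:
            return 0.0
        return self.records_passing_checks / self.sampled_records


def heat_map_row(review: SourceReview) -> dict:
    """Flatten a review into one row of scores, each between 0.0 and 1.0."""
    return {
        "source": review.name,
        "authenticity": 1.0 if review.has_audit_trail else 0.0,
        "accuracy": round(review.accuracy, 2),
        "value": review.business_value,
    }


# Illustrative values only.
reviews = [
    SourceReview("crm", True, 200, 190, 0.9),
    SourceReview("legacy-archive", False, 100, 60, 0.3),
]
rows = [heat_map_row(r) for r in reviews]
```

Each row can then be passed to whatever charting tool the team already uses, so that weak sources stand out before they reach the A.I.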