COVID-19 has changed the questions. Dashboards built on previous assumptions were no longer valid. Instead, decision-makers and business leaders were sifting through HR and operational data to understand their current predicament.
In a recent report, “Top 10 Trends in Data and Analytics, 2020,” Gartner noted that this has given rise to greater analytics adoption: not just the plain vanilla version, but the augmented variety.
Augmented analytics brings together the previously separate worlds of data and analytics in a single platform. Vendors like SAS offer end-to-end workflows to drive augmented analytics, blurring the distinction between these two markets.
This is good news for data scientists. The so-called “collision of data and analytics” will increase interaction and collaboration between historically separate data and analytics roles. Gartner noted that this impacts not only the technologies and capabilities provided but also the people and processes that support and use them.
Remco den Heijer, vice president for ASEAN at SAS, said that augmented analytics is already helping many companies optimize decision-making, as insights from data become “available in a fraction of the time compared to manual approaches.”
Soon, augmented analytics will improve demand forecasts, identify potential supply-chain disruptions, ready support services for at-risk workers, and determine the effectiveness of crisis-intervention strategies.
Rethinking data preparation
Data is the lifeblood of an organization. Bad data can also give it a heart attack. And many companies, starved of insights, are learning this the hard way.
It is not a new concern, and one that data scientists know only too well. “The concern about the quality of data is nothing new. For as long as organizations have been collecting and using data, there have been concerns about whether they have been recorded and processed properly. Good data does cost money and requires considerable time and effort,” observed den Heijer.
But with companies relying more on analytics insights, the risk posed by bad data sets, or ones with inherent bias, is greater. And as companies explore new data sources to better grasp a fast-evolving market, they also need a better way to verify those sources quickly.
In turn, this has put the spotlight on data labeling and cleansing. “And the incentives for proper data labeling and cleansing must be tied directly to those charged with its creation,” den Heijer said.
Instead of looking at data labeling and cleansing as separate processes, he argued that companies need to see them as part of information production and continually monitor and adjust as needed. “Companies should weigh the benefits of doing so and spending extra time and money on obtaining clean data sources.”
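What that continual monitoring might look like in practice is a recurring quality check built into the pipeline that produces the data, rather than a one-off cleanup afterward. The sketch below is illustrative only; the file, column names, and checks are hypothetical stand-ins, with pandas used for convenience.

    import pandas as pd

    # Hypothetical pipeline step: the source file, columns, and
    # thresholds below are invented for illustration.
    df = pd.read_csv("hr_extract.csv")

    issues = {
        "missing_employee_id": int(df["employee_id"].isna().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "negative_tenure": int((df["tenure_years"] < 0).sum()),
    }

    for check, count in issues.items():
        if count:
            # In production, this would alert the team that owns the
            # source, tying the incentive for clean data to its creators.
            print(f"data-quality issue: {check} affects {count} rows")

Running a check like this on every refresh turns data quality from an occasional audit into part of information production, which is the shift den Heijer describes.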
Data virtualization can also reduce the complexity of accessing highly distributed data while offering a centralized, flexible, high-performance analytics environment.
“It applies data quality functions such as parsing, matching, and gender or identification analysis in real-time as the view is generated. By providing a data virtualization layer, it helps organizations to access underlying data sources such as Hadoop, Netezza, SAP HANA, etc. There is no need to create separate access strategies for each data source,” said den Heijer.
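As a rough sketch of the idea, the snippet below queries a single virtualization endpoint instead of maintaining separate clients for each backend. Every name here, from the connection string to the virtual tables, is an assumption made for illustration, not SAS's actual API.

    import sqlalchemy as sa

    # Hypothetical: one connection to the virtualization layer stands in
    # for separate access strategies for Hadoop, Netezza, and SAP HANA.
    engine = sa.create_engine("postgresql://dv-layer.example.com/virtual")

    # The virtual tables are backed by different physical systems; the
    # layer applies parsing and matching rules as the view is generated.
    query = sa.text("""
        SELECT c.customer_id, c.clean_name, o.order_total
        FROM virtual.customers AS c      -- physically in Hadoop
        JOIN virtual.orders    AS o      -- physically in Netezza
          ON c.customer_id = o.customer_id
    """)

    with engine.connect() as conn:
        for row in conn.execute(query):
            print(row.customer_id, row.clean_name, row.order_total)

The analyst never sees where the tables physically live; the virtualization layer resolves that, which is why no per-source access strategy is needed.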
AI and augmented data management
COVID-19 has shortened the journey to AI. This is why Gartner predicts in the same report that “by the end of 2024, 75% of organizations will shift from piloting to operationalizing AI.”
Previously, many use cases were narrow AI projects (e.g., chatbots). Today, companies are exploring deep learning techniques to improve decision-making across the organization and are looking at broader AI implementations.
But den Heijer warned against jumping on the AI bandwagon too fast. Instead, he advised setting up a solid data governance framework first. One way to build such a framework is to integrate data management capabilities that let data scientists easily access and integrate data from other sources.
Integrating data management is what Gartner calls augmented data management. In the report, the analyst firm defined it as a practice that “uses ML and AI techniques to optimize and improve [data] operations.” It can also shift metadata from being used in auditing, lineage, and reporting to “powering dynamic systems.”
A key benefit of augmented data management is the ability to analyze vast swathes of operational data, including actual queries, performance data, and schemas. Using this existing usage and workload data, an augmented engine can tune operations and optimize configuration, security, and performance.
“A suggestion engine can simplify data prep. Data governance tracks data and model lineage, so if data changes, you will know which models need retraining,” said den Heijer.
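A minimal sketch of that lineage idea: record which models were trained on which datasets, so a change to a dataset immediately surfaces the models that need retraining. The dataset and model names below are invented for the example.

    from collections import defaultdict

    # Hypothetical lineage registry: dataset -> models trained on it.
    lineage: dict[str, set[str]] = defaultdict(set)

    def record_training(model: str, datasets: list[str]) -> None:
        """Register that `model` was trained on each of `datasets`."""
        for ds in datasets:
            lineage[ds].add(model)

    def models_to_retrain(changed_dataset: str) -> set[str]:
        """Return every model that depends on the changed dataset."""
        return lineage[changed_dataset]

    record_training("churn_model_v3", ["crm_extract", "support_tickets"])
    record_training("upsell_model_v1", ["crm_extract"])

    # If the CRM extract changes, both dependent models surface.
    print(models_to_retrain("crm_extract"))

Real governance tooling tracks far more (versions, owners, transformations), but the dependency lookup is the core of knowing which models need retraining when data changes.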
Operationalizing AI democracy
SAS sees a future where citizen data scientists play a more crucial role in decision-making. This shift has to happen as companies continue to face AI talent shortages.
Den Heijer feels companies need to first change their mindsets about citizen data scientists. “Organizations in Asia are not entirely ready for this transition with citizen data scientists making business decisions on their own. Their role is currently focused on interpreting and managing data to solve complex problems. At this point, organizational structures will get in the way of citizen data scientists,” he said.
One chief data scientist, who participated in a recent survey by SAS, said that the best-case scenario for adopting AI and delivering a positive long-term impact requires “balancing leadership support and grassroots enthusiasm.”
“We understand the apprehensions of organizations when it comes to embracing the concept of citizen data scientists. With the help of the cloud-enabled, in-memory analytics engine of SAS Viya, we have leveraged cloud and container technology to facilitate the consumerization of the technology,” den Heijer said.
—
By Winston Thomas