The need to analyze transaction data is still the dominant use case influencing the selection of one analytics infrastructure platform over another, but a new report makes it clear that use cases involving data science and video are starting to play a bigger role.
Based on a survey conducted by Dresner Advisory Services of 641 decision-makers who are involved in selecting analytics applications, the 2021 Analytical Data Infrastructure Market Study found that more than 84% of respondents have analytic workloads and workflows based on transactional data sources, followed most often by Excel/CSV data (69%) and metadata (65%).
The report identifies the criteria that organizations are using to determine which analytics infrastructure platform to employ based on use cases involving business user reporting and dashboards, business user discovery and exploration, data science, and embedded analytics.
More than three-quarters of respondents (78%) cited business user reporting and dashboards as the most frequent use case for analytics infrastructure, followed by business user discovery and exploration (65%). Data science and embedded analytics were identified as high priorities by 49% and 42% of respondents, respectively.
The study suggests that while traditional use cases still dominate platform selection criteria, other needs are starting to become a larger factor. The report finds priorities also tend to shift as the type of data being analyzed changes. Embedded analytics use cases place a higher priority on “text” data types, while data science and machine learning place a higher priority on Excel/CSV and metadata. It is also worth noting that machine and events/log data are a priority for a large number of use cases, with 50% of respondents rating them as either critical or very important. More than 20% of respondents also identified video as an important use case.
Compliance and the need to adhere to a single corporate standard are less relevant factors in platform selection. The analytics infrastructure platform market overall remains highly fragmented because business units within organizations still enjoy a large amount of autonomy when it comes to selecting analytics applications and the platforms they run on.
“It’s a heterogeneous environment,” said Howard Dresner, founder and chief research officer for Dresner. “It’s not like back in the day when there was an approved list to choose from.”
In the absence of that standardization, most organizations will find themselves managing multiple data warehouses either in isolation or by trying to add a layer of semantics through which queries can be launched across multiple platforms, Dresner research director Brian Wood said.
Of course, IT organizations have added a layer of semantics across multiple databases before. The difference now is there are more cloud platforms, some of which, depending on volume and data type, can be more expensive than an on-premises platform. The cost depends on how often data is accessed and whether the platform needs to be treated as a capital or operating expense, Wood added. “It comes down to utilization and volume,” he said.
The survey found that cloud services have become the preferred deployment model (52%) for consuming analytics, followed by on-premises IT environments (40%). Hybrid cloud and cross-datacenter integration and management capabilities were identified as a priority by only 32% and 26% of respondents, respectively.
In terms of licensing options, respondents showed a slight preference for concurrent user pricing (44%), with other options such as data volume, subscription, and per-user models ranging from 39% to 41%. Open source models trailed at 30%.
The one thing that all respondents have in common is the role that performance plays in the selection of an analytics platform. Most respondents ranked performance as the top criterion (83%), followed closely by security (78%), scalability (73%), and features (72%). As the types of workloads deployed on analytics platforms running on the same infrastructure become more varied, performance and scalability across multiple classes of workloads become bigger concerns, Dresner noted.
In the meantime, the most important capabilities identified by survey respondents are the ability to support aggregations, statistical analysis, and other applications based on the R programming language, as well as multi-dimensional/OLAP queries. Sentiment analysis and path/link analysis ranked lowest.
SQL data capabilities are, by far, the top data model/management priority (82%), followed by row format (51%), in-memory (51%), and columnar format (32%). Non-SQL and hierarchical files such as Hadoop are high priorities for fewer than 35% of respondents.
Regardless of how analytics infrastructure platforms are consumed, most IT organizations can safely assume that given the volume of different types of data their organization needs to analyze, the overall percentage of the IT budget consumed by these platforms will continue to rise for years to come.