Hemant Tiwari, Managing Director, Hitachi Vantara, India & SAARC
Economists are warning of a deep and long-lasting economic and market downturn as India copes with the global Covid-19 pandemic. It is imperative that financial institutions are forearmed, as they are going to need every edge they can get. While financial firms grapple with myriad challenges such as open banking, data privacy and statutory compliance, they also need to protect their crucial data against an onslaught of cyberattacks.
As they strive to manage and gain full value from their data in changing times, data-rich companies like banks must look to turn their vast amounts of unstructured data into valuable assets. They should move quickly to adopt DataOps, a data management methodology for the AI era, to get the right data to the right place at the right time, and in doing so monetize data, drive innovation and gain competitive advantage.
Organisations with data lakes and Hadoop environments in place want to get the most out of existing investments, but they do need to consider storage and data management alternatives that will keep pace with changing data analytics requirements, tasks and challenges.
Data Lakes Giving Way to DataOps
Data scientists are working to identify the specific questions to answer, the business objectives to accomplish and the data sets required to get the job done. This helps to get a handle on the massive amounts of data generated and collected by businesses today, and the growing number of data sources they have to consider. It's a way to find value in the data and contribute directly to business objectives.
Data becomes increasingly less accessible as it grows more diverse, dense and distributed. A modern DataOps strategy pivots on managing, categorizing and enriching data with metadata at the place where it is captured and created, instead of spending resources (time, money, effort) moving it all into a centralized repository. This begins to make the case for handling processing and storage as separate operations with separate solutions.
Separation of Data Processing & Storage
For many data-rich organisations, Hadoop is a data lake environment that is running into challenges of scale. It simply isn't an economical storage solution for the sheer volumes of data involved, such as the daily streams of data from ATMs, financial transactions and customer surveys.
As a processing engine and a significant existing investment, Hadoop isn’t going away. But the evolution of data analytics and the storage limitations of Hadoop necessitate pulling the storage engine out of the Hadoop environment in favour of object storage.
Object Storage Needs a Performance Boost
For years now, object storage has been understood to be a scalable and cost-effective solution for archival purposes and for housing vast amounts of rarely accessed data, or data where latency isn't a concern.
As unstructured data threatens to overwhelm businesses and increasing storage costs have them looking for more economical solutions, more people are seeing object storage as part of the solution to evolving analytics needs.
Metadata is the key attribute of object storage. Every file written into object storage is saved along with its metadata. An object storage platform therefore knows the context, file type and other machine-generated identifiers for each piece of data, as well as any custom metadata attached to it, such as customer name or policy type.
So object storage stores data in a way that makes it ready and ripe for machine learning, AI and analytics. However, object storage on its own is not a high-performance processing environment.
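To make that concrete, here is a minimal sketch of writing a file into an S3-compatible object store along with custom business metadata (customer segment, policy type), assuming the boto3 client; the endpoint, bucket, key and metadata field names are illustrative only, not a specific product configuration.

```python
# Minimal sketch: custom metadata on an S3-compatible object store (boto3).
# Endpoint, credentials, bucket and field names are hypothetical examples.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.bank",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Write a transaction export and attach business context as custom metadata.
with open("transactions.csv", "rb") as body:
    s3.put_object(
        Bucket="atm-transactions",
        Key="2020/04/branch-042/transactions.csv",
        Body=body,
        Metadata={
            "customer-segment": "retail",
            "policy-type": "savings",
            "source-system": "atm-network",
        },
    )

# Later, analytics jobs can read that context back without opening the file.
head = s3.head_object(
    Bucket="atm-transactions", Key="2020/04/branch-042/transactions.csv"
)
print(head["Metadata"])
```

Because the context travels with the object itself, downstream analytics and AI pipelines can select and interpret data by its metadata rather than by crawling file contents.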
Hadoop + Object Storage + VSP
The super high-performance physical storage capacity underneath the object storage will come from the latest powerful NVMe virtual storage platforms (VSP). A VSP uses block storage, which means it knows you have written blocks of data to it and can guarantee access, policy-based data protection and high performance for your millions of transactions, but it doesn't know what the data is.
The object storage platform, which sits in front of the VSP, is the piece with the intelligence about the data stored within it. Because of the way it stores data, it is a high-capacity storage option to service big data analytics.
For the data scientists, analysts and engineers, object storage provides the contextual access to data that lives on the back-end, high-performance block storage, which is the VSP.
Integrate that with the Hadoop environment as the analytics processing engine and the financial institution’s infrastructure is now ready for next-generation data analytics.
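As one illustration of that integration, the sketch below points a Spark job on the Hadoop side at an S3-compatible object store through the standard s3a connector. It assumes the hadoop-aws connector is on the classpath; the endpoint, credentials, bucket path and column names are assumptions for illustration, not a prescribed Hitachi setup.

```python
# Minimal sketch: Spark (on Hadoop) reading analytics data from an
# S3-compatible object store via the s3a connector. Assumes hadoop-aws
# is on the classpath; endpoint, keys and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("atm-transaction-analytics")
    .config("spark.hadoop.fs.s3a.endpoint", "https://objectstore.example.bank")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Hadoop/Spark stays the processing engine; the data lives in object
# storage, which in turn sits on high-performance block storage underneath.
transactions = spark.read.csv("s3a://atm-transactions/2020/04/", header=True)
transactions.groupBy("branch_id").count().show()
```

In this layout the processing tier can scale independently of the storage tier, which is the separation of compute and storage the earlier section argued for.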