
What’s possible in a zero-ETL future?

November 8, 2023


This article was written by Rahul Pathak, vice president of relational database engines at AWS.

Integrating data across an organization can give you a better picture of your customers, streamline your operations, and help teams make better, faster decisions. But integrating data isn’t easy.

Organizations often gather data from many different sources, using a variety of tools and systems such as data ingestion services. That data is frequently stored in silos, which means it has to be moved into a data lake or data warehouse before analytics, artificial intelligence (AI), or machine learning (ML) workloads can run. And before the data is ready for analysis, it must be extracted from its sources, combined, cleaned, normalized, and loaded into the target. This process, known as extract, transform, load (ETL), can be laborious and error-prone.
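To make those steps concrete, here is a minimal Python sketch of a hand-written ETL pass. The two sources (a CRM export and a billing export), their field names, and the records are all hypothetical; the point is only to show the kind of custom cleaning, normalizing, deduplicating, and combining code that ETL typically involves.

```python
from datetime import date, datetime

# Extract: records as they might arrive from two systems, each with its own
# field names, casing, and date formats (hypothetical sample data).
CRM_ROWS = [
    {"CustomerID": "C-001", "Email": " Alice@Example.com ", "Signup": "2023-01-15"},
    {"CustomerID": "C-002", "Email": "bob@example.com", "Signup": "2023-02-03"},
]
BILLING_ROWS = [
    {"cust_id": "c-001", "amount_cents": 1999, "paid_on": "15/03/2023"},
    {"cust_id": "C-002", "amount_cents": 500, "paid_on": "01/04/2023"},
    {"cust_id": "C-002", "amount_cents": 500, "paid_on": "01/04/2023"},  # duplicate
    {"cust_id": "C-999", "amount_cents": 700, "paid_on": "02/04/2023"},  # unknown customer
]


def transform(crm_rows, billing_rows):
    """Clean, normalize, and combine both sources into one row per customer."""
    customers = {}
    for row in crm_rows:
        key = row["CustomerID"].strip().upper()          # normalize IDs
        customers[key] = {
            "customer_id": key,
            "email": row["Email"].strip().lower(),        # clean whitespace and case
            "signup": date.fromisoformat(row["Signup"]),  # parse ISO 8601 dates
            "total_paid_cents": 0,
        }

    seen = set()
    for row in billing_rows:
        key = row["cust_id"].strip().upper()
        # Billing uses DD/MM/YYYY, so normalize it before comparing records.
        paid_on = datetime.strptime(row["paid_on"], "%d/%m/%Y").date()
        fingerprint = (key, row["amount_cents"], paid_on)
        if fingerprint in seen:                           # drop exact duplicates
            continue
        seen.add(fingerprint)
        if key in customers:                              # ignore unmatched payments
            customers[key]["total_paid_cents"] += row["amount_cents"]
    return list(customers.values())


def load(rows, target):
    """Load: append transformed rows to the target table (a plain list here)."""
    target.extend(rows)


warehouse_table = []
load(transform(CRM_ROWS, BILLING_ROWS), warehouse_table)
for row in warehouse_table:
    print(row["customer_id"], row["email"], row["total_paid_cents"])
```

The transform is tied to each source's exact field names: if the CRM export renamed `Email`, `transform` would fail with a `KeyError`. That is the kind of breaking schema change that forces engineers to update and redeploy pipeline code.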

At AWS, our goal is to make it easier for organizations to connect to all of their data, and to do it with the speed and agility our customers need. We’ve developed our pioneering approach to a zero-ETL future based on these goals: Break down data silos, make data integration easier, and increase the pace of your data-driven innovation.

The problem with ETL

Combining data from different sources can be like moving a pile of gravel from one place to another: it’s difficult, time-consuming, and often unsatisfying work. First, ETL frequently requires data engineers to write custom code. Then, DevOps engineers or IT administrators have to deploy and manage the infrastructure to make sure the data pipelines scale. And when the data sources change, the data engineers have to manually update their code and redeploy it.

Furthermore, when data engineers run into issues, such as data replication lag, breaking schema updates, and data inconsistency between the sources and destinations, they have to spend time and resources debugging and repairing the data pipelines. While the data is being prepared—a process that can take days—data analysts can’t run interactive analyses or build dashboards, data scientists can’t build ML models or run predictions, and end users, such as supply chain managers, can’t make data-driven decisions.


This lengthy process rules out real-time use cases, such as assigning drivers to routes based on traffic conditions, placing online ads, or providing train status updates to passengers. In these scenarios, the chance to improve customer experiences or pursue new business opportunities can be lost.

Getting to value faster

Zero-ETL makes it possible to query data in place through federated queries, and it automates moving data from source to target without building or managing pipelines. This means you can run analytics on transactional data in near real time, connect to data in software applications, and generate ML predictions from within data stores rather than moving the data to an ML tool, so you gain business insights faster. You can also query multiple data sources across databases, data warehouses, and data lakes without having to move the data. To accomplish these tasks, we’ve built a variety of zero-ETL integrations between our services to address many different use cases.
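Querying data in place is easiest to picture with a small analogy. The Python sketch below uses SQLite’s `ATTACH DATABASE`, not any AWS service, to put two independent databases behind one connection and join across them in a single query, with no step that copies rows into a staging table first. The database alias, table names, columns, and data are hypothetical.

```python
import sqlite3

# One connection, two separate in-memory databases: "main" stands in for an
# operational store, and "lake" stands in for a second, independent source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

conn.execute("CREATE TABLE main.orders (order_id INTEGER, customer_id TEXT, amount REAL)")
conn.execute("CREATE TABLE lake.customers (customer_id TEXT, region TEXT)")

conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, "C-001", 120.0), (2, "C-002", 80.0), (3, "C-001", 40.0)],
)
conn.executemany(
    "INSERT INTO lake.customers VALUES (?, ?)",
    [("C-001", "EMEA"), ("C-002", "APAC")],
)

# A single query joins tables that live in different databases; nothing is
# extracted, staged, or reloaded beforehand.
rows = conn.execute(
    """
    SELECT c.region, SUM(o.amount) AS revenue
    FROM main.orders AS o
    JOIN lake.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(rows)
```

The query itself is ordinary SQL; only the schema prefixes (`main.` and `lake.`) say which database each table lives in.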

For example, let’s say a global manufacturing company with factories in a dozen countries uses a database cluster in each of those countries to store order and inventory data. To get a real-time view of all orders and inventory, the company has to build individual data pipelines from each of the clusters to a central data warehouse so it can query across the combined data set. To do this, the data integration team has to write code to connect to 12 different clusters and manage and test 12 production pipelines. After deploying the code, the team has to constantly monitor and scale the pipelines to optimize performance, and when anything changes, it has to make updates in 12 different places. By using the Amazon Aurora zero-ETL integration with Amazon Redshift, the data integration team can eliminate the work of building and managing these custom data pipelines.

Another example is a sales and operations manager deciding where the company’s sales team should focus its efforts. Using Amazon AppFlow, a fully managed no-code integration service, a data analyst can ingest sales opportunity records from Salesforce into Amazon Redshift and combine them with data from other sources, such as billing systems, ERP, and marketing databases. With analysis that draws on all of these systems, the sales manager can update the sales dashboard seamlessly and point the team toward the right sales opportunities.

Case study: Magellan Rx Management

In one real-world use case, Magellan Rx Management (now part of Prime Therapeutics) has used data and analytics to deliver clinical solutions that improve patient care, optimize costs, and improve outcomes. The company develops and delivers these analytics through its MRx Predict solution, which uses a variety of data, including pharmacy and medical claims and census data, to optimize predictive model development and deployment and to maximize predictive accuracy.

Before Magellan Rx Management began using Amazon Redshift ML, its data scientists arrived at a prediction through a series of steps using various tools. They had to identify the appropriate ML algorithms in Amazon SageMaker or use Amazon SageMaker Autopilot, export the data from the data warehouse, and prepare the training data to work with these models. Once the model was deployed, the scientists went through various iterations with new data to make predictions (also known as inference). This involved moving data back and forth between Amazon Redshift and SageMaker through a series of manual steps.

With Redshift ML, the company’s analysts can classify new drugs coming to market by creating and using ML models with minimal effort. The efficiency gained by using Redshift ML to support this process has improved productivity, optimized resource use, and delivered a high degree of predictive accuracy.
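The gain described here comes from requesting predictions with SQL where the data already lives, instead of exporting it to a separate tool and bringing results back. The Python sketch below illustrates that pattern with SQLite’s `create_function`, which registers a Python function so it can be called from SQL. It is an analogy, not Redshift ML’s API: `classify_drug` is a hypothetical stand-in for a trained model with invented weights, and the `new_drugs` table and its values are made up.

```python
import sqlite3


def classify_drug(list_price, expected_claims):
    """Hypothetical stand-in for a trained model: a fixed linear threshold rule.

    The weights are invented for illustration; a real model would be learned
    from historical data rather than hand-set.
    """
    score = 0.002 * list_price + 0.5 * expected_claims - 10.0
    return "high_impact" if score > 0 else "standard"


conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function, so inference
# can be requested inside an ordinary SELECT.
conn.create_function("classify_drug", 2, classify_drug)

conn.execute("CREATE TABLE new_drugs (name TEXT, list_price REAL, expected_claims REAL)")
conn.executemany(
    "INSERT INTO new_drugs VALUES (?, ?, ?)",
    [("drug_a", 6000.0, 2.0), ("drug_b", 300.0, 4.0), ("drug_c", 1000.0, 18.0)],
)

# Predictions are computed where the rows live: no export step, no round trip.
for name, label in conn.execute(
    "SELECT name, classify_drug(list_price, expected_claims) FROM new_drugs ORDER BY name"
):
    print(name, label)
```

Swapping in a different model only changes the registered function; the analyst’s SQL query stays the same.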

Integrated services bring us closer to zero-ETL

Our mission is to make it easy for customers to get the most value from their data, and integrated services are key to this process. That’s why we’re building towards a zero-ETL future, today. With data engineers free to focus on creating value from the data, organizations can accelerate their use of data to streamline operations and drive business growth. Learn more about AWS’s zero-ETL future and how you can unlock the power of all your data.
