Datakin launches OpenLineage initiative for data lineage industry standards

December 18, 2020

Datakin is announcing OpenLineage, an initiative to define industry standards for data lineage, at the Open Core Summit today. Through end-to-end data management, OpenLineage aims to make data operations more efficient and trustworthy for enterprises.

Datakin is developing OpenLineage in collaboration with contributors to other open source projects, including Amundsen, DataHub, Pandas, and Spark.

Data lineage is the flow of data over time across an ecosystem; as the foundation for data operations, it provides visibility into data’s origin, inventory, and availability. Meanwhile, data technologies are growing in both number and complexity. Data use cases have evolved in recent years to be both analytical and operational, rather than only analytical, making data more central to enterprises’ products. For example, targeted advertisements rely on data models for personalized recommendations.

But the existing tools to handle growing data technologies can be inefficient and untrustworthy, limiting data availability and quality. Data-driven decisions become difficult to make following disruptions like models collecting the wrong data or dashboards breaking. When data processing is more observable, problems can more quickly be identified and fixed.

OpenLineage wants to meet this need by building an end-to-end data management layer. This approach would include data catalogs, comprehensive operational tools, access control for data privacy, and governance and compliance solutions to more easily analyze and collect lineage metadata.

In an interview with VentureBeat, Datakin chief technology officer Julien Le Dem described OpenLineage’s mission to get industry players on the same page. “One goal is, OK, let’s share this effort on those integrations and reuse that independently of the use case, whether it’s governance or privacy or operations. We all need the same data,” he said.

When new versions of processing systems such as Apache Spark are released, the external integrations that extract their metadata may break. Le Dem’s second goal is to fix this issue by making projects first depend on a data lineage standard. “Let’s flip the dependency … that [standard] becomes core to Spark and core to the data warehouse to actually stay in sync and whenever they change something to keep respecting the standard,” Le Dem said. “And now that we inverted this dependency … we are making it a lot more robust.”

OpenLineage could reduce fragmentation and duplication of data lineage efforts across enterprises. Its flexible, industry-wide standards could help enterprises guarantee their metadata’s consistency and compatibility.

The initiative aims to help capture information about the data pipeline while it is running. In the interview with VentureBeat, Datakin CEO Laurent Paris compared it to the way smartphone pictures include as much information as possible, such as GPS coordinates. That metadata enables observability and allows other tools, such as computers, to show pictures on a map.

According to Paris, “You can have an ecosystem of tools that use that information to process your picture, but it’s only possible because everybody agreed on one single spot on how to express the metadata about the picture, and that’s what we’re trying to do.”
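Paris’s point about agreeing on one format can be illustrated with a small sketch. The Python below is not the OpenLineage specification or API; the `LineageEvent` record, its field names, and the dataset and job names are all hypothetical, chosen only to show how two unrelated tools could answer different questions from the same shared lineage records.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """Hypothetical record a job emits when it runs (not the OpenLineage spec)."""
    job: str
    inputs: tuple
    outputs: tuple
    event_time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def upstream_of(dataset, events):
    """Governance-style question: which datasets was `dataset` derived from?"""
    # Map each output dataset to the inputs that produced it.
    produced_from = {}
    for event in events:
        for out in event.outputs:
            produced_from.setdefault(out, set()).update(event.inputs)
    # Walk the map transitively, skipping datasets already visited.
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in produced_from.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jobs_touching(dataset, events):
    """Operations-style question: which jobs read or write `dataset`?"""
    return sorted(
        {e.job for e in events if dataset in e.inputs or dataset in e.outputs}
    )


# Hypothetical events emitted by two pipeline jobs.
events = [
    LineageEvent("ingest_clicks", ("raw.clicks",), ("staging.clicks",)),
    LineageEvent(
        "build_model_features",
        ("staging.clicks", "staging.users"),
        ("ml.features",),
    ),
]

print(sorted(upstream_of("ml.features", events)))
print(jobs_touching("staging.clicks", events))
```

Because both functions read the same record shape, neither needs its own integration with the jobs that produced the events, which is the kind of reuse Le Dem and Paris describe.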

OpenLineage follows observability efforts like OpenTelemetry and OpenTracing. It applies a similar community-oriented concept to the data processing ecosystem. “It’s really getting all together, planting the seed, starting defining the models … and this is the type of project where the more people contribute to it, the more everybody gets out of it,” Le Dem said.

By VentureBeat
