Atlassian claims it’s a step closer to achieving ‘nirvana’ with its data lake

Atlassian had long been an advocate of data warehouse-style architecture, according to the company’s data platform senior manager, Rohan Dhupelia.

At one point the company was running two data warehouses. One was a PostgreSQL data warehouse that powered business intelligence and the company’s dashboards, and was typically used by finance, support, and marketing.

The second was an Amazon Redshift data warehouse for research and development. 

“It was here that we shipped all of our clickstream data from our products, and used notebooks and SQL analytics to understand the user journey and patterns through our products,” Dhupelia explained during a keynote at the virtual Data+AI Summit 2021.

But having two data warehouses did Atlassian no favours, as the setup ended up causing the company more problems.

“Primarily, we noticed that a large number of datasets were typically being copied across from one data warehouse to another. These copies were brittle and often added delays to downstream pipelines and analysis,” Dhupelia said.
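Dhupelia did not walk through the plumbing, but a cross-warehouse copy of this kind typically chains several systems, each of which can fail on its own. The sketch below is a hypothetical illustration, not Atlassian’s pipeline; the host, table, and bucket names are invented. It extracts a table from the PostgreSQL warehouse, stages it in S3, and bulk-loads it into Redshift with COPY.

```python
# Hypothetical cross-warehouse copy job (names invented for illustration):
# extract from PostgreSQL, stage in S3, bulk-load into Redshift. A failure at
# any of the three stages delays every pipeline downstream of the copy.
import csv
import io

import boto3
import psycopg2

pg = psycopg2.connect(host="pg-warehouse.internal", dbname="bi",
                      user="etl", password="...")
s3 = boto3.client("s3")

# 1. Extract the table from the PostgreSQL warehouse into an in-memory CSV.
buf = io.StringIO()
with pg.cursor() as cur:
    cur.execute("SELECT account_id, plan, mrr FROM billing.accounts")
    csv.writer(buf).writerows(cur.fetchall())

# 2. Stage the extract in S3, where Redshift can read it.
s3.put_object(Bucket="etl-staging", Key="accounts/accounts.csv",
              Body=buf.getvalue())

# 3. Bulk-load into Redshift with COPY (Redshift speaks the Postgres wire
#    protocol, so psycopg2 works here too).
rs = psycopg2.connect(host="redshift-cluster.internal", port=5439,
                      dbname="rnd", user="etl", password="...")
with rs.cursor() as cur:
    cur.execute("""
        COPY analytics.accounts
        FROM 's3://etl-staging/accounts/accounts.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        CSV;
    """)
rs.commit()
```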

The company ran into other issues, too: the two data warehouses used different SQL syntaxes, which made it difficult to convert queries between them, and pooling data from both warehouses together was becoming a costly exercise.
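The dialect gap is easy to illustrate with an invented example (this query is not from the talk): PostgreSQL aggregates strings with STRING_AGG, while Redshift only offers LISTAGG, and the idiomatic date arithmetic differs as well, so the same logical query had to be maintained twice.

```python
# Invented example of the dialect mismatch: the same logical query written for
# each warehouse. Neither string runs unmodified on the other engine.

# PostgreSQL: STRING_AGG for string aggregation, interval arithmetic on NOW().
POSTGRES_QUERY = """
SELECT user_id,
       STRING_AGG(product, ', ') AS products
FROM product_usage
WHERE used_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id;
"""

# Redshift: LISTAGG instead of STRING_AGG; DATEADD/GETDATE is the idiomatic
# form for the date filter.
REDSHIFT_QUERY = """
SELECT user_id,
       LISTAGG(product, ', ') AS products
FROM product_usage
WHERE used_at >= DATEADD(day, -30, GETDATE())
GROUP BY user_id;
"""
```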

“As a result, a lot of analysis just didn’t happen because the engineering tax was just way too high,” Dhupelia said.

It was at that point that the company re-evaluated its architecture and opted to trade its two data warehouses for a single S3-based data lake. The switch delivered positive outcomes, including less “engineering tax” and the ability to scale virtually without limit, but the performance of the data lake was not up to scratch.

“We could manage to get relatively good concurrency with Presto, but smaller queries were still not returning as fast as they did in the data warehouse architecture. Also, modelling data for dashboards and BI use cases was quite difficult,” Dhupelia explained.

The architecture also raised the barrier to entry for data analytics and data science use cases.

“Our data platform team was becoming the bottleneck for users wanting to do anything advanced on the platform,” Dhupelia said. “Often users had to ask us to create a cluster for them, or to add particular libraries to their cluster.”

For Dhupelia, the solution was bringing Databricks into the environment, which he said has moved the company closer to achieving a “nirvana state”.

“We are now able to perform queries much faster, thanks in part to Databricks’ optimised runtimes, but also as a result of the optimisations that came with converting tables to the Delta Lake format. This meant an improved experience for business intelligence-style use cases,” he said.
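Dhupelia did not show the conversion step itself, but on Databricks the usual mechanism is Delta Lake’s convert-in-place operation, which builds a transaction log over a table’s existing Parquet files. A minimal PySpark sketch, with an invented S3 path:

```python
# Minimal sketch of converting an existing Parquet table to Delta Lake. The S3
# path is invented for illustration; assumes a Databricks or delta-spark
# environment with the Delta libraries available.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build the Delta transaction log over the existing Parquet files in place,
# without rewriting the data itself.
DeltaTable.convertToDelta(spark, "parquet.`s3://analytics-lake/clickstream`")

# Once in Delta format, the table can be compacted and Z-ordered to speed up
# the small, BI-style queries mentioned above.
spark.sql("OPTIMIZE delta.`s3://analytics-lake/clickstream` ZORDER BY (event_time)")
```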

In the coming months, Atlassian plans to move more business intelligence workloads into Databricks, following recent trials of Databricks SQL.

“We’re also planning on moving more tables to Delta Lake to further improve that performance, but also to simplify workloads that need strong dimensional modelling,” Dhupelia added.

“We’re looking at ways that we can enable more sensitive use cases by using Immuta, which is a self-service data access and privacy control layer on top of that data lake.

“At Atlassian, we have proven that there is no longer a need for two separate data systems. Technology has advanced far enough for us to consider one single, unified lakehouse architecture.”
