The difference between data hubs, data lakes and data warehouses and how to use them effectively in your organization.
Data hubs, data lakes, and data warehouses are all significant areas of investment for data and analytics leaders to support increasingly complex, diverse and distributed data workloads. Gartner research found that 57% of data and analytics leaders are investing in data warehouses, 46% are using data hubs and 39% are using data lakes.
While data and analytics leaders are familiar with these terms and hear about them from technology providers, many don’t understand the differences. “Data hubs, data lakes and data warehouses are not interchangeable alternatives,” says Ted Friedman, Distinguished VP Analyst, Gartner.
Friedman adds that data and analytics leaders must understand the purpose of these three types of structures, and the role they can play together in a modern data management infrastructure to best support specific business requirements.
Data warehouses versus data lakes versus data hubs
Data warehouses store well-known and structured data. They support predefined and repeatable analytics needs that can be scaled across many users in the organization. Data warehouses are suited to complex queries, high levels of concurrent access and stringent performance requirements.
Data lakes collect unrefined data (that is, data in its native form, with limited transformation and quality assurance) and events captured from a diverse array of source systems. Data lakes usually support data preparation, exploratory analysis and data science activities.
Data hubs are conceptual, logical and physical “hubs” for mediating semantics (in support of governance and sharing data) between centrally managed (i.e., widely used) and locally managed data (typically single-use data). They enable the seamless flow and governance of data.
Recognize how they differ in focus
Data warehouses and data lakes have a common focus — supporting the analytic needs of the organization. In contrast, data hubs are not focused on the analytical use of data. They do not store detailed data for extended periods.
They enable data sharing and apply governance controls to the data flowing across the organization’s various applications and processes. For example, data and analytics leaders can use a data hub to improve delivery of data from business applications to a data warehouse or a data lake.
The 3 structures are best used in combination
While it is important to understand their different roles in the architecture, data and analytics leaders must recognize the value these structures bring to the organization when used in combination.
For example, data can be delivered to analytic structures (data warehouses and data lakes) through a data hub, which acts as a point of mediation and governance. An increasing number of organizations are applying a data hub architecture as a focal point for sharing and governance of all critical data across the business; for example, replacing point-to-point integrations with a more focused architecture for synchronizing critical data between various operational applications and processes.
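To make the mediation role concrete, here is a minimal Python sketch of the pattern described above. It is an illustration under stated assumptions, not a reference design: the `DataHub` class, the `mask_email` rule and the in-memory `warehouse` and `lake` lists are all hypothetical stand-ins. The two helper functions show why a hub can simplify integration: connecting every system directly to every other needs n(n-1)/2 links, while a hub needs one link per system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class DataHub:
    """Hypothetical hub: applies governance rules, then forwards records.

    It stores nothing itself, matching the point that hubs do not
    hold detailed data for extended periods.
    """
    rules: List[Callable[[Record], Record]] = field(default_factory=list)
    subscribers: Dict[str, Callable[[Record], None]] = field(default_factory=dict)

    def subscribe(self, name: str, deliver: Callable[[Record], None]) -> None:
        self.subscribers[name] = deliver

    def publish(self, record: Record) -> None:
        governed = dict(record)
        for rule in self.rules:          # governance is applied once, centrally
            governed = rule(governed)
        for deliver in self.subscribers.values():
            deliver(dict(governed))      # each target receives its own copy


def mask_email(record: Record) -> Record:
    """Example governance rule (an assumption): hide personal email addresses."""
    if "email" in record:
        return {**record, "email": "***"}
    return record


def point_to_point_links(n: int) -> int:
    """Links needed when each of n systems integrates directly with every other."""
    return n * (n - 1) // 2


def hub_links(n: int) -> int:
    """Links needed when each of n systems connects only to the hub."""
    return n


# In-memory stand-ins for the analytic structures.
warehouse: List[Record] = []
lake: List[Record] = []

hub = DataHub(rules=[mask_email])
hub.subscribe("warehouse", warehouse.append)
hub.subscribe("lake", lake.append)
hub.publish({"customer_id": 42, "email": "a@example.com"})
```

In this sketch the governance rule runs once inside the hub, so the warehouse and the lake both receive the same governed record without each source application needing its own integration to each target.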
“The choice of data warehouse, data lake and data hub is not an ‘or,’” says Friedman. “Modern data management infrastructure needs to be dynamic: to evolve architectural patterns over time, enable new connections and support diverse use cases.”