What is a Data Lakehouse?
by Hannah Barrett, on May 18, 2021
Ahh... a Lakehouse sounds nice...
That's because it is! While it may not be the water skiing, sun-bathing, private beach kind, it has still got a lot going for it. So what exactly is it? Let's see...
Data warehouse + Data lake = Data lakehouse, right?
Basically, yes! A data lakehouse features the best elements from both concepts. The data lakehouse serves as a single platform for data warehousing and data lake storage; it's the best of both worlds! That is nice!
What a data warehouse does well
A data warehouse holds only structured data which is integrated and cleansed through a transformation layer before being loaded to a destination. Data within a warehouse must follow a defined schema.
Data warehouses are not optimal for handling the unstructured and semi-structured data that modern enterprises generate, or the variety, volume, and velocity of that data. This is one of the main areas where the data lakehouse beats the traditional data warehouse. With a data warehouse, data is difficult to ingest because many steps are needed to prepare and transform it for business intelligence or data analysis, but once it's in there it is easy to use.
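To make the "hard to get in, easy to use" trade-off concrete, here is a minimal sketch of transform-before-load using Python's built-in sqlite3 module as a stand-in for a warehouse. The table name, the sample records, and the transform function are all made up for illustration.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
raw_rows = [
    {"id": "1", "amount": " 19.99 ", "currency": "usd"},
    {"id": "2", "amount": "n/a", "currency": "USD"},  # cannot be cleansed
    {"id": "3", "amount": "5", "currency": "eur"},
]


def transform(row):
    """Cleanse one record so it conforms to the warehouse schema."""
    return (int(row["id"]), float(row["amount"].strip()), row["currency"].upper())


# The "warehouse": a table with a fixed, typed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, amount REAL NOT NULL, currency TEXT NOT NULL)"
)

# Transform first; anything that does not fit the schema never gets loaded.
loaded, rejected = [], []
for row in raw_rows:
    try:
        loaded.append(transform(row))
    except ValueError:
        rejected.append(row)

conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", loaded)
total_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

All the effort sits in front of the load step, and one record is turned away. What does land in the table is clean and ready to query.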
What a data lake does well
A data lake is big and can hold any raw data, including unstructured data. Its schemas are optimized for operational tasks, not analytics tasks. In contrast to a data warehouse, it's easy to get data into a data lake, but not so easy to use it once it is in there. A data lake alone is not set up to optimize analytics and reporting.
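The flip side can be sketched the same way. In this illustrative example (the records and field names are invented), getting data into the "lake" is just appending raw lines, and the work of imposing a schema is pushed onto whoever reads it.

```python
import json

# Hypothetical raw events, landed in the lake exactly as they arrived.
# Nothing was validated on the way in, so the shapes do not agree.
lake = [
    '{"type": "sale", "amount": "19.99", "currency": "usd"}',
    '{"type": "sale", "total": 5, "cur": "EUR"}',
    '{"type": "image", "path": "scans/receipt-001.png"}',
]


def read_sales(lines):
    """Apply a schema at read time: keep sales and reconcile field names."""
    for line in lines:
        record = json.loads(line)
        if record.get("type") != "sale":
            continue  # not every record in the lake is a sale
        amount = record.get("amount", record.get("total"))
        currency = record.get("currency", record.get("cur"))
        yield {"amount": float(amount), "currency": str(currency).upper()}


sales = list(read_sales(lake))
```

Every consumer who wants sales figures has to know about both field-name variants, which is the "not so easy to use" part.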
How the data lakehouse does it better
The data lakehouse takes the best features of the data warehouse and data lake, and leaves behind the parts where each falls short. With the new ability to derive intelligence from unstructured sources, it is no longer necessary to maintain multiple systems simultaneously, which can be time-consuming and costly; the data can now all live in one location. Moving data from system to system can get complex and cause delays. Combining several data warehouses, a data lake, and any other systems into a single data lakehouse gives you one source of truth for analytics.
Benefits of a data lakehouse:
- Time & effort savings
- Use of BI tools directly on the data
- Support for a variety of data types: images, video, audio, semi-structured data, and structured data
- Support for streaming for real-time reports
- More capable of serving diverse data applications
- Simpler data ingestion and publishing
With the data lakehouse, businesses get the same BI and reporting capabilities they are used to while cost-effectively bringing all data to one place in an open format with ease. Rather than uploading raw data from the data lake into the data warehouse and transforming it, the query tool is connected directly to the data lake.
At Arkatechture, we do ELT (extract, load, transform): extract the data, load it as it is, then transform it as it is consumed. This means you don't have to move data around to make it more useable, because it is put into a dynamic conformance and enrichment layer.
Traditional ETL (extract, transform, load) is one of the reasons why the traditional data warehouse takes so long to deliver value. With ELT, calculations are done on the fly as you feed the BI tool instead of before publishing; that's what makes it possible to build a data lakehouse without the long cycle time of ETL.
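As a rough illustration of the ELT idea, the sketch below loads raw values untouched and lets a SQL view do the transformation each time it is queried. It uses Python's built-in sqlite3 module, and the view here only stands in for a conformance and enrichment layer; the table, view, and sample rows are assumptions for the example, not a description of Arkatechture's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw values are stored as they arrived, all as text.
conn.execute("CREATE TABLE raw_sales (id TEXT, amount TEXT, currency TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("1", " 19.99 ", "usd"), ("2", "5", "eur"), ("3", "10.01", "USD")],
)

# Transform: the view computes the cleaned shape whenever it is queried,
# so nothing is copied into a second, pre-transformed table.
conn.execute(
    """
    CREATE VIEW sales AS
    SELECT CAST(id AS INTEGER)        AS id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(currency)            AS currency
    FROM raw_sales
    """
)

# A BI-style query runs against the view, not the raw table.
totals = dict(
    conn.execute(
        "SELECT currency, ROUND(SUM(amount), 2) FROM sales GROUP BY currency"
    ).fetchall()
)
```

Because the transformation lives in the view, a row added to raw_sales shows up in the next query with no separate batch job, and a change in business rules means editing the view definition rather than reloading data.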
At Arkatechture, we are more agile in reacting to new business requirements because of how we have implemented the conformance and enrichment layer. To learn more about leveraging this technology for your business, connect with one of our data experts!