OneLake

Relational or non-relational. SQL or Spark. Streaming or batch. Binary or text. No matter how your data is born, with a modern data warehouse your data is governed, accessible and usable. OneLake is the OneDrive for data.

With Microsoft Synapse you bring all your data onto a single platform. Thanks to an open architecture built on the Delta file format, you can use your data with the tools you prefer. Qumio has a strong background in building scalable OneLake architectures, with the first customers already in production.

Lakehouse

As an open, industry-standard format, Delta is the best choice for a Lakehouse architecture. OneLake is compatible with Azure Data Lake Storage Gen2, which makes it ideal for data storage.

We build native apps on the platform, and we also help customers get the most from their existing Databricks investments and move forward to the Synapse platform.

Synapse is a fully integrated stack where data in OneLake is immediately usable by all the different workloads, with no ETL required.

Warehouse

The SQL warehouse has been a crucial part of the enterprise analytical platform for decades. Almost every tool on the planet can connect to a T-SQL endpoint. The SQL warehouse of the future is open at the bottom, with the Delta file format, and T-SQL compliant at the top.

That’s Synapse Warehouse in 2023: scalable, familiar, compatible. Let’s transform your current investment into a new, performant platform.

Qumio has a long history with the biggest data warehouses in Finland, many of which run on Microsoft MPP architectures. The first of them are now migrating to the Synapse architecture, and we are helping them get the most out of the investments made over the years.

Medallion architecture

The medallion architecture describes a series of data layers that define the quality of the data stored in the Lakehouse: the Bronze layer holds raw data, the Silver layer validated data and the Gold layer enriched data. Because the raw data is retained, higher layers can be reloaded, and loading scales very well. The medallion architecture does not replace the dimensional modeling needed for efficient analytics from the Gold layer. The Bronze layer stays very close to the original data format, while the Silver layer varies depending on the use case and business needs.

Bronze layer

The Bronze layer maintains the raw state of the data. Data is stored as close to its original format as possible and can come from streaming or batch sources. The complete history is kept, so higher layers can be reloaded from the Bronze layer.
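As a minimal sketch of the idea, assuming a plain in-memory list stands in for Bronze storage: each record is appended verbatim with some ingestion metadata, and nothing is corrected or dropped. The names `ingest_to_bronze` and `replay` and the sample sources are hypothetical and only serve the illustration.

```python
from datetime import datetime, timezone

def ingest_to_bronze(bronze, raw_payload, source):
    """Append one raw record to Bronze without parsing or changing its payload."""
    bronze.append({
        "source": source,                                        # hypothetical source name
        "ingested_at": datetime.now(timezone.utc).isoformat(),   # ingestion metadata
        "payload": raw_payload,                                  # stored verbatim
    })

def replay(bronze, source):
    """Return every raw payload from one source, in arrival order, for a reload."""
    return [record["payload"] for record in bronze if record["source"] == source]

# In-memory stand-in for Bronze storage (an assumption made for this sketch).
bronze = []
ingest_to_bronze(bronze, '{"order_id": 1, "amount": "10.50"}', "orders_batch")
ingest_to_bronze(bronze, '{"order_id": 1, "amount": "10.50"}', "orders_batch")  # duplicate is kept
ingest_to_bronze(bronze, '{"order_id": 2, "amount": "oops"}', "orders_stream")  # bad value is kept

batch_history = replay(bronze, "orders_batch")
```

Because duplicates and invalid values are kept as they arrived, the higher layers can be rebuilt at any time with new validation rules.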

Silver layer

In the Silver layer, data is validated and deduplicated. It is most commonly stored in efficient Delta tables, and advanced analysis can be done directly from this layer.
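Validation and deduplication could be sketched roughly as follows. The rules here (an integer `order_id`, a non-negative numeric `amount`, last record wins per key) are assumptions chosen for the example, and `to_silver` is a hypothetical name. In practice this step would run against Delta tables, not Python lists.

```python
def to_silver(bronze_rows):
    """Validate and deduplicate raw rows; return (clean_rows, rejected_rows)."""
    clean = {}       # keyed by order_id so duplicates collapse into one row
    rejected = []    # rows that fail validation are set aside, not silently lost
    for row in bronze_rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)
            continue
        if amount < 0:  # assumed business rule for this sketch
            rejected.append(row)
            continue
        clean[order_id] = {"order_id": order_id, "amount": amount}  # last record wins
    return list(clean.values()), rejected

bronze_rows = [
    {"order_id": "1", "amount": "10.50"},
    {"order_id": "1", "amount": "10.50"},  # exact duplicate
    {"order_id": "2", "amount": "oops"},   # fails type validation
    {"order_id": "3", "amount": "7"},
]
silver, rejected = to_silver(bronze_rows)
```

Keeping the rejected rows separate, instead of discarding them, makes it possible to report data quality issues back to the source system.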

The Silver layer provides an enterprise view that is conformed and cleansed. It combines data from multiple sources into a single view and has validated relationships between entities.

The Silver layer enables data scientists and analysts to create value and work with all enterprise data.

The Silver layer often has a data model close to third normal form, but some customers implement it with Data Vault-like modeling instead.

Gold layer

The Gold layer serves the actual business analytics. Data is aggregated and transformed into dimensional models that business users and use cases can work with. All data is easily queried from Delta tables, and it can be the source for tabular Direct Lake models. Data is updated according to business requirements, normally at least daily but very often in near real time.
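To illustrate the step from Silver to Gold, here is a small sketch that builds a customer dimension and a daily sales fact table from cleansed rows, then answers a business question by joining the two. The table and column names (`dim_customer`, `fact_daily_sales`, `segment`) and the sample data are invented for the example.

```python
from collections import defaultdict

# Cleansed Silver rows (sample data invented for this sketch).
silver_orders = [
    {"order_id": 1, "customer_id": 10, "order_date": "2023-06-01", "amount": 10.5},
    {"order_id": 2, "customer_id": 11, "order_date": "2023-06-01", "amount": 4.5},
    {"order_id": 3, "customer_id": 10, "order_date": "2023-06-02", "amount": 7.0},
]
silver_customers = [
    {"customer_id": 10, "name": "Acme", "segment": "Enterprise"},
    {"customer_id": 11, "name": "Bolt", "segment": "SMB"},
]

def build_gold(orders, customers):
    """Build a customer dimension and a fact table aggregated per day and customer."""
    dim_customer = {
        c["customer_id"]: {"name": c["name"], "segment": c["segment"]} for c in customers
    }
    totals = defaultdict(float)
    for order in orders:
        totals[(order["order_date"], order["customer_id"])] += order["amount"]
    fact_daily_sales = [
        {"order_date": day, "customer_id": customer_id, "revenue": revenue}
        for (day, customer_id), revenue in sorted(totals.items())
    ]
    return dim_customer, fact_daily_sales

def revenue_by_segment(fact_daily_sales, dim_customer):
    """Join the fact table to the dimension and sum revenue per customer segment."""
    result = defaultdict(float)
    for row in fact_daily_sales:
        result[dim_customer[row["customer_id"]]["segment"]] += row["revenue"]
    return dict(result)

dim_customer, fact_daily_sales = build_gold(silver_orders, silver_customers)
segment_revenue = revenue_by_segment(fact_daily_sales, dim_customer)
```

The fact table only carries keys and measures, while descriptive attributes live in the dimension, which is the shape reporting tools typically expect.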

Qumio helps customers transform their data platform into a modern lakehouse medallion architecture. OneLake fits perfectly into this architecture with its versatile access-rights and policy model. Automatic Delta table discovery lets business users benefit from the data as soon as it is ready on the lake.

Delta file format

OneLake can contain all kinds of file types, whether textual or binary. To be queryable, however, data has to be in a specific binary format, and in OneLake that format is Delta. Delta is based on Parquet binary files plus additional metadata that tracks those files. All data modification (DML) operations work against Delta tables. Microsoft has implemented native support for the latest Delta features, such as deletion vectors. A deletion vector is an enhancement that adds a bitmap on top of a Parquet file, allowing deletes to happen without a complete rewrite of the file, which improves performance.
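The deletion vector idea can be shown with a toy model. This is a conceptual sketch only, not the Delta implementation: an immutable tuple stands in for a Parquet file, a Python integer is used as the bitmap, and the class name `DataFile` is hypothetical.

```python
class DataFile:
    """Toy stand-in for one immutable Parquet file plus its deletion vector."""

    def __init__(self, rows):
        self._rows = tuple(rows)  # the stored rows are never rewritten
        self._deleted = 0         # bitmap: bit i set means row i is logically deleted

    def delete_where(self, predicate):
        """Mark matching rows as deleted in the bitmap; return how many were marked."""
        count = 0
        for i, row in enumerate(self._rows):
            bit = 1 << i
            if not (self._deleted & bit) and predicate(row):
                self._deleted |= bit
                count += 1
        return count

    def scan(self):
        """Read the file, skipping rows whose bit is set in the deletion vector."""
        return [row for i, row in enumerate(self._rows) if not (self._deleted >> i) & 1]

    def physical_row_count(self):
        """Rows still physically present in the file, deleted or not."""
        return len(self._rows)

data_file = DataFile([{"id": 1}, {"id": 2}, {"id": 3}])
marked = data_file.delete_where(lambda row: row["id"] == 2)
visible = data_file.scan()
```

The delete only flips a bit, so its cost does not depend on the size of the file. The physical rows stay in place until a later rewrite of the file removes them for good.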

OneLake also has a Parquet optimization technique called VertiParquet. It is Microsoft’s proprietary technology for making Parquet files as fast as a columnstore index is in SQL Server and Analysis Services.