Data Engineering

Data engineering for everyone, and we really do mean everyone. In Fabric you can do low-code or no-code data engineering with Power Query and Data Factory, while professional developers can use Spark as a service. No clusters, no provisioning, no maintenance. Just results from the beginning. Fabric is built on the Delta table format, which makes it easy to transform data from source to curated and straightforward to make the data available at every step along the way. Fabric automatically makes all Delta tables visible in Spark and SQL, and also in Power BI.

Qumio has helped customers optimize their data engineering workloads to meet time constraints and solve pipeline problems.

Spark as a service

No clusters. No provisioning. No waiting. Just open a notebook and execute. Behind the scenes all the plumbing is ready and warm. When you start executing your code, it begins running instantly, and connections to the Lakehouse are predefined. Shortcuts to AWS and Azure Data Lake Storage Gen2 just work. You can work in the language of your choice (Python, Scala, SQL or R). Write your code in Jupyter-style notebooks or, if you prefer your local IDE, use VS Code.
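As a rough illustration of the notebook experience described above, the following PySpark sketch reads a Lakehouse table, applies a simple cleanup and saves the result as a Delta table. It assumes it runs in a Fabric notebook with a default Lakehouse attached, where the `spark` session is already defined. The table names `orders_raw` and `orders_curated` and the `amount` column are hypothetical.

```python
def curate_orders(spark, source_table="orders_raw", target_table="orders_curated"):
    """Read a raw Lakehouse table, clean it, and save it as a curated Delta table.

    Table and column names are hypothetical placeholders.
    """
    raw = spark.read.table(source_table)

    # Drop exact duplicate rows and keep only rows with a positive amount.
    curated = raw.dropDuplicates().filter("amount > 0")

    # Overwrite the curated table, stored in Delta format.
    curated.write.format("delta").mode("overwrite").saveAsTable(target_table)
    return curated


# In a Fabric notebook the `spark` session is predefined, so the call is simply:
# curate_orders(spark)
```

Passing `spark` in as a parameter keeps the function easy to reuse across notebooks and Spark job definitions.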

To speed up your development, Data Wrangler gives you a GUI that generates most of the transformations you would need. Copilot helps with the rest, so you stay as productive as possible.

SemPy is a Fabric library available in PySpark. It lets you connect to Power BI datasets and use both their data and their measures in your data engineering workflow.
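A minimal sketch of what this can look like in a Fabric notebook, assuming the `sempy` package (semantic link) is available there. The dataset name `Sales Model`, the table `Customer`, the measure `Total Revenue` and the column `Customer[Country]` are all hypothetical.

```python
def load_dataset_data(dataset="Sales Model"):
    """Read a table and evaluate a measure from a Power BI dataset.

    All dataset, table, measure and column names are hypothetical.
    """
    # Imported here so the sketch only needs sempy when it is actually called.
    import sempy.fabric as fabric

    # Read the rows of one table in the dataset.
    customers = fabric.read_table(dataset, "Customer")

    # Evaluate a measure defined in the dataset, grouped by a column.
    revenue = fabric.evaluate_measure(
        dataset,
        measure="Total Revenue",
        groupby_columns=["Customer[Country]"],
    )
    return customers, revenue
```

The measure is calculated by the dataset itself, so the business logic defined in Power BI does not have to be reimplemented in the notebook.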

Data Factory

Data Factory is the ETL tool for the cloud, tightly integrated with the other Fabric workloads. You create a pipeline in the visual editor, and the workload runs at speed on a distributed computing cluster.

Data Pipelines are the vehicle for orchestration. You can run notebooks, Spark jobs, Power Query and even single SQL statements with ease.

Power Query

Everyone knows Power Query. What is new is the scalability and performance! Running on Fabric’s distributed cluster, you can now join large tables, run large aggregations and do complex transformations at scale.

Power Query can save the final dataset into a Delta table. This table is then immediately available to Power BI datasets, the SQL endpoint and the Lakehouse.