Microsoft Fabric — Do I need it?

By @Maren Egbert

If you use Microsoft Power BI in any fashion, you have probably already stumbled across it: Microsoft Fabric. It has been available for testing and purchase since 2023. But what is Microsoft Fabric, and is it something I need? In this blog post I’d like to share the results of my research and give you an overview of what Microsoft Fabric can do. First of all:

What is Microsoft Fabric?

Let us ask Microsoft itself:

“Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real time event routing, and report building” [1]

Hold on, does this mean we only need one platform for all data-related processes in our company? That sounds like something we need to take a closer look at!

With Power BI, Microsoft has one of the most popular self-service BI tools on the market. [2] And in the Azure universe you can find a wide variety of cloud data processing tools, such as Azure Data Lake Storage, Azure Data Factory or Azure Synapse Analytics. [3] Maybe it is an obvious next step to combine all those functionalities under one umbrella?!

Microsoft Fabric is a SaaS solution that is supposed to simplify and unify your analytics requirements.

If you are familiar with Power BI, you’ll be right at home with Fabric, because the user interface is the same as that of the Power BI service. From there you can easily navigate between the different interfaces that are part of Fabric:

Power BI — the well-known self-service BI tool
Data Factory — import, prepare and transform data with Power Query and Pipelines
Data Activator — create alerts and actions based on your data
Industry Solutions — data solutions tailored to specific industries
Real Time Intelligence — import and analyse event-driven scenarios, streaming data and data logs
Synapse Data Engineering — ingest, prepare and transform data in a lakehouse using Spark
Synapse Data Science — build, deploy and operationalize machine learning models
Synapse Data Warehouse — ingest, prepare and transform data in a data warehouse using SQL

**Fig. 1: Overview of the Microsoft Fabric tools, easily reached from the Power BI service interface.**

Wherever you come from, be it Data Engineering, Data Analytics (like me) or Data Science, you will find an interface that supports the data processing method you are familiar with. You can build semantic models (formerly known as datasets) using dataflows and Power Query. You can build data warehouses in star or snowflake schemas using SQL. Or you can orchestrate complex data manipulation with Spark in lakehouses. You can even use Git to version your code. [4] [5]

The fundamental ideas behind all this:

Democratise data: enable business units to manage and work with their data independently
Avoid complex data infrastructures with lots of data movement and duplication
Simplify the integration of AI solutions into the data landscape

The business units in the company take responsibility for their data. That makes sense, since they should be the ones who know the most about them. Everybody, regardless of programming skills, can retrieve insights from the data and contribute to the company’s data landscape. At the same time, all data are stored in a single data lake for the entire company [6]. The idea is not completely new: other tools already follow a “zero-copy cloning” approach, and the data mesh architecture was created to decentralize data management and thereby avoid the bottlenecks of a monolithic architecture with centralized responsibilities. But Fabric makes these concepts more feasible for everybody.

That leads us to the next question:

How does it work?

At the centre of all this stands “OneLake”, a data lake solution based on Azure Data Lake Storage (ADLS) Gen2 that provides storage for structured as well as unstructured data.

**Fig. 2: Microsoft Fabric functionalities are all based on OneLake.** Source: *https://blog.fabric.microsoft.com/en-us/blog/microsoft-onelake-in-fabric-the-onedrive-for-data?ft=Onelake:category*

OneLake follows the same concept as OneDrive and is therefore also called the “OneDrive for data”. Instead of Word, Excel or PowerPoint files in OneDrive folders, you store data items (like lakehouses and data warehouses) in OneLake workspaces. Anybody who is granted access to the workspace can use and provide content in this secured space. Access policies and ownership can be governed separately, again without copying any data. Just as you share an Excel file on OneDrive, you can share your data warehouse in Fabric, enabling your coworkers to work with the data according to the rights you give them. Even sharing between Fabric tenants is now available in public preview [7] [8] [9].

Let’s look at an example every data analyst has to deal with on a regular basis: a report needs a new dimension that is provided in an Excel spreadsheet. Without Fabric, you would connect the Excel spreadsheet to your semantic model in Power BI and add the dimension to the report. Problem solved for the moment. But with Fabric the dimension can be added to the data lake directly and thereby become part of the company-wide data landscape. There is no need to wait for a data engineer to integrate it into the data infrastructure; that’s already done. There are several ways to import data into a lakehouse in Fabric [10]:

Upload the file from the local computer (small file upload)
Create a pipeline (large data source)
Create a dataflow (small data source)
Use Apache Spark in notebook code (complex data source)
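To make the dimension example a bit more tangible, here is a minimal sketch in plain Python. It is not Fabric code: the standard library modules csv and sqlite3 merely stand in for the spreadsheet and the warehouse, and all table names, column names and values (fact_sales, dim_region, the regions and revenues) are invented for illustration. The point is only the pattern: the new dimension is loaded once as its own table and can then be joined by anyone, instead of living inside a single report.

```python
import csv
import io
import sqlite3

# Hypothetical content of the spreadsheet, exported as CSV.
DIMENSION_CSV = """region_id,region_name
1,North
2,South
"""


def load_dimension(conn, csv_text):
    """Create the dimension table and fill it from CSV text."""
    conn.execute(
        "CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["region_id"]), r["region_name"]) for r in reader]
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)", rows)
    return len(rows)


def revenue_by_region(conn):
    """Join the existing fact table with the newly added dimension."""
    query = """
        SELECT d.region_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_region AS d ON d.region_id = f.region_id
        GROUP BY d.region_name
        ORDER BY d.region_name
    """
    return conn.execute(query).fetchall()


def build_demo():
    """Set up an in-memory stand-in warehouse with one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (region_id INTEGER, revenue REAL)")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?)",
        [(1, 100.0), (2, 50.0), (1, 25.0)],
    )
    load_dimension(conn, DIMENSION_CSV)
    return conn


if __name__ == "__main__":
    print(revenue_by_region(build_demo()))
```

In Fabric itself you would use one of the import options listed above rather than SQLite; the join logic stays the same.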

All tabular data are automatically saved in delta parquet format and every tool can interact seamlessly with this format. Transformations and translations between different tools are not necessary.

Let’s look at one thing that has always bugged me: the semantic model modes in Power BI.

Power BI used to have essentially two modes: Import and DirectQuery. Both have their advantages but also disadvantages. The default Import mode duplicates the data sources, since all tables are copied into the report. That can lead to reports showing data of different freshness and to slow report loading. The alternative, DirectQuery, is the mode of choice for big data sources and real-time scenarios. But the necessary translations between the applications (e.g. from DAX/M to SQL) can slow down the queries, and not all Power BI features are available in DirectQuery mode. The new option, Direct Lake, does not need any translation, because Power BI can work directly with delta parquet files. You do not need to load all your data into your report, so you neither create copies nor end up with outdated data. [11] [12]

**Fig. 3: Overview of the semantic model modes in Power BI** (https://learn.microsoft.com/en-us/fabric/get-started/direct-lake-overview)

Direct Lake finally makes it possible to create lean, fast reports, yeah!

So, it’s quite an interesting product that’s supposed to accelerate my daily work. But there is one question that will surely arise if I want to use Microsoft Fabric:

How much does it cost?

Costs obviously depend on the amount of storage and the number of capacity units you need, the number of users who contribute or consume content, whether you want to pay as you go or reserve capacity, and the region in which you want to store your data.

**Fig. 4: Overview of Fabric capacities and prices** (region Germany West Central) *https://azure.microsoft.com/en-us/pricing/details/microsoft-fabric/*

The available Fabric capacities range from F2 to F2048, with, not surprisingly, 2 and 2048 capacity units respectively. For comparison: the smallest Power BI Premium capacity corresponds to an F64 capacity. Another special feature of F64: from this level on, free Power BI licences are sufficient for all employees who only need to consume reports and interact with them. You only need Power BI Pro licences to contribute reports. With smaller capacities you need Power BI Pro licences for all employees who want to do anything with Power BI.

Let’s look at some examples [13] to provide an idea about the cost range and key cost drivers:

**Fig. 5: Some theoretical examples of Fabric prices to illustrate the cost range.** Price calculations are based on the Microsoft Azure Pricing Calculator and our best understanding. Note that actual prices may vary.

The price of Fabric consists of three components: compute (Fabric capacity), storage (OneLake) and Power BI Pro licences. While no free OneLake storage is provided, a free mirroring allowance is included, depending on your capacity (e.g. an F64 capacity includes 64 TB of mirroring storage). Mirroring is the replication of existing databases and data warehouses into OneLake, where they are kept continuously synchronized in near real time. You only pay for this storage once your free allowance is exceeded. Thus, if you already have a specialized data infrastructure, you do not need to pay again to use it in Fabric.
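The three components can be put into a small back-of-the-envelope calculation. The following sketch is only an illustration: all rates and quantities are made-up placeholders, not actual Microsoft prices, and the function names are hypothetical. It assumes that mirrored data is only billed once the free mirroring allowance is exceeded, and, as a simplification, that the excess is billed at the same rate as regular OneLake storage.

```python
def billable_mirroring_tb(mirrored_tb, free_allowance_tb):
    """Mirrored storage that exceeds the free allowance (never negative)."""
    return max(0.0, mirrored_tb - free_allowance_tb)


def monthly_cost(compute_rate_per_hour, hours_running,
                 onelake_tb, mirrored_tb, free_mirroring_tb,
                 storage_rate_per_tb, pro_licences, licence_price):
    """Split a monthly bill into compute, storage and licence costs."""
    compute = compute_rate_per_hour * hours_running
    # Assumption: excess mirroring is billed like regular OneLake storage.
    storage_tb = onelake_tb + billable_mirroring_tb(mirrored_tb, free_mirroring_tb)
    storage = storage_tb * storage_rate_per_tb
    licences = pro_licences * licence_price
    return {
        "compute": compute,
        "storage": storage,
        "licences": licences,
        "total": compute + storage + licences,
    }


if __name__ == "__main__":
    # Placeholder numbers only, NOT real Microsoft prices.
    example = monthly_cost(
        compute_rate_per_hour=0.5, hours_running=730,
        onelake_tb=1.0, mirrored_tb=70.0, free_mirroring_tb=64.0,
        storage_rate_per_tb=20.0, pro_licences=5, licence_price=10.0,
    )
    print(example)
```

For real numbers, use the Azure Pricing Calculator with your own region and capacity.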

In any case, the Power BI Pro licences needed have only a minor impact on total costs compared with the Fabric capacity. Here, the difference between pay as you go and reserved capacity has a noticeable impact of around 40 %. Therefore, pay as you go only makes sense if you can shut down your Fabric capacities for at least half the time.

Anyway, if you’re new to big data and business intelligence, Fabric offers access to modern data concepts for just around $200 per month (F2 capacity, 365 h, 1 TB). And if the idea takes root, the capacity can easily be adjusted to the growing interest.
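The trade-off between the two billing models can be checked with a few lines of Python. This is a simplified sketch under stated assumptions: the hourly rate is a placeholder, a month is taken as 730 hours, pay as you go bills only the hours the capacity is running, and a reservation is billed for every hour at the discounted rate. The function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def pay_as_you_go_cost(hourly_rate, uptime_fraction, hours=HOURS_PER_MONTH):
    """Pay as you go bills only the hours the capacity is running."""
    return hourly_rate * hours * uptime_fraction


def reserved_cost(hourly_rate, discount, hours=HOURS_PER_MONTH):
    """A reservation is billed for every hour, at a discounted rate."""
    return hourly_rate * hours * (1.0 - discount)


def break_even_uptime(discount):
    """Uptime fraction at which both models cost the same."""
    return 1.0 - discount


if __name__ == "__main__":
    rate, discount = 1.0, 0.40  # placeholder rate; discount of around 40 %
    for uptime in (0.5, 0.6, 0.8):
        payg = pay_as_you_go_cost(rate, uptime)
        reserved = reserved_cost(rate, discount)
        print(f"uptime {uptime:.0%}: pay as you go {payg:.0f}, reserved {reserved:.0f}")
```

With a discount of around 40 %, the break-even point lies at roughly 60 % uptime. Shutting the capacity down for at least half the time is therefore comfortably on the pay-as-you-go side.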

So far so good… let’s have a look at it in practice:

What’s the feeling?

I have to admit, as a data analyst I really got excited about all those possibilities:

Easily write my own Python script to analyse the raw data. Just switch the interface and use SQL to check the data in the data warehouse that is the foundation of my report. Easily add dimensional data to my data warehouse without having to bother overworked data engineers and seeing my ticket end up in some future sprint. No need to load all my data into a Power BI report and create a huge file that takes a long time to refresh in the best case or has outdated data in the worst. That was something I really wanted to try!

If you want to try Fabric, the trial and the Microsoft Learn path for Fabric are great. There is a lot of information and training material. You can even load sample data directly in the Fabric interface and build lakehouses and warehouses from scratch.

But we are still in a Microsoft environment. Is it just me, or are all the different ways to reach the same goal really redundant and superfluous? There is a lot of drag and drop, button clicking and selecting functions from drop-down menus, which makes the experience feel quite redundant. You can even write a SQL query by drag and drop (or, by clicking another tab, write it in an editor). But apart from the redundancy, navigating between the interfaces is really easy, and you quickly feel at home in Fabric.

The performance of Fabric, even with the optimized sample use cases, was not satisfactory. The initial load and the creation of the lakehouse or data warehouse environment took quite some time. At this point I began to doubt whether Fabric is in fact able to ingest, transform and analyse petabytes of data with acceptable performance, as Microsoft claims [14].

Additionally, the idea of a OneDrive for data and the possibility to create your own data content that is integrated into the data lake probably gives nightmares to everyone who is responsible for data security and data quality. How do you avoid huge data graveyards? How do you avoid every department working with its own datasets and its own interpretation of the data? Is it really sensible to allow everybody to mess around with the data? I think it’s worth a try! Data only produce value if they are used. But the democratization of data has its risks and requires well-thought-out processes and best practices to ensure decent data quality. Fabric offers some support with functions like “data promotion” and “data certification” that mark datasets of sufficient quality [15]. But that can only be part of an overall data quality strategy.

And of course, since you buy the complete set of tools included in Fabric, you’ll pay for a lot of features that you’ll probably never use, especially since the tool stack is constantly being extended [16].

Conclusion

In the end, I came to the conclusion that Microsoft Fabric is a great thing for everybody who is already at home in the Microsoft universe with Power BI and Azure. It pulls down the walls between Data Analysis and Data Engineering and may also promote collaboration between these teams.

As Fabric offers several methods to work with the data, it enables all levels of data professionals to contribute to the data landscape of the company. Everybody can work with the data using the method they are familiar with or that fits their level of tech affinity.

The small capacities available in Fabric may also make it easier for smaller businesses to get started with modern business intelligence methods without having to deal with several applications and providers.

Even if specialized solutions may always perform better, be more flexible and develop faster, Microsoft offers a low entry point to modern business intelligence ideas with Fabric.

If you want to find out whether your business needs Microsoft Fabric and how to integrate it into your data landscape, do not hesitate to contact us.

[1] What is Microsoft Fabric — Microsoft Fabric | Microsoft Learn

[2] https://powerbi.microsoft.com/en-in/blog/microsoft-named-a-leader-in-the-2023-gartner-magic-quadrant-for-analytics-and-bi-platforms/

[3] https://azure.microsoft.com/de-de/products/

[4] https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview

[5] https://learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration

[6] https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview

[7] https://blog.fabric.microsoft.com/en-US/blog/data-warehouse-sharing/

[8] https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview

[9] https://support.fabric.microsoft.com/de-at/blog/introducing-external-data-sharing-a-new-way-to-collaborate-across-fabric-tenants?ft=All

[10] https://learn.microsoft.com/en-us/fabric/data-engineering/load-data-lakehouse

[11] Semantic model modes in the Power BI service — Power BI | Microsoft Learn