15 November 2023, by Markus Klesel and Dr. Stefan Klempnauer
First Adopter Approach in a Client Project with Microsoft Fabric: Insights from the Trenches of Data Warehouse Migration
In the ever-evolving landscape of data management and analytics, staying ahead of the curve is critical to meeting the needs of modern digital businesses. With Microsoft Fabric entering its public preview phase in mid-May of 2023, this ecosystem is experiencing significant disruption. We partnered with our client to test these waters, embarking on a Proof of Concept (PoC) project to migrate their existing on-premises data warehouse to the cloud, using Microsoft Fabric.
We will look at the first steps of the migration project and highlight the challenges and opportunities we faced from an organizational and technical perspective. In doing so, we hope to provide valuable insights for any organization considering a similar path or currently running their own initial PoC with Fabric.
Why Microsoft Fabric?
According to Microsoft’s official documentation:
“Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place.”
During the early PoC and research phases of the project, our goal was to develop a solid understanding of the capabilities and strengths of Microsoft’s new product. Fabric holds great promise for any organization looking to unify their data management and analytics toolset into one integrated platform, based on the Power BI workspace experience. It is particularly well-suited for organizations that are heavily dependent on the Microsoft ecosystem. In addition, migrating from an on-premises solution to the cloud offers a number of benefits, including cost-effectiveness, scalability, accessibility, reliability, and the flexibility to operate remotely.
Solving Organizational Challenges
For a project, introducing a new technology means, above all, managing the uncertainties that come with it and bringing everyone involved in the existing processes on board through good communication. We have summarized this in the following two key points.
Dealing with Uncertainty
One of the most significant challenges in planning a migration project with Fabric stems from the inherent uncertainties of its preview phase. As with any product in its early stages, shifting release timelines are a common occurrence. This fluidity can make it difficult to pinpoint specific project milestones and create accurate timelines for completion.
Uncertainty around the cost structure adds another layer of complexity. While we understand the potential broad financial benefits of moving to the cloud, an evolving licensing model and lack of prior project experience can make it difficult to accurately forecast budget needs.
Despite these challenges, we have found that a high degree of stakeholder engagement is critical to navigating this phase. Strong commitment fosters an understanding that uncertainty is part of the journey when pioneering a new technology. It allows us to remain flexible and adjust our plans as needed, keeping the goal of a successful migration at the forefront.
A crucial experience was facilitating communication between our data warehouse developers and the Microsoft product team. This bidirectional exchange provided our developers with a deeper understanding of Fabric's functionality and design principles, while offering the Microsoft team a valuable opportunity to understand our practical needs and challenges. This synergy has not only strengthened our current project but has also laid the groundwork for successful collaboration on future initiatives.
Communication and Collaboration in the First Large-Scale Cloud Project
Managing a company's first large-scale cloud project, especially with a new technology like Microsoft Fabric, is an undertaking full of challenges and opportunities. One of the most critical aspects we have identified is the importance of establishing robust communication channels between the various teams involved.
As the data warehouse migration team spearheading one of our first major cloud projects, we found ourselves frequently interacting with the cloud governance and security teams. Navigating this new territory required us to establish a common understanding and develop effective protocols for working together.
As with any major change, there are numerous technical challenges that architects and developers must address. The following are two key points that are relevant to the technical aspect of the migration.
Transitioning from a SQL-based Data Warehouse to a Data Lakehouse Architecture
Moving from a traditional SQL-based data warehouse architecture to a modern data lakehouse architecture within Microsoft Fabric, supported by Fabric lakehouses and Delta Lake, involves several considerations.
One aspect to consider is the potential differences between the native data types in a traditional SQL data warehouse and those in Delta Lake. This may require careful mapping and transformation to maintain data consistency.
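To make the type-mapping concern more concrete, here is a minimal sketch in plain Python. It assumes a SQL Server source and translates column type names into the Spark SQL type names used by Delta Lake. The mapping table and the helper `to_delta_type` are hypothetical and purely illustrative; they are neither an official Fabric mapping nor the framework used in the project, and a real migration would need to cover more types and edge cases.

```python
import re

# Illustrative (assumed) mapping from SQL Server type names to Spark SQL /
# Delta Lake type names. Not an official or complete table.
BASE_TYPE_MAP = {
    "bit": "boolean",
    "tinyint": "smallint",      # SQL Server tinyint is unsigned (0..255); Spark's tinyint is signed
    "smallint": "smallint",
    "int": "int",
    "bigint": "bigint",
    "real": "float",
    "float": "double",
    "money": "decimal(19,4)",
    "smallmoney": "decimal(10,4)",
    "char": "string",
    "varchar": "string",
    "nchar": "string",
    "nvarchar": "string",
    "uniqueidentifier": "string",
    "date": "date",
    "datetime": "timestamp",
    "datetime2": "timestamp",   # precision finer than microseconds is lost
    "varbinary": "binary",
}

# Matches a type name with an optional argument list, e.g. "nvarchar(50)".
_TYPE_PATTERN = re.compile(r"^\s*(\w+)\s*(?:\(([^)]*)\))?\s*$")


def to_delta_type(sql_type: str) -> str:
    """Translate a SQL Server type declaration into a Spark SQL type name."""
    match = _TYPE_PATTERN.match(sql_type)
    if match is None:
        raise ValueError(f"Cannot parse SQL type: {sql_type!r}")
    name = match.group(1).lower()
    args = match.group(2)

    # Decimal types keep their precision and scale.
    if name in ("decimal", "numeric"):
        if not args:
            return "decimal(18,0)"  # SQL Server default precision and scale
        parts = [p.strip() for p in args.split(",")]
        precision = int(parts[0])
        scale = int(parts[1]) if len(parts) > 1 else 0
        return f"decimal({precision},{scale})"

    # Everything else is a plain lookup; length arguments are dropped.
    try:
        return BASE_TYPE_MAP[name]
    except KeyError:
        raise ValueError(f"No mapping defined for SQL type: {sql_type!r}") from None


source_columns = {"customer_name": "NVARCHAR(50)", "amount": "decimal(10, 2)", "is_active": "bit"}
target_columns = {col: to_delta_type(t) for col, t in source_columns.items()}
```

Raising an error for unmapped types, rather than silently falling back to `string`, is a deliberate choice in this sketch: it forces every unknown type to be reviewed during the migration instead of surfacing later as a data consistency problem.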
In addition, migrating to Microsoft Fabric requires alignment of the existing data warehouse structure with the Fabric workspace landscape. This process can involve restructuring and reorganizing data entities, workflows, and processes. In our case, this gave us the opportunity to rework parts of the structure that had grown over time and to align the data warehouse more closely with the concept of data ownership, making the company’s data landscape easier to navigate.
The flexibility of a data lakehouse to handle a variety of data formats beyond structured tables also increases the scope for data exploration. The combination of data warehouse and data lake features in a lakehouse architecture provides a unified, governed source for an organization's data, simplifying access and improving data management.
In summary, despite the many technical adaptations that the data transformation framework required, we chose a data lakehouse architecture as a modern and future-proof platform for this project.
Embracing the Shift: The Developer's Journey
Moving from SQL-based data transformations to PySpark notebooks in Fabric is a transformative process that presents unique challenges and opportunities, especially from a developer's perspective.
One of the primary hurdles is the programming paradigm shift. Although Fabric warehouses offer the ability to use SQL, our decision to use PySpark in Fabric lakehouses was driven by a desire for greater flexibility. While PySpark is powerful, it introduces a learning curve for developers accustomed to the declarative nature of SQL. PySpark's procedural style requires a new approach to expressing data transformations, and its debugging process can be more complex than with SQL.
In addition, migrating to a data lakehouse architecture requires developers to adapt to new ways of storing, processing, and retrieving data. They have to deal with concepts such as file-based storage and distributed data processing, which can be quite different from what they are used to in SQL-based systems.
Despite these challenges, the transition presents an array of opportunities. PySpark's distributed computing capabilities allow developers to handle larger volumes of data and more complex transformations than traditional SQL. This increased capacity can enable more sophisticated data analysis.
Furthermore, PySpark notebooks provide developers with an interactive, user-friendly environment that integrates data exploration, visualization, and robust documentation capabilities.
In summary, while moving from a SQL-based data warehouse to a PySpark-based data lakehouse in Fabric can be challenging for developers, it also opens up new ways to tackle large, complex data tasks.
Over the course of the last few months, it has become clear that the amount of conceptual work required to plan the migration should not be underestimated. This is especially true when technology, architecture, and infrastructure are all changing. Planning these conceptual shifts carefully lays the groundwork for a successful migration.