
KevinChant

Initial tests of the Data Factory Testing Framework

In this post I want to cover my initial tests of the Data Factory Testing Framework, which is a unit testing framework you can use to test Microsoft Fabric Data Pipelines.

 

I wanted to cover this framework since I mentioned it in a previous post about unit tests on Microsoft Fabric items.

Since publishing that post I have had a lot of questions about this particular framework, including some at the "ask the expert" booth during the European Microsoft Fabric Community Conference.

 

By the end of this post, you will know the results of my initial tests of the Data Factory Testing Framework. Along the way I share plenty of links.

 

One key point to remember is that the Data Factory Testing Framework is currently in Public Preview, which means that it is not officially supported by Microsoft. Plus, the contents of this post are subject to change.

 

Quick recap about the Data Factory Testing Framework

Just to recap, the Data Factory Testing Framework is a stand-alone framework which allows you to write unit tests for Microsoft Fabric Data Pipelines, Azure Data Factory and Azure Synapse Analytics.

 

It comes with a well-documented README file. The short version that is relevant for this post is that you can write unit tests for Microsoft Fabric Data Pipelines that are created in workspaces configured with Microsoft Fabric Git integration.

 

You can then clone the Git repository that contains the Microsoft Fabric Data Pipeline metadata in order to perform unit tests.

 

You can do this by installing the Data Factory Testing Framework along with other required components (such as pytest) on the same machine where the clone of the repository is stored. This can be various types of machines, including an Azure Pipelines agent or your local machine.
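
For reference, if you are installing into a plain Python environment, both components can come from PyPI. Below is a minimal example of that, assuming the package is published under the name data-factory-testing-framework; the "Getting started" guide in the README covers the recommended steps.

pip install data-factory-testing-framework pytest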

 

For the benefit of this post, I performed my initial tests on a local machine.

 

Creating a sample pipeline

For my initial tests I first created a new workspace and added a sample Microsoft Fabric Data Pipeline, which copies data from an online source into a Lakehouse. In addition, I added a parameter to the pipeline in order to perform asserts better during unit testing. As you can see below.

Adding a parameter to the sample Data Pipeline
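
For context, the sink folder path of the Copy activity references this parameter through a dynamic content expression, broadly along the lines of the illustrative example below. The exact expression in your own pipeline may differ.

@pipeline().parameters.DirectoryName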

I then configured the workspace to use Microsoft Fabric Git integration, which connected my workspace to a Git repository in Azure DevOps.

Git repository in Azure DevOps

I then cloned this repository locally with Visual Studio Code. Afterwards, I made sure I installed any outstanding items from the "Getting started" guide for the framework before testing.

 

Initial tests of the Data Factory Testing Framework

To start my tests I added a new Python script file called "simple_pipeline_tests.py" to the root of the cloned repository in Visual Studio Code. I then added the below code to the file.

 

import pytest
from data_factory_testing_framework import TestFramework, TestFrameworkType
from data_factory_testing_framework.models import Pipeline
from data_factory_testing_framework.state import PipelineRunState, RunParameter, RunParameterType


@pytest.fixture
def test_framework(request: pytest.FixtureRequest) -> TestFramework:
    # Point the framework at the root of the cloned repository,
    # which contains the Data Pipeline metadata
    return TestFramework(
        framework_type=TestFrameworkType.Fabric,
        root_folder_path=request.fspath.dirname,
    )


@pytest.fixture
def pipeline(test_framework: TestFramework) -> Pipeline:
    # Load the Data Pipeline named "Test" from the repository
    return test_framework.get_pipeline_by_name("Test")


def test_directory_parameter(request: pytest.FixtureRequest, pipeline: Pipeline) -> None:
    # Arrange
    activity = pipeline.get_activity_by_name("Copy sample data")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Pipeline, name="DirectoryName", value="SampleData")
        ],
    )

    # Act
    activity.evaluate(state)

    # Assert to check correct directory name is used
    assert (
        activity.type_properties["sink"]["datasetSettings"]["typeProperties"]["location"]["folderPath"].result
        == "SampleData"
    )

 

At the start of the script, the pytest module and the required classes from the data_factory_testing_framework package are imported. Two fixtures then initialize the framework, pointing it at the root of the cloned repository, and load the relevant pipeline by name.

 

After the initial configuration the script creates a test_directory_parameter function, which performs the following tasks:

 

  1. First, it performs an "Arrange" by identifying the pipeline activity and then setting the stage with a hypothetical value for the pipeline parameter that I created.
  2. It then performs an "Act" by evaluating the hypothetical state of the pipeline activity based on the values entered.
  3. Finally, it performs an "Assert" with an assertion test to verify that the directory folder is the same as the one I specified in the Act.

I then opened the integrated terminal in Visual Studio Code and ran the below command to perform my unit test:

 

pytest simple_pipeline_tests.py

 

The command completed showing a passed test, as you can see below.

Visual Studio Code showing a passed test

As with any good testing strategy, I then purposely tested for failure by changing the directory name in the parameter. When I ran the test again it failed.

Visual Studio Code showing a failed test
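
If you prefer to keep both the passing and failing scenarios in the script itself, one option is pytest's parametrize marker. Below is a minimal sketch of that idea, reusing the same calls as the test above; the function name and the second set of values are hypothetical and are only there to demonstrate a deliberate mismatch.

@pytest.mark.parametrize(
    "parameter_value,expected_folder",
    [
        ("SampleData", "SampleData"),   # expected to pass
        ("SampleData", "WrongFolder"),  # deliberate mismatch, expected to fail
    ],
)
def test_directory_parameter_values(pipeline: Pipeline, parameter_value: str, expected_folder: str) -> None:
    # Arrange
    activity = pipeline.get_activity_by_name("Copy sample data")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Pipeline, name="DirectoryName", value=parameter_value)
        ],
    )

    # Act
    activity.evaluate(state)

    # Assert to check the evaluated directory name matches the expectation
    assert (
        activity.type_properties["sink"]["datasetSettings"]["typeProperties"]["location"]["folderPath"].result
        == expected_folder
    )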

One thing I really like about working with this framework is how well the code scales: adding more tests is straightforward. For example, I can add a second function to my script to check that the configured destination is a Lakehouse.

 

def test_location_type(request: pytest.FixtureRequest, pipeline: Pipeline) -> None:
    # Arrange
    activity = pipeline.get_activity_by_name("Copy sample data")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Pipeline, name="DirectoryName", value="SampleData")
        ],
    )

    # Act
    activity.evaluate(state)

    # Assert to check final location is a Lakehouse
    assert (
        activity.type_properties["sink"]["datasetSettings"]["typeProperties"]["location"]["type"]
        == "LakehouseLocation"
    )

 

In reality, this is just the tip of the iceberg, due to the variety of pipeline components that you can test and how well this approach scales.
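
As one example of keeping the tests short as they grow, here is a sketch that moves the shared Arrange and Act steps into a fixture so that further test functions only contain their asserts. It is based only on the calls already shown above, with hypothetical names, so treat it as an idea rather than the framework's recommended pattern.

@pytest.fixture
def evaluated_copy_activity(pipeline: Pipeline):
    # Shared Arrange and Act steps for the Copy activity
    activity = pipeline.get_activity_by_name("Copy sample data")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Pipeline, name="DirectoryName", value="SampleData")
        ],
    )
    activity.evaluate(state)
    return activity


def test_directory_parameter_short(evaluated_copy_activity) -> None:
    # Assert to check correct directory name is used
    location = evaluated_copy_activity.type_properties["sink"]["datasetSettings"]["typeProperties"]["location"]
    assert location["folderPath"].result == "SampleData"


def test_location_type_short(evaluated_copy_activity) -> None:
    # Assert to check final location is a Lakehouse
    location = evaluated_copy_activity.type_properties["sink"]["datasetSettings"]["typeProperties"]["location"]
    assert location["type"] == "LakehouseLocation"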

 

Final words

I hope that my initial tests of the Data Factory Testing Framework inspire some of you to look into this framework further, because I get asked about unit testing often and this solution has many possibilities.

 

Of course, if you have any comments or queries about this post feel free to reach out to me.