Geek Logbook

Tech sea log book

Filtering Data in Azure Data Factory: Keeping Only “FileWrite” Operations

In this post, I’ll walk through how to filter rows in Azure Data Factory (ADF) using the Filter activity to retain only the rows where a specific column (OperationName) has the value “FileWrite”. This is especially useful when you’re processing large datasets and only need certain entries based on a condition.

Problem

You have a dataset in ADF with a column called OperationName, and you want to filter out all rows where the OperationName does not equal "FileWrite".

Solution

Azure Data Factory provides the Filter activity, which allows you to filter items in a collection (like a dataset) based on a condition. In this case, we’ll use it to filter rows that only have "FileWrite" in the OperationName column.

Step-by-Step Guide

1. Input Data

First, ensure you have an input dataset. This can be the output of a previous activity in your pipeline. In this example, we’ll assume the activity that retrieves this data is named read_for_iteration.

2. Add the Filter Activity

In your ADF pipeline:

  1. Drag a Filter activity into your pipeline.
  2. Configure it to depend on the output of the activity that retrieves your data (read_for_iteration in this case).

3. Configure the Filter Activity

In the Filter activity, we need to configure the following properties:

  • Items: This is the collection we are filtering. In this case, it’s the output from the read_for_iteration activity.
  • Condition: This is the logic that will filter the items. We’ll filter based on the OperationName column, and only keep rows where OperationName equals "FileWrite".

Here’s the JSON configuration for the Filter activity:

{
    "name": "Only FileWrite",
    "type": "Filter",
    "dependsOn": [
        {
            "activity": "read_for_iteration",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    "userProperties": [],
    "typeProperties": {
        "items": {
            "value": "@activity('read_for_iteration').output.value",
            "type": "Expression"
        },
        "condition": {
            "value": "@equals(item().OperationName, 'FileWrite')",
            "type": "Expression"
        }
    }
}

Explanation:

  • @activity('read_for_iteration').output.value: This expression retrieves the data from the output of the read_for_iteration activity.
  • @equals(item().OperationName, 'FileWrite'): This is the filter condition. The item() function refers to each row in the dataset, and item().OperationName accesses the value in the OperationName column. The @equals() function checks if that value equals "FileWrite".

4. Testing

Once the pipeline is set up and you’ve configured the filter, you can run the pipeline. The output of the Filter activity will only include rows where OperationName equals "FileWrite".

Conclusion

The Filter activity in Azure Data Factory provides a simple yet powerful way to filter your data based on conditions. In this example, we used it to filter out all rows that don’t have OperationName set to "FileWrite". This method can be adapted to filter data based on different columns or conditions, providing a flexible way to handle data in ADF pipelines.

Tags: