Filtering Data in Azure Data Factory: Keeping Only “FileWrite” Operations
In this post, I’ll walk through how to filter rows in Azure Data Factory (ADF) using the Filter activity to retain only the rows where a specific column (OperationName) has the value “FileWrite”. This is especially useful when you’re processing large datasets and only need certain entries based on a condition.
Problem
You have a dataset in ADF with a column called OperationName, and you want to filter out all rows where the OperationName does not equal "FileWrite".
Solution
Azure Data Factory provides the Filter activity, which allows you to filter items in a collection (like a dataset) based on a condition. In this case, we’ll use it to filter rows that only have "FileWrite" in the OperationName column.
Step-by-Step Guide
1. Input Data
First, ensure you have an input dataset. This can be the output of a previous activity in your pipeline. In this example, we’ll assume the activity that retrieves this data is named read_for_iteration.
2. Add the Filter Activity
In your ADF pipeline:
- Drag a Filter activity into your pipeline.
- Configure it to depend on the output of the activity that retrieves your data (
read_for_iterationin this case).
3. Configure the Filter Activity
In the Filter activity, we need to configure the following properties:
- Items: This is the collection we are filtering. In this case, it’s the output from the
read_for_iterationactivity. - Condition: This is the logic that will filter the items. We’ll filter based on the
OperationNamecolumn, and only keep rows whereOperationNameequals"FileWrite".
Here’s the JSON configuration for the Filter activity:
{
"name": "Only FileWrite",
"type": "Filter",
"dependsOn": [
{
"activity": "read_for_iteration",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('read_for_iteration').output.value",
"type": "Expression"
},
"condition": {
"value": "@equals(item().OperationName, 'FileWrite')",
"type": "Expression"
}
}
}
Explanation:
@activity('read_for_iteration').output.value: This expression retrieves the data from the output of theread_for_iterationactivity.@equals(item().OperationName, 'FileWrite'): This is the filter condition. Theitem()function refers to each row in the dataset, anditem().OperationNameaccesses the value in theOperationNamecolumn. The@equals()function checks if that value equals"FileWrite".
4. Testing
Once the pipeline is set up and you’ve configured the filter, you can run the pipeline. The output of the Filter activity will only include rows where OperationName equals "FileWrite".
Conclusion
The Filter activity in Azure Data Factory provides a simple yet powerful way to filter your data based on conditions. In this example, we used it to filter out all rows that don’t have OperationName set to "FileWrite". This method can be adapted to filter data based on different columns or conditions, providing a flexible way to handle data in ADF pipelines.