Geek Logbook

Tech sea log book

Filtering Items in Azure Data Factory: Excluding Items That Begin with an Underscore

Azure Data Factory (ADF) is a powerful tool for building ETL (Extract, Transform, Load) workflows in the cloud. One common requirement is to filter data or files based on certain conditions. In this post, we’ll explore how to use the Filter activity in ADF to exclude items that begin with an underscore (“_”), which is useful when you want to skip temporary or system files.

Use Case

Imagine you are working with a list of files or data records, and you want to process only those that do not start with an underscore. Files starting with an underscore might be temporary files or files meant to be ignored in your processing. The Filter activity in Azure Data Factory allows you to achieve this efficiently.

Steps to Filter Items That Do Not Start with “_”

  1. Create a Pipeline: In Azure Data Factory Studio, open the Author tab and create a new pipeline.
  2. Add a Filter Activity:
    • Drag and drop the Filter activity from the activities pane onto your pipeline canvas.
  3. Configure the Filter Activity:
    • Click on the Filter activity to configure its settings.
    • In the Items property, specify the array you want to filter. Items expects an array rather than a dataset, so it usually references the output of a previous activity such as Lookup or Get Metadata. For example, if a Get Metadata activity includes Child items in its field list, you can reference its file listing:
@activity('GetMetadataActivityName').output.childItems

  4. Set the Filter Condition:
    • Stay on the Settings tab of the Filter activity.
    • In the Condition box, enter the following expression, which keeps only the items whose name does not start with an underscore:
@not(startswith(item().name, '_'))
  5. Connect the Filtered Output:
    • Connect the Filter activity to the activities that will process the filtered items, such as a ForEach. The filtered array is in the value property of the Filter activity’s output:
@activity('FilterActivityName').output.value
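For readers who prefer the pipeline's code view, the steps above could look roughly like the following fragment of a pipeline's activities array. This is a minimal sketch, not a complete pipeline: it assumes a Get Metadata activity named GetMetadataActivityName already exists and returns childItems, and the ForEach named ProcessEachFile and its inner Wait activity are hypothetical placeholders added only to show where the filtered array is consumed. Note that the ForEach reads the array from output.value.

```json
[
  {
    "name": "FilterActivityName",
    "type": "Filter",
    "dependsOn": [
      {
        "activity": "GetMetadataActivityName",
        "dependencyConditions": ["Succeeded"]
      }
    ],
    "typeProperties": {
      "items": {
        "value": "@activity('GetMetadataActivityName').output.childItems",
        "type": "Expression"
      },
      "condition": {
        "value": "@not(startswith(item().name, '_'))",
        "type": "Expression"
      }
    }
  },
  {
    "name": "ProcessEachFile",
    "description": "Hypothetical ForEach, not part of the original steps; replace the placeholder Wait with real processing.",
    "type": "ForEach",
    "dependsOn": [
      {
        "activity": "FilterActivityName",
        "dependencyConditions": ["Succeeded"]
      }
    ],
    "typeProperties": {
      "items": {
        "value": "@activity('FilterActivityName').output.value",
        "type": "Expression"
      },
      "activities": [
        {
          "name": "PlaceholderWait",
          "description": "Stand-in for real work; inside the loop, @item().name is the current file name.",
          "type": "Wait",
          "typeProperties": { "waitTimeInSeconds": 1 }
        }
      ]
    }
  }
]
```

Chaining with dependsOn on Succeeded means the Filter runs only after the file listing is available, and the ForEach runs only after the filter has produced its array.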

Example Scenario

Let’s say you have a Get Metadata activity that retrieves a list of files from a storage account. You want to process only the files that do not start with an underscore. By using the Filter activity with the condition specified above, you can ensure that only the desired files are passed on for further processing.
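To make the scenario concrete, here is an illustrative before-and-after of the data. The file names are invented for this example, and the wrapper keys getMetadataOutput and filterOutput are only labels for the two payloads, not real ADF properties. Both payloads are trimmed to the relevant fields, and the Filter output property names shown (ItemsCount, FilteredItemsCount, Value) reflect what the activity typically reports in a debug run, so it is worth confirming them against your own run output.

```json
{
  "getMetadataOutput": {
    "childItems": [
      { "name": "_SUCCESS", "type": "File" },
      { "name": "_tmp_orders.csv", "type": "File" },
      { "name": "orders.csv", "type": "File" },
      { "name": "customers.csv", "type": "File" }
    ]
  },
  "filterOutput": {
    "ItemsCount": 4,
    "FilteredItemsCount": 2,
    "Value": [
      { "name": "orders.csv", "type": "File" },
      { "name": "customers.csv", "type": "File" }
    ]
  }
}
```

Of the four items listed, the two whose names begin with an underscore are dropped, and only orders.csv and customers.csv are passed on for processing.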

Summary

Using the Filter activity in Azure Data Factory with the @not(startswith(...)) expression is a straightforward way to exclude items based on a naming convention, such as files that start with an underscore. Downstream activities then receive only the items that match, so they need no extra checks for temporary or system files.

With these steps, you can set up a filtering mechanism in your ADF pipelines so that only the items you need are processed, which avoids spending activity runs on files that should be ignored.
