Geek Logbook

Tech sea log book

Understanding Slowly Changing Dimensions (SCD) in Data Warehousing

When dealing with data warehouses, handling changes in dimension data over time is crucial. Unlike operational databases where updates are straightforward, data warehouses require preserving historical data for accurate analysis. This is where Slowly Changing Dimensions (SCDs) come into play.

What are Slowly Changing Dimensions (SCD)?

SCDs are a technique in data warehousing used to manage how dimension data changes over time. Different strategies exist depending on whether historical data should be retained or replaced.

Types of Slowly Changing Dimensions

SCD Type 0 – Retaining Original Values

  • The original value never changes.
  • Suitable for data that should remain static, such as a customer’s date of birth.

SCD Type 1 – Overwriting the Existing Data

  • The new value replaces the old value.
  • No historical data is retained.
  • Example: Updating a customer’s phone number directly.

SCD Type 2 – Creating a New Record for Each Change

  • A new record is added with a unique surrogate key and validity dates.
  • Allows tracking historical changes.
  • Example:
customer_idnameaddressstart_dateend_dateis_current
1JohnNY Street 12023-01-012023-06-300
2JohnLA Street 52023-07-01NULL1

When John moves to a new address, a new record is created, marking the previous one as inactive.

SCD Type 3 – Adding a Column for the Previous Value

  • Stores a limited history by keeping only the last change.
  • Example: A previous_region column records the last region before an update.

SCD Type 4 – Using a Separate History Table

  • The current table maintains only the latest record.
  • A history table stores all previous changes.
  • Useful when quick access to the latest data is needed while retaining history.

SCD Type 6 – Hybrid Approach (Combining Type 1, 2, and 3)

  • Maintains full history while allowing easy access to the current state.
  • Combines new records (Type 2), updates (Type 1), and a previous value column (Type 3).

When to Use SCDs?

SCDs are essential when:

  • Tracking historical changes in customer, product, or employee data.
  • Analyzing trends and patterns over time.
  • Ensuring compliance with auditing and reporting requirements.

Implementing SCDs in SQL and ETL Tools

Popular ETL tools like Apache Airflow, dbt, and Talend provide automated methods to implement SCDs. In SQL, SCD Type 2 can be implemented using INSERT and UPDATE statements with effective date tracking.