Geek Logbook

Tech sea log book

Understanding Slowly Changing Dimensions (SCD) in Data Warehousing

When dealing with data warehouses, handling changes in dimension data over time is crucial. Unlike operational databases where updates are straightforward, data warehouses require preserving historical data for accurate analysis. This is where Slowly Changing Dimensions (SCDs) come into play. What are Slowly Changing Dimensions (SCD)? SCDs are a technique in data warehousing used to

Modes and Examples of KPIs in Data Analysis Expressions (DAX)

Last Year Comparison When analyzing sales performance, it is often useful to compare the current year’s sales with the same period in the previous year. To do this, we create several calculated measures: DAX expressions to calculate similar measures for Units instead of Revenue: Year-to-Date (YTD) Comparison Year-to-date (YTD) measures track performance from the beginning

Understanding the Relationship Between Database Replication and the CAP Theorem

Introduction Database replication is a fundamental strategy in distributed systems that ensures data is duplicated across multiple nodes. However, when designing a replicated database, one must consider the CAP theorem, which defines the fundamental trade-offs in distributed computing. In this post, we will explore how the CAP theorem applies to database replication and what trade-offs

Understanding Surrogate Keys in Databases

When designing relational databases, one crucial decision is how to uniquely identify each record in a table. This is where surrogate keys come into play. Unlike natural keys, which derive from existing attributes in the data, surrogate keys are system-generated identifiers, typically using numeric sequences or UUIDs. What is a Surrogate Key? A surrogate key

Understanding Pagination vs. Batch Processing in Data Handling

When working with large datasets, developers often face the challenge of efficiently extracting, processing, and managing data. Two commonly used techniques for handling such data efficiently are pagination and batch processing. While both methods aim to optimize memory usage and performance, they serve different purposes and are implemented differently. What is Pagination? Pagination is a

Tracking Daily File Size Changes in SQL

When working with databases that store file metadata, it’s often useful to track how file sizes change over time. If you have a table with the following structure: You may want to analyze the day-to-day changes in file size. This can help in monitoring storage usage, detecting anomalies, or understanding file growth trends. SQL Query

Resolving ‘index.lock’ Issue in Git

When working with Git, you may encounter an error preventing you from switching branches or performing other operations. A common issue is the following: This typically happens when another Git process is running or if a previous operation was interrupted, leaving a stale index.lock file. How to Fix the ‘index.lock’ Error 1. Check for Running

Merging Data in PostgreSQL vs. MySQL: How to Handle Upserts

When working with databases, you often need to update existing records or insert new ones based on whether a match is found. In PostgreSQL, this is efficiently handled using the MERGE statement. However, MySQL does not support MERGE, so alternative approaches must be used. This post explores how to achieve the same functionality in MySQL.

Understanding the Differences Between Parquet, Avro, JSON, and CSV

When working with data, choosing the right file format can significantly impact performance, storage efficiency, and ease of use. In this post, we will compare four widely used data formats: Parquet, Avro, JSON, and CSV. Each has its strengths and weaknesses, making them suitable for different scenarios. 1. Parquet Overview: Parquet is a columnar storage