September 2025 – Page 2

By - Geek Logbook
Posted on 2025-09-16
Posted in Data

Managing Evolving Schemas in Apache Spark: A Strategic Approach

Schema management is one of the most overlooked yet critical aspects of building reliable data pipelines. In a fast-moving environment, schemas rarely remain static: new fields are added, data types evolve, and nested structures become more complex. Relying on hard-coded schemas within Spark jobs may seem convenient at first, but it quickly turns into a

By - Geek Logbook
Posted on 2025-09-16
Posted in Cloud

Secure Ways to Share Private Data on AWS: Beyond Public Buckets

When building data platforms in the cloud, it is common to share data with partners, clients, or internal teams outside your own. AWS provides several mechanisms to grant secure, granular access — far beyond the simple (and risky) “make the bucket public” approach. In this post, we will explore the main strategies for sharing data

By - Geek Logbook
Posted on 2025-09-16
Posted in Others

Fixing Cursor Login Issues on Linux (AppImage)

When running Cursor on Linux, especially with the AppImage version, you might encounter a situation where you can’t log in. This usually happens because Cursor stores its session state locally, and sometimes that state gets corrupted. In this article, we’ll walk through how to diagnose the issue and reset your session state without losing your

By - Geek Logbook
Posted on 2025-09-15
Posted in Programming

Querying JSONB in PostgreSQL Efficiently

In modern applications, it is common to store semi-structured data in JSON format inside a relational database like PostgreSQL. However, to analyze this data properly, you need a way to transform it into a tabular structure that can be queried with standard SQL. In this article, we will demonstrate a real-world example of reading a

By - Geek Logbook
Posted on 2025-09-15
Posted in Architectures

Designing a Semantic Layer for Athena + Power BI

Modern data architectures benefit from a clear separation of layers: Ingestion, Staging, and Semantic (Presentation). When using Amazon Athena as the query engine and Power BI as the visualization tool, this layered approach enables scalability, governance, and cost control.

1. Ingestion (Raw Layer)

Purpose: Store data exactly as it arrives from source systems, preserving fidelity.

By - Geek Logbook
Posted on 2025-09-15
Posted in Data

Understanding Window Functions in SQL: Beyond Simple Aggregations

When we think about SQL functions, we often start with scalar functions (UPPER(), ROUND(), NOW()) or aggregate functions (SUM(), AVG(), COUNT()). But there is a third type that is essential for advanced analytics: window functions.

The “Window”: The Metaphor Behind the Concept

A window function is evaluated for every row, but not in isolation…
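The excerpt's point, that a window function is evaluated for every row without collapsing the rows the way an aggregate does, can be sketched with a running total. This is only an illustration and not taken from the post: the `sales` table, its columns, and the `running_totals` helper are hypothetical, and it assumes the SQLite bundled with Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3


def running_totals(rows):
    """Return each sale with a per-region running total (hypothetical example)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # SUM(...) OVER (...) is computed for every row; unlike GROUP BY,
    # the input rows are kept and each one sees its own "window".
    query = """
        SELECT region, day, amount,
               SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
        FROM sales
        ORDER BY region, day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("north", 1, 10), ("north", 2, 20), ("south", 1, 5)]
    for row in running_totals(sample):
        print(row)
```

Every input row appears in the result, and the extra column depends on the other rows in the same region up to that day.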

By - Geek Logbook
Posted on 2025-09-07 (updated 2025-09-15)
Posted in Cloud

How to Set CloudWatch Log Retention Policies with Terraform

Amazon CloudWatch is a powerful service for monitoring applications and infrastructure. However, by default, log groups in CloudWatch Logs never expire their data. This can lead to excessive storage costs and retention of data that you may not need. A better approach is to define a retention policy that aligns with your operational and compliance requirements. In

By - Geek Logbook
Posted on 2025-09-07 (updated 2025-09-15)
Posted in Projects

Automating Data Extraction with Airflow, BeautifulSoup, and MinIO

In the data engineering ecosystem, a common task is to automate the extraction of data from external sources, perform minimal processing, and store it in a data lake for further analysis. In this post, I will demonstrate how to build an Apache Airflow DAG that fetches public information from Whale Alert, transforms it into a

By - Geek Logbook
Posted on 2025-09-06
Posted in Cloud

Orchestrating Multiple AWS Glue Workflows with Step Functions

In modern data architectures, it is common to manage multiple ETL pipelines that must run in sequence or in parallel. AWS Glue provides a robust framework for building workflows, but when we need to orchestrate two or more Glue Workflows together, AWS Step Functions becomes the natural choice. In this post, we will explain how

By - Geek Logbook
Posted on 2025-09-06
Posted in Programming

Understanding the Strategy Design Pattern

In the landscape of software design, maintaining flexibility and scalability is crucial. One of the most effective ways to achieve these qualities is by leveraging design patterns. Among the behavioral design patterns, the Strategy Pattern stands out as a powerful tool to manage algorithms dynamically.

What is the Strategy Pattern?

The Strategy Pattern allows you
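As a rough illustration of managing algorithms dynamically, here is a minimal sketch of the pattern in Python. It is not the post's own code: the `Checkout` class and the two pricing functions are hypothetical names invented for this example, and plain callables stand in for the strategy interface.

```python
from typing import Callable

# A strategy is any callable that maps an amount to a final price.
PricingStrategy = Callable[[float], float]


def regular_price(amount: float) -> float:
    """Strategy 1: charge the amount unchanged."""
    return amount


def ten_percent_off(amount: float) -> float:
    """Strategy 2: apply a 10% discount, rounded to cents."""
    return round(amount * 0.90, 2)


class Checkout:
    """Context object: delegates the pricing algorithm to its current strategy."""

    def __init__(self, strategy: PricingStrategy) -> None:
        self.strategy = strategy

    def total(self, amount: float) -> float:
        return self.strategy(amount)


if __name__ == "__main__":
    checkout = Checkout(regular_price)
    print(checkout.total(100.0))
    # The algorithm can be swapped at runtime without touching Checkout.
    checkout.strategy = ten_percent_off
    print(checkout.total(100.0))
```

The context never branches on which algorithm is active, so adding a new pricing rule means writing one more function rather than editing `Checkout`.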
