Geek Logbook

Tech sea log book

Hiding Personal Information in AWS Glue with Spark

Protecting personal data before analytics consumption is a core requirement in modern data platforms. In AWS-based lake architectures, this is typically achieved through data de-identification during ingestion or transformation. This post outlines a practical and production-ready approach to hiding personal information using Spark jobs in AWS Glue. What “Hide Personal Information” Means in Data Engineering

Automating OAuth 2.0 in Postman: storing and refreshing access tokens without copy-paste

Introduction: When working with APIs protected by OAuth 2.0, Postman is commonly used for development and testing. A frequent pain point is manual token handling: requesting an access token, copying it, pasting it into headers, and repeating the process every time it expires. This article explains how to fully automate OAuth 2.0 token management in Postman,

Running Scheduled GitHub Actions Locally for Safer Debugging

Overview: When working with scheduled automation jobs in GitHub Actions, it is common to face a simple but critical question: Can this workflow be executed locally before pushing to production? The short answer is yes, and in many cases local execution is functionally identical to what GitHub Actions performs in the cloud. This article

Designing a Scalable Course Progress Service on AWS

EC2, Lambda, DynamoDB, and RDS Cost and Architecture Trade-offs. Context: In a multi-platform learning environment where users can advance through courses using both Web and Mobile applications, maintaining a single, consistent view of user progress is critical. In this scenario: This leads to a key architectural decision: introducing a third, independent “source of truth” for

Controlling Branch Deployments and Redirects in Vercel: A Practical Guide

Continuous deployment platforms simplify the release process, but they can easily become noisy when every branch triggers a build. Teams working with multiple development environments often need finer control: building only when specific branches are updated and ignoring the rest. The Problem: Imagine a development team maintaining three main branches: By default, Vercel automatically

Estimating the Cost of an AWS Glue Workflow

When working with AWS Glue, one of the most common questions data engineers ask is: How much will this job cost me? If you have a workflow that runs for 13 minutes, understanding the cost model of AWS Glue helps you avoid surprises on your AWS bill. How AWS Glue Pricing Works: AWS Glue pricing

AWS EventBridge Rules vs EventBridge Scheduler: Which One Should You Use?

In the AWS ecosystem, there are two main ways to schedule and automate tasks: EventBridge Rules (scheduled rules) and the newer EventBridge Scheduler, which introduces Schedule Groups. While both can trigger actions at defined times, their design, scalability, and flexibility differ significantly. Choosing the right option depends on your workload requirements. 1. What Are EventBridge

Running Production Servers on AWS: EC2 vs RDS Cost Breakdown

When planning to run production workloads in the cloud, cost is one of the most important considerations. In this post, we will explore the monthly expenses of running two application servers and a database server on AWS, and compare two deployment approaches: EC2-only vs EC2 + RDS. Infrastructure Requirements: Our baseline infrastructure looks like this: