Geek Logbook

Tech sea log book

EMR vs AWS Glue: Choosing the Right Data Processing Tool on AWS

When working with big data on AWS, two commonly used services for data processing are Amazon EMR and AWS Glue. Although both support scalable data transformation and analytics, they differ significantly in architecture, control, use cases, and cost models. Choosing the right tool depends on your specific workload, performance needs, and operational preferences. In this

Why You Should Use the -out Option with terraform plan

When working with Terraform, a common workflow involves running terraform plan followed by terraform apply. However, you may have come across the following warning: “You didn’t use the -out option to save this plan, so Terraform can’t guarantee to take exactly these actions if you run ‘terraform apply’ now.” This message is more than a
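As a rough illustration of the saved-plan workflow this excerpt points at, the sketch below builds the two commands involved: `terraform plan -out=<file>` followed by `terraform apply <file>`. The helper name `plan_and_apply_commands` and the file name `tfplan` are hypothetical choices made for this example, and the snippet only prints the commands rather than running Terraform.

```python
import shlex


def plan_and_apply_commands(plan_file="tfplan"):
    """Return argv lists for a saved-plan workflow.

    `plan_file` is a hypothetical file name chosen for illustration.
    """
    # Save the computed plan to a file instead of only displaying it.
    plan_cmd = ["terraform", "plan", f"-out={plan_file}"]
    # Apply exactly the saved plan by passing the file to `apply`.
    apply_cmd = ["terraform", "apply", plan_file]
    return plan_cmd, apply_cmd


plan_cmd, apply_cmd = plan_and_apply_commands()
print(shlex.join(plan_cmd))
print(shlex.join(apply_cmd))
```

To actually execute the steps, each list could be passed to `subprocess.run(..., check=True)` from a directory containing an initialized Terraform configuration.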

How Google Changed Big Data: The Story of GFS, MapReduce, and Bigtable

In the early 2000s, Google faced a unique challenge: how to store, process, and query massive amounts of data across thousands of unreliable machines. The traditional systems of the time—designed for a world of smaller datasets and centralized infrastructure—simply couldn’t keep up. Google responded by designing an entirely new architecture. It wasn’t just about solving

Secure Database Access in AWS Using SSH Tunneling

Accessing databases located in private subnets within AWS Virtual Private Clouds (VPCs) is a common requirement in enterprise architectures. To ensure secure connectivity without exposing the database to the public internet, developers and operations engineers often employ SSH tunneling via a bastion host. Background: Databases in a private subnet cannot be accessed directly from external
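A minimal sketch of the bastion-host tunnel described in this excerpt, assuming OpenSSH's local port forwarding (`ssh -N -L local_port:db_host:db_port user@bastion`). Every host name, port, and user name below is a placeholder invented for illustration, as is the helper name `ssh_tunnel_command`; the snippet only builds and prints the command.

```python
import shlex


def ssh_tunnel_command(bastion, db_host, db_port, local_port, user="ec2-user"):
    """Build an OpenSSH command that forwards a local port to a private database.

    All argument values used below are hypothetical placeholders.
    """
    # -L local_port:db_host:db_port forwards connections made to
    # localhost:local_port through the bastion to db_host:db_port.
    forward = f"{local_port}:{db_host}:{db_port}"
    # -N tells ssh not to run a remote command (tunnel only).
    return ["ssh", "-N", "-L", forward, f"{user}@{bastion}"]


cmd = ssh_tunnel_command(
    bastion="bastion.example.com",        # placeholder bastion address
    db_host="mydb.internal.example.com",  # placeholder private database endpoint
    db_port=5432,
    local_port=5433,
)
print(shlex.join(cmd))
```

With a tunnel like this open, a database client would connect to `localhost:5433` instead of the private endpoint.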