Geek Logbook

Tech sea log book

Understanding the Strategy Design Pattern

In the landscape of software design, maintaining flexibility and scalability is crucial. One of the most effective ways to achieve these qualities is by leveraging design patterns. Among the behavioral design patterns, the Strategy Pattern stands out as a powerful tool for swapping algorithms at runtime. …
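A minimal sketch of the pattern the post introduces: interchangeable algorithm objects behind a common method, selected by a context at runtime. The discount-pricing domain and all class names here are illustrative assumptions, not taken from the article.

```python
class PercentageDiscount:
    """Strategy 1: take a percentage off the price."""
    def __init__(self, pct: float):
        self.pct = pct

    def apply(self, price: float) -> float:
        return price * (1 - self.pct)


class FlatDiscount:
    """Strategy 2: subtract a fixed amount, never going below zero."""
    def __init__(self, amount: float):
        self.amount = amount

    def apply(self, price: float) -> float:
        return max(price - self.amount, 0.0)


class Checkout:
    """Context: delegates pricing to whichever strategy it currently holds."""
    def __init__(self, strategy):
        self.strategy = strategy

    def total(self, price: float) -> float:
        return self.strategy.apply(price)


checkout = Checkout(PercentageDiscount(0.10))
print(checkout.total(100.0))  # 90.0

# Swap the algorithm at runtime without touching Checkout.
checkout.strategy = FlatDiscount(25.0)
print(checkout.total(100.0))  # 75.0
```

Because the context only depends on the shared `apply` interface, new pricing rules can be added without modifying existing code.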

How to Disable an AWS Glue Trigger from the CLI

When working with AWS Glue, triggers are an important mechanism for orchestrating jobs and workflows. Sometimes, however, you may need to temporarily disable a trigger without deleting it, for example to pause scheduled ingestions during maintenance or testing. This article explains how to disable a trigger using the AWS CLI. …
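The core of the workflow the post describes can be sketched with the Glue CLI's stop/start commands; the trigger name below is a placeholder, and the commands require configured AWS credentials.

```shell
# Deactivate the trigger without deleting it.
aws glue stop-trigger --name my-nightly-trigger

# Confirm the new state (expect DEACTIVATED).
aws glue get-trigger --name my-nightly-trigger --query 'Trigger.State'

# Re-enable it once maintenance or testing is done.
aws glue start-trigger --name my-nightly-trigger
```

Stopping a trigger preserves its schedule, conditions, and job associations, so re-enabling it restores the original orchestration unchanged.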

Debugging Spark DataFrame .show() Timeouts in PyCharm and VSCode

When working with PySpark, one of the first commands developers use to inspect data quickly is `df.show()`. In certain environments, however (especially when running inside PyCharm or VSCode with a debugger attached), this call can produce a timeout warning. At first glance the message looks like an error in Spark itself, but in reality it is …
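For reference, a minimal reproduction of the call the post discusses; it needs a local Spark runtime, and comparing a plain run against a debugger-attached run is where the timeout behavior shows up.

```python
from pyspark.sql import SparkSession

# Small local session; master/app name are illustrative.
spark = SparkSession.builder.master("local[1]").appName("show-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# .show() triggers an actual Spark job; under an IDE debugger this is
# the point where evaluation can stall and the timeout warning appears.
df.show()

spark.stop()
```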

Incremental Data Loads: Choosing Between resource_version and created_at/updated_at

Incremental data loading is a cornerstone of modern data engineering pipelines. Instead of re-ingesting entire datasets on each execution, incremental strategies retrieve only the records that are new or modified since the last load. This approach reduces latency, improves efficiency, and lowers infrastructure costs. When designing incremental loads, a common dilemma arises: should you track changes with a resource_version field or with created_at/updated_at timestamps? …
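The timestamp-based variant can be sketched in a few lines: keep a watermark from the last successful run and fetch only rows modified after it. The records and the `updated_at` column name are illustrative assumptions.

```python
from datetime import datetime

# Illustrative source rows; in practice these come from a database or API.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]


def incremental_load(rows, watermark):
    """Return only the rows modified after the last successful load."""
    return [r for r in rows if r["updated_at"] > watermark]


last_load = datetime(2024, 1, 3)  # watermark persisted by the previous run
new_rows = incremental_load(rows, last_load)
print([r["id"] for r in new_rows])  # [2, 3]
```

A resource_version counter works the same way, except the watermark is a monotonically increasing integer rather than a timestamp, which sidesteps clock-skew issues.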

Running Apache Airflow Across Environments

Apache Airflow has become a de facto standard for orchestrating data workflows. However, the way Airflow runs can change significantly depending on the environment, and many teams get confused when moving between managed cloud services, local setups, and containerized deployments. This post provides a clear comparison of how Airflow operates in each of these contexts. …
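As a quick orientation for two of the contexts compared, here are the typical entry points; the Docker line assumes a compose file based on the official Airflow template.

```shell
# Local development: everything (webserver, scheduler, SQLite DB)
# in a single process, intended for experimentation only.
pip install apache-airflow
airflow standalone

# Containerized: each component runs in its own container,
# driven by a docker-compose.yaml from the official Airflow docs.
docker compose up
```

Managed services such as MWAA or Cloud Composer remove these steps entirely, which is precisely why behavior differs across environments.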

Can You Perform Data Grouping Directly with the yFinance API?

When working with financial data, efficient aggregation and analysis are essential for generating meaningful insights. A common question among developers and data analysts is whether the yFinance Python library, a popular tool for retrieving historical stock market data, allows grouping or aggregating data directly through its API. The short answer is no: yFinance does not. …
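Since yFinance returns pandas DataFrames, the usual approach is to aggregate client-side with pandas. The sketch below uses synthetic daily closes in place of a `yf.download(...)` result so it runs offline; the column name `Close` matches yFinance's convention.

```python
import pandas as pd

# Synthetic daily closing prices standing in for a yFinance download.
idx = pd.date_range("2024-01-01", periods=10, freq="D")
prices = pd.DataFrame(
    {"Close": [100, 101, 102, 101, 103, 104, 105, 104, 106, 107]},
    index=idx,
)

# Grouping happens in pandas, not in the yFinance API:
# resample the daily series into weekly averages.
weekly = prices["Close"].resample("W").mean()
print(weekly)
```

The same `resample`/`groupby` machinery covers monthly OHLC, rolling windows, or per-ticker aggregation once the raw data is retrieved.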

Optimizing Partition Strategies in Apache Iceberg on AWS

When working with large-scale analytical datasets, efficient partitioning is critical for achieving optimal query performance and cost savings. Apache Iceberg, a modern table format designed for big data, offers powerful partitioning capabilities. One common design decision is whether to use a single date column (e.g., yyyymmdd) or separate columns for year, month, and day. …
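In Spark SQL DDL, the two designs look like this; table and column names are illustrative. Option A uses Iceberg's hidden `days()` partition transform, so queries filter on the timestamp directly, while Option B requires filtering each column explicitly.

```sql
-- Option A: single timestamp column with a hidden day transform.
CREATE TABLE analytics.events (
    event_id BIGINT,
    event_ts TIMESTAMP,
    payload  STRING
)
USING iceberg
PARTITIONED BY (days(event_ts));

-- Option B: explicit year/month/day partition columns.
CREATE TABLE analytics.events_ymd (
    event_id BIGINT,
    year INT,
    month INT,
    day INT,
    payload STRING
)
USING iceberg
PARTITIONED BY (year, month, day);
```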

How Transactions Work in Databricks Using Delta Lake

Databricks is a powerful platform for big data analytics and machine learning. One of its key features is the ability to run transactional workloads over large-scale data lakes using Delta Lake. This post explores how transactions are supported in Databricks and how you can use them to ensure data consistency and integrity. …
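A short SQL sketch of what Delta Lake's ACID guarantees mean in practice; the `sales` table is illustrative, and the statements assume a Databricks or Delta-enabled Spark environment.

```sql
CREATE TABLE sales (id BIGINT, amount DOUBLE) USING DELTA;

-- Atomic append: concurrent readers see all of these rows or none.
INSERT INTO sales VALUES (1, 10.0), (2, 20.0);

-- MERGE runs as a single ACID transaction: matched rows are updated
-- and unmatched rows inserted, with no partially applied state visible.
MERGE INTO sales AS t
USING (SELECT 2 AS id, 25.0 AS amount) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN INSERT (id, amount) VALUES (s.id, s.amount);
```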