Geek Logbook

Tech sea log book

How Google Changed Big Data: The Story of GFS, MapReduce, and Bigtable

In the early 2000s, Google faced a unique challenge: how to store, process, and query massive amounts of data across thousands of unreliable machines. The traditional systems of the time—designed for a world of smaller datasets and centralized infrastructure—simply couldn’t keep up. Google responded by designing an entirely new architecture. It wasn’t just about solving

Secure Database Access in AWS Using SSH Tunneling

Accessing databases located in private subnets within AWS Virtual Private Clouds (VPCs) is a common requirement in enterprise architectures. To ensure secure connectivity without exposing the database to the public internet, developers and operations engineers often employ SSH tunneling via a bastion host. Background: Databases in a private subnet cannot be accessed directly from external

The Origin and Evolution of the DataFrame

When working with data today—whether in Python, R, or distributed computing platforms like Spark—one of the most commonly used structures is the DataFrame. But where did it come from? This post explores the origin, evolution, and growing importance of the DataFrame in data science and analytics. What is a DataFrame? A DataFrame is a two-dimensional

Understanding ORM: Bridging the Gap Between Objects and Relational Databases

In modern software development, working with databases is a fundamental requirement. Most applications need to persist, retrieve, and manipulate data stored in relational databases such as PostgreSQL, MySQL, or SQLite. Traditionally, this interaction has been done through SQL queries. However, Object-Relational Mapping (ORM) has emerged as a powerful alternative that simplifies and streamlines this process.

Understanding findOne and findOneAndUpdate in Mongoose: Key Differences and Practical Usage

When working with MongoDB through Mongoose in Node.js, developers frequently encounter two essential methods: findOne and findOneAndUpdate. Both methods perform document lookups, but they serve distinct purposes and are used in different contexts. In this post, we will break down their core differences, typical use cases, and best practices to optimize your MongoDB queries. The

Are NoSQL Databases Really Schema-less?

A Perspective from the MERN Stack: When we first start learning about NoSQL databases, one of the most common things we hear is that they are “schema-less.” At first glance, this seems like a huge advantage: total flexibility, the ability to adapt quickly, and storage that isn’t bound by strict rules. But when we dive

How Network Topology Shapes Distributed Computing and Big Data Systems

When discussing distributed systems and Big Data, people often focus on storage, processing frameworks, and scalability—but one foundational concept underlies it all: network topology. It’s the invisible architecture that dictates how data flows, how quickly systems respond, and how resilient your applications can be. Let’s explore what network topology is, how it evolved, and why