Geek Logbook

Tech sea log book

Ranking Products Using Window Functions in PySpark

Window functions are powerful tools in SQL and PySpark that allow us to perform calculations across a subset of rows related to the current row. In this blog post, we’ll explore how to use window functions in PySpark to rank products based on their sales and filter those with sales above the category average.
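As a minimal sketch of the technique the post covers, assuming a hypothetical DataFrame with product, category, and sales columns:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("product-ranking").getOrCreate()

# Hypothetical sample data: (product, category, sales)
df = spark.createDataFrame(
    [("A", "Books", 100), ("B", "Books", 300), ("C", "Toys", 250), ("D", "Toys", 150)],
    ["product", "category", "sales"],
)

# Rank products by sales within each category
rank_win = Window.partitionBy("category").orderBy(F.desc("sales"))
ranked = df.withColumn("rank", F.rank().over(rank_win))

# Compute the category-average sales and keep only products above it
avg_win = Window.partitionBy("category")
result = (ranked
          .withColumn("avg_sales", F.avg("sales").over(avg_win))
          .filter(F.col("sales") > F.col("avg_sales")))

result.show()
```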

Handling Null Values in Data: Algorithms and Strategies

Null values are a common challenge in data analysis and machine learning. Dealing with them effectively is essential to ensure the reliability of your insights and models. In this post, we’ll explore various strategies and algorithms for handling null values, ranging from simple techniques, such as removing null rows, to more advanced methods.
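A short sketch of the two simplest strategies the post mentions, removal and imputation, using pandas on a hypothetical dataset:

```python
import pandas as pd
import numpy as np

# Hypothetical dataset with missing values
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "income": [50000, 62000, np.nan, 58000],
})

# 1. Removing null values: drop any row that contains a null
dropped = df.dropna()

# 2. Simple imputation: fill nulls with a column statistic
imputed = df.fillna({
    "age": df["age"].median(),
    "income": df["income"].mean(),
})

print(dropped)
print(imputed)
```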

What Does an Exploratory Data Analysis (EDA) Evaluate?

An Exploratory Data Analysis (EDA) is a critical step in the data analysis process that focuses on evaluating and examining data to uncover its main characteristics. It is performed before delving deeper into analysis or building predictive models. The primary purpose of an EDA is to understand the dataset, identify issues, and gain insights.
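A few standard pandas calls cover the core of what an EDA evaluates: structure, summary statistics, missing values, and duplicates. A minimal sketch, assuming a hypothetical CSV file:

```python
import pandas as pd

# Hypothetical dataset loaded from CSV
df = pd.read_csv("data.csv")

df.info()                      # column types and non-null counts
print(df.describe())           # summary statistics for numeric columns
print(df.isnull().sum())       # missing values per column
print(df.duplicated().sum())   # number of duplicate rows
```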

Exploring Free Resources to Learn AWS and Azure Cloud Platforms

Cloud computing is an essential skill in today’s tech landscape. Among the major players, AWS and Azure stand out as leading cloud platforms, offering a wealth of free resources to help individuals learn and experiment. This blog post outlines some of the most valuable free tools, learning paths, and tips for getting started with AWS and Azure.

Adding Custom Columns to Your Date Table in Power BI

A Date Table is an integral part of building robust and insightful Power BI reports. While a basic Date Table allows for time-based filtering and analysis, custom columns can add even more depth and flexibility. This blog post will guide you through adding custom columns to your Date Table using DAX.
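The post itself works in DAX; as a rough analogue in Python/pandas, the same kinds of custom columns (year, month name, quarter, weekend flag) can be derived from a date range. A sketch under those assumptions, not the post's actual DAX:

```python
import pandas as pd

# Build a basic date table, then add custom columns analogous to the DAX ones
dates = pd.DataFrame({"Date": pd.date_range("2024-01-01", "2024-12-31", freq="D")})

dates["Year"] = dates["Date"].dt.year
dates["MonthNumber"] = dates["Date"].dt.month
dates["MonthName"] = dates["Date"].dt.strftime("%B")
dates["Quarter"] = "Q" + dates["Date"].dt.quarter.astype(str)
dates["IsWeekend"] = dates["Date"].dt.dayofweek >= 5

print(dates.head())
```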

Grouping Data in PySpark with Aliases for Aggregated Columns

When working with large datasets in PySpark, grouping data and applying aggregations is a common task. In this post, we’ll explore how to group data by a specific column and use aliases for the resulting aggregated columns to improve readability and clarity. The post works through a sample purchases dataset with the columns IdCompra, Fecha, IdProducto, Cantidad, Precio, and IdProveedor, as sketched below.
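A minimal sketch of the grouping-with-aliases pattern, using hypothetical rows that mirror the post's column names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("grouping-aliases").getOrCreate()

# Hypothetical purchases data mirroring the columns in the post
compras = spark.createDataFrame(
    [(1, "2024-01-05", 10, 3, 19.99, 7),
     (2, "2024-01-06", 10, 1, 19.99, 7),
     (3, "2024-01-06", 11, 5, 4.50, 8)],
    ["IdCompra", "Fecha", "IdProducto", "Cantidad", "Precio", "IdProveedor"],
)

# Group by product and alias each aggregated column for readability
resumen = compras.groupBy("IdProducto").agg(
    F.sum("Cantidad").alias("total_cantidad"),
    F.avg("Precio").alias("precio_promedio"),
    F.count("IdCompra").alias("num_compras"),
)

resumen.show()
```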

Handling Offset-Naive and Offset-Aware Datetimes in Python

When working with datetime objects in Python, you may encounter the error "TypeError: can't compare offset-naive and offset-aware datetimes". This error occurs when comparing two datetime objects where one contains timezone information (offset-aware) and the other does not (offset-naive). To resolve it, you must make both datetime objects either offset-aware or offset-naive before comparing them.
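A minimal sketch of both fixes, assuming the naive datetime represents UTC:

```python
from datetime import datetime, timezone

naive = datetime(2024, 6, 1, 12, 0)                       # offset-naive
aware = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)  # offset-aware

# Ordering comparisons such as `naive < aware` raise:
# TypeError: can't compare offset-naive and offset-aware datetimes

# Option 1: make the naive datetime offset-aware (assuming it represents UTC)
naive_as_aware = naive.replace(tzinfo=timezone.utc)
print(naive_as_aware < aware)   # False -- both aware, comparison now works

# Option 2: make the aware datetime offset-naive by dropping its timezone
aware_as_naive = aware.replace(tzinfo=None)
print(naive < aware_as_naive)   # False -- both naive, comparison now works
```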

Automating SQL Script Execution with Cron

In this blog post, we’ll explore how to automate the execution of SQL scripts using cron, a powerful scheduling tool available on Unix-based systems. This approach is ideal for database administrators and developers who need to run SQL scripts at specific intervals without manual intervention. Cron jobs allow you to schedule tasks to run automatically at defined times or intervals, as sketched below.
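One way to set this up, assuming PostgreSQL and psql; the script path, database names, and the run_sql.py wrapper name are all hypothetical:

```python
import subprocess

# Hypothetical path to the SQL script to execute
SQL_SCRIPT = "/home/user/scripts/nightly_report.sql"

# Run the script with psql; assumes credentials come from a .pgpass file
# or the PGPASSWORD environment variable
subprocess.run(
    ["psql", "-h", "localhost", "-U", "app_user", "-d", "app_db", "-f", SQL_SCRIPT],
    check=True,
)

# Example crontab entry to run this wrapper every day at 2 AM:
# 0 2 * * * /usr/bin/python3 /home/user/scripts/run_sql.py >> /var/log/sql_cron.log 2>&1
```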

Counting Word Frequency in a SQL Column

Sometimes, you may need to analyze text data stored in a database, such as counting the frequency of words in a text column. This blog post demonstrates how to achieve this in SQL using a practical example: a table named feedback with a column comentarios that contains free-text data.
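A sketch of one approach, assuming a PostgreSQL database (the teaser doesn't name the SQL dialect) and hypothetical connection details; the query splits each comment into words with regexp_split_to_table and counts them:

```python
import psycopg2

# Assumes a PostgreSQL database containing the feedback(comentarios) table
conn = psycopg2.connect("dbname=app_db user=app_user host=localhost")

query = """
    SELECT word, COUNT(*) AS frequency
    FROM feedback,
         regexp_split_to_table(lower(comentarios), '\\s+') AS word
    WHERE word <> ''
    GROUP BY word
    ORDER BY frequency DESC;
"""

with conn, conn.cursor() as cur:
    cur.execute(query)
    for word, frequency in cur.fetchall():
        print(word, frequency)
```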