Geek Logbook

Tech sea log book

Grouping Data in PySpark with Aliases for Aggregated Columns

When working with large datasets in PySpark, grouping data and applying aggregations is a common task. In this post, we’ll explore how to group data by a specific column and use aliases for the resulting aggregated columns to improve readability and clarity. Problem Statement: Consider a sample dataset with the columns IdCompra, Fecha, IdProducto, Cantidad, Precio, IdProveedor …

Handling Offset-Naive and Offset-Aware Datetimes in Python

When working with datetime objects in Python, you may encounter an error when comparing two datetime objects where one contains timezone information (offset-aware) and the other does not (offset-naive). To resolve this, ensure both datetime objects are either offset-aware or offset-naive before making the comparison. Making a Datetime Offset-Aware in …
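As a minimal sketch of the two fixes described above (not the post's own code), the snippet below triggers the comparison error and then resolves it both ways. The sample values are hypothetical, and treating the naive value as UTC is an assumption that only holds if that is what the value actually represents.

```python
from datetime import datetime, timezone

# Hypothetical sample values for illustration.
naive = datetime(2024, 1, 1, 12, 0)                       # no tzinfo: offset-naive
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # has tzinfo: offset-aware

# Ordering comparisons between the two kinds raise TypeError.
try:
    is_earlier = naive < aware
except TypeError as exc:
    error_message = str(exc)

# Option 1: make the naive value aware (assumes it represents UTC).
naive_as_utc = naive.replace(tzinfo=timezone.utc)
both_aware_equal = naive_as_utc == aware

# Option 2: make the aware value naive by converting to UTC and dropping tzinfo.
aware_as_naive = aware.astimezone(timezone.utc).replace(tzinfo=None)
both_naive_equal = aware_as_naive == naive
```

Note that `replace(tzinfo=...)` only attaches a timezone label without shifting the clock time, so it is appropriate only when the naive value is already known to be in that timezone.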

Troubleshooting Import Errors in Python: A Case Study

Python’s modular design allows developers to break their code into smaller, reusable components. However, import errors can often disrupt the flow, especially in complex projects. In this post, we’ll discuss a real-world example of resolving an import error while working on a Python project. The Scenario: The project’s directory structure is as follows: … The file …

Parsing Complex Data from HTML Tables with Python

When scraping the web, you often encounter HTML content that is nested or that contains encoded data within JavaScript attributes. This post walks through parsing player statistics from a complex HTML table, using Python and the BeautifulSoup library to streamline the extraction of JSON data hidden in JavaScript functions. Project Overview: We have an …

Comparative Investment Analysis of Invesco and Blackstone Using Python

Introduction: In this post, we’ll explore how to use Python to compare the performance of two investment firms, Invesco and Blackstone. Invesco is known for its focus on public asset management, while Blackstone specializes in private equity, actively acquiring and managing companies. We’ll examine some key performance and risk metrics to understand how these …

Built-in Functions vs. Object-Oriented Methods

Python strives to be simple and clear, so some operations are implemented as built-in functions, while others are object-specific methods. This distinction arises from the way Python handles different types of objects. Built-in Functions: len() is a built-in function that works with many different types, including strings, lists, tuples, dictionaries, and more. This allows for …
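The distinction above can be illustrated with a short sketch. The `Playlist` class is a hypothetical container invented for this example; it shows that `len()` works on any object that defines `__len__`, whereas a method such as `upper()` belongs to one specific type.

```python
class Playlist:
    """Hypothetical container used only for this illustration."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # len(playlist) delegates to this method.
        return len(self._tracks)


# The same built-in function works across unrelated types.
sizes = {
    "str": len("logbook"),
    "list": len([1, 2, 3]),
    "tuple": len((1, 2)),
    "dict": len({"a": 1}),
    "custom": len(Playlist(["intro", "outro"])),
}

# An object-specific method exists only on the type that defines it.
shout = "logbook".upper()
list_has_upper = hasattr([], "upper")
```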

Downloading Data from the SEC Website using Python

In this blog post, I’ll show you how to download a JSON file from the U.S. Securities and Exchange Commission (SEC) website using Python. The file contains company tickers, which can be useful for various financial analyses and applications. Steps to Download the File: Here’s a complete Python script that handles the download: … Run the …
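The excerpt does not include the script itself, so the following is only a sketch of how such a download might look using the standard library. The URL, the function names `build_request` and `download_json`, and the contact string format are all assumptions, not taken from the post. The SEC asks automated clients to identify themselves, which is why the sketch sets a `User-Agent` header.

```python
import json
import urllib.request

# Assumed location of the company tickers file; the excerpt does not show the URL.
TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"


def build_request(url, contact):
    """Build a GET request whose User-Agent identifies the caller (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": contact})


def download_json(url, contact, out_path, timeout=30):
    """Fetch JSON from url, save it to out_path, and return the parsed data."""
    request = build_request(url, contact)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.loads(response.read().decode("utf-8"))
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    return data
```

Calling `download_json(TICKERS_URL, "Your Name you@example.com", "company_tickers.json")` would perform the actual download; the contact string here is a placeholder to replace with your own details.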