Geek Logbook

Tech sea log book

Optimizing Partition Strategies in Apache Iceberg on AWS

When working with large-scale analytical datasets, efficient partitioning is critical for query performance and cost savings. Apache Iceberg, a modern table format designed for big data, offers powerful partitioning capabilities. One common design decision is whether to use a single date column (e.g., yyyymmdd) or separate columns for year, month, and day.
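The excerpt above frames the trade-off; as a quick illustration outside Spark, this is roughly what Iceberg's day partition transform computes under hidden partitioning, next to the manual yyyymmdd value it replaces (a plain-Python sketch; the timestamp and column names are made up for the example):

```python
from datetime import datetime, timezone

# Iceberg's day() transform derives the partition value from the timestamp
# itself (hidden partitioning), storing it as days since the Unix epoch.
# A manual yyyymmdd column duplicates that same information in the data.
def day_transform(ts: datetime) -> int:
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (ts - epoch).days

ts = datetime(2024, 3, 15, 10, 30, tzinfo=timezone.utc)
print(day_transform(ts))       # partition value Iceberg tracks in metadata
print(ts.strftime("%Y%m%d"))   # "20240315", the manual column equivalent
```

With hidden partitioning, a filter on the timestamp column itself can prune partitions; with a separate yyyymmdd (or year/month/day) column, queries must also filter on those extra columns to get the same pruning.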

How Transactions Work in Databricks Using Delta Lake

Databricks is a powerful platform for big data analytics and machine learning. One of its key features is the ability to run transactional workloads over large-scale data lakes using Delta Lake. This post explores how transactions are supported in Databricks and how you can use them to ensure data consistency and integrity.
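Delta Lake's transactions rest on its transaction log; as a rough, non-authoritative sketch in plain Python (not the delta-spark API), each commit is one zero-padded JSON file of actions under `_delta_log/`, and readers replay them in version order to reconstruct a consistent snapshot:

```python
import json

# Each Delta commit is one atomic JSON file in _delta_log/, named by a
# 20-digit zero-padded version number; the add/remove actions inside it
# describe which Parquet data files enter or leave the table.
def commit_filename(version: int) -> str:
    return f"_delta_log/{version:020d}.json"

actions = [
    {"add": {"path": "part-00000.parquet", "dataChange": True}},
]
print(commit_filename(0))   # _delta_log/00000000000000000000.json
print(json.dumps(actions))
```

Atomicity falls out of this layout: a commit file either exists in full or not at all, so a reader never observes a half-applied version.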

When Should You Use Parquet and When Should You Use Iceberg?

In modern data architectures, selecting the right storage and management solution is essential for building efficient, reliable, and scalable pipelines. Two popular choices that often come up are Parquet and Apache Iceberg. While they can work together, they serve different purposes and solve different problems. This article explains what each one is and when to use each.

How to Fix ‘DataFrame’ object has no attribute ‘writeTo’ When Working with Apache Iceberg in PySpark

If you’re working with Apache Iceberg in PySpark and encounter the error ‘DataFrame’ object has no attribute ‘writeTo’, you’re not alone. This is a common mistake when transitioning from the traditional DataFrame.write syntax to Iceberg’s DataFrameWriterV2 API. Let’s walk through why this happens, how to fix it quickly, and when to use each writing method.
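Since only the excerpt appears here, a minimal stand-in (plain Python, no Spark required) shows the mechanics of the error, with the real PySpark v1 and v2 calls sketched in comments; the class and table names are placeholders:

```python
# Stand-in for a Spark DataFrame in a setup where writeTo is missing
# (DataFrameWriterV2 arrived in Spark 3.0; older DataFrames only expose .write).
class OldDataFrame:
    @property
    def write(self):
        return "DataFrameWriter (v1)"

df = OldDataFrame()
try:
    df.writeTo("db.events")   # same shape as the failing Iceberg call
except AttributeError as e:
    print(e)                  # ... object has no attribute 'writeTo'

# In real PySpark >= 3.0 with an Iceberg catalog configured, the two styles are:
#   df.writeTo("catalog.db.events").append()                             # v2 API
#   df.write.format("iceberg").mode("append").save("catalog.db.events")  # v1 API
```

The quick check, then, is the Spark version: if `writeTo` is absent, you are on a DataFrame that only supports the v1 `write` path.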