Geek Logbook

Tech sea log book

Optimizing Partition Strategies in Apache Iceberg on AWS

When working with large-scale analytical datasets, efficient partitioning is critical for query performance and cost savings. Apache Iceberg, a modern table format designed for big data, offers powerful partitioning capabilities. One common design decision is whether to use a single date column (e.g., yyyymmdd) or separate columns for year, month, and day.
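The excerpt above frames the trade-off; as a quick illustration outside Spark, this is roughly what Iceberg's day partition transform computes under hidden partitioning, next to the manual yyyymmdd value it replaces (a plain-Python sketch; the timestamp and column names are made up for the example):

```python
from datetime import datetime, timezone

# Iceberg's day() transform derives the partition value from the timestamp
# itself (hidden partitioning), storing it as days since the Unix epoch.
# A manual yyyymmdd column duplicates that same information in the data.
def day_transform(ts: datetime) -> int:
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (ts - epoch).days

ts = datetime(2024, 3, 15, 10, 30, tzinfo=timezone.utc)
print(day_transform(ts))       # partition value Iceberg tracks in metadata
print(ts.strftime("%Y%m%d"))   # "20240315", the manual column equivalent
```

With hidden partitioning, a filter on the timestamp column itself can prune partitions; with a separate yyyymmdd (or year/month/day) column, queries must also filter on those extra columns to get the same pruning.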

How Transactions Work in Databricks Using Delta Lake

Databricks is a powerful platform for big data analytics and machine learning. One of its key features is the ability to run transactional workloads over large-scale data lakes using Delta Lake. This post explores how transactions are supported in Databricks and how you can use them to ensure data consistency and integrity.
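Delta Lake's transactions rest on its transaction log; as a rough, non-authoritative sketch in plain Python (not the delta-spark API), each commit is one zero-padded JSON file of actions under `_delta_log/`, and readers replay them in version order to reconstruct a consistent snapshot:

```python
import json

# Each Delta commit is one atomic JSON file in _delta_log/, named by a
# 20-digit zero-padded version number; the add/remove actions inside it
# describe which Parquet data files enter or leave the table.
def commit_filename(version: int) -> str:
    return f"_delta_log/{version:020d}.json"

actions = [
    {"add": {"path": "part-00000.parquet", "dataChange": True}},
]
print(commit_filename(0))   # _delta_log/00000000000000000000.json
print(json.dumps(actions))
```

Atomicity falls out of this layout: a commit file either exists in full or not at all, so a reader never observes a half-applied version.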

When Should You Use Parquet and When Should You Use Iceberg?

In modern data architectures, selecting the right storage and management solution is essential for building efficient, reliable, and scalable pipelines. Two popular choices that often come up are Parquet and Apache Iceberg. While they can work together, they serve different purposes and solve different problems. This article explains what each one is and when to use each.

How to Fix ‘DataFrame’ object has no attribute ‘writeTo’ When Working with Apache Iceberg in PySpark

If you’re working with Apache Iceberg in PySpark and encounter the error ‘DataFrame’ object has no attribute ‘writeTo’, you’re not alone. This is a common mistake when transitioning from the traditional DataFrame.write syntax to Iceberg’s DataFrameWriterV2 API. Let’s walk through why this happens, how to fix it quickly, and when to use each writing method.
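Since only the excerpt appears here, a minimal stand-in (plain Python, no Spark required) shows the mechanics of the error, with the real PySpark v1 and v2 calls sketched in comments; the class and table names are placeholders:

```python
# Stand-in for a Spark DataFrame in a setup where writeTo is missing
# (DataFrameWriterV2 arrived in Spark 3.0; older DataFrames only expose .write).
class OldDataFrame:
    @property
    def write(self):
        return "DataFrameWriter (v1)"

df = OldDataFrame()
try:
    df.writeTo("db.events")   # same shape as the failing Iceberg call
except AttributeError as e:
    print(e)                  # ... object has no attribute 'writeTo'

# In real PySpark >= 3.0 with an Iceberg catalog configured, the two styles are:
#   df.writeTo("catalog.db.events").append()                             # v2 API
#   df.write.format("iceberg").mode("append").save("catalog.db.events")  # v1 API
```

The quick check, then, is the Spark version: if `writeTo` is absent, you are on a DataFrame that only supports the v1 `write` path.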