What Is Serialization?

By - Geek Logbook
Posted on 2025-04-30
Posted in Notes

In the world of data engineering and software systems, serialization is a fundamental concept that allows you to efficiently store, transmit, and reconstruct data structures. If you’ve worked with formats like Parquet, Avro, JSON, or CSV, you’ve already interacted with serialization—whether you knew it or not.

In this post, we’ll explore:

What serialization means
The difference between binary and text-based formats
Where Parquet, Avro, CSV, and JSON fit
Key papers and standards behind the concept

What Is Serialization?

Serialization is the process of converting in-memory data structures (like dictionaries, objects, or DataFrames) into a format that can be:

Written to disk
Sent across a network
Saved for later use

The inverse process is called deserialization, where you reconstruct the original structure from the serialized form.
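
As a minimal sketch of that round trip, the snippet below serializes a small Python dict to JSON bytes and then rebuilds it. The `record` contents are made up for illustration, and JSON is just one convenient choice of format here.

```python
import json

# An in-memory structure: a plain dict with nested values (illustrative data).
record = {"id": 42, "name": "sensor-a", "readings": [20.5, 21.0, 19.8]}

# Serialize: dict -> JSON text -> UTF-8 bytes, ready for a file or a socket.
payload = json.dumps(record).encode("utf-8")

# Deserialize: bytes -> JSON text -> a new dict with the same contents.
restored = json.loads(payload.decode("utf-8"))

print(type(payload).__name__)  # bytes
print(restored == record)      # True: same contents
print(restored is record)      # False: a reconstructed copy, not the same object
```

The last two lines are the point: deserialization gives you an equal structure, not the original object.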

Binary formats like Parquet and Avro:

Are compact
Support compression
Are not human-readable, so you need a library or tool to encode and decode them
Are best for large-scale, distributed data systems

Text formats like CSV and JSON:

Are human-readable
Are easy to debug
Are simpler but less efficient for large data
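
To make the trade-off concrete, here is a rough sketch that encodes the same records once as JSON text and once in a fixed binary layout. It uses only the standard library, so it is a hand-rolled stand-in rather than Parquet or Avro. The sample data, the 12-byte row layout, and the helper names (`encode_text`, `encode_binary`, `decode_binary`) are all assumptions made for this example.

```python
import json
import struct
import zlib

# Hypothetical sample data: (sensor_id, temperature) pairs.
records = [(i, 20.0 + i * 0.25) for i in range(1000)]

# Assumed fixed layout per record: little-endian uint32 id + float64 temperature.
ROW = struct.Struct("<Id")

def encode_text(rows):
    """Text encoding: a JSON array of [id, temperature] pairs."""
    return json.dumps(rows).encode("utf-8")

def encode_binary(rows):
    """Binary encoding: 12 bytes per record, no delimiters or field names."""
    return b"".join(ROW.pack(sensor_id, temp) for sensor_id, temp in rows)

def decode_binary(data):
    """Inverse of encode_binary: rebuild the list of (id, temperature) tuples."""
    return [ROW.unpack_from(data, offset) for offset in range(0, len(data), ROW.size)]

text_bytes = encode_text(records)
binary_bytes = encode_binary(records)
compressed = zlib.compress(binary_bytes)

print(len(binary_bytes))                       # 12000
print(len(text_bytes) > len(binary_bytes))     # True
print(len(compressed) < len(binary_bytes))     # True
print(decode_binary(binary_bytes) == records)  # True
```

The binary bytes mean nothing without knowing the layout in `ROW`, which is the price of compactness. Real formats like Parquet and Avro carry or reference a schema for the same reason.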

Where Does Serialization Come From?

While serialization is a broad topic, some foundational works and standards include:

“A Note on Distributed Computing” – Waldo et al., 1994
Protocol Buffers (Google) – open-sourced in 2008
Apache Avro Design
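“Thrift: Scalable Cross-Language Services Implementation” – Facebook, 2007
RFC 4506: XDR: External Data Representation Standard – 2006 (XDR was first published as RFC 1014 in 1987)
ASN.1 – ITU-T notation for describing data, with encoding rules such as BER and DER; widely used in telecom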

Serialization is at the heart of:

Remote Procedure Calls (RPCs)
Kafka messaging
Data lakes and warehouses
Microservices and APIs
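
All of these systems move messages around as bytes, so something has to serialize each message and mark where it ends. The sketch below shows one common convention, a length prefix in front of each serialized body. It is an illustration only: the 4-byte header, the JSON body, and the `write_message` / `read_message` helpers are assumptions for this example, not the wire format of Kafka or of any particular RPC framework.

```python
import io
import json
import struct

# Assumed framing: a 4-byte big-endian unsigned length, then the message body.
HEADER = struct.Struct(">I")

def write_message(stream, message):
    """Serialize a dict to JSON bytes and write it as one length-prefixed frame."""
    body = json.dumps(message).encode("utf-8")
    stream.write(HEADER.pack(len(body)))
    stream.write(body)

def read_message(stream):
    """Read one frame and deserialize it; return None at end of stream."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        return None
    (length,) = HEADER.unpack(header)
    body = stream.read(length)
    if len(body) < length:
        raise ValueError("truncated message")
    return json.loads(body.decode("utf-8"))

# io.BytesIO stands in for a socket or a log file in this sketch.
channel = io.BytesIO()
write_message(channel, {"method": "get_user", "params": {"id": 7}})
write_message(channel, {"method": "ping", "params": {}})

channel.seek(0)
first = read_message(channel)
second = read_message(channel)
end = read_message(channel)

print(first["method"])   # get_user
print(second["method"])  # ping
print(end)               # None
```

Swapping the JSON body for a binary encoding would leave the framing untouched, which is why the choice of serialization format can usually be made separately from the transport.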

Conclusion

Serialization may sound technical, but it’s everywhere: from saving files on your computer to streaming massive datasets across cloud platforms. Knowing when to use binary formats like Parquet or Avro and when to use text formats like CSV and JSON can make your data pipelines more compact, faster to read, and easier to debug.

Tags: Serialization
