An Intro to Zero-Copy Reads and Serialization

The first time I came across the term `Zero Copy Serialization` was when I started looking into Apache Arrow. The website said:

"The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead."

That’s when I started reading more about zero-copy operations, and why they would be beneficial in high-performance data systems. This article highlights zero-copy I/O and compute, which typically come up when serializing and deserializing data.

Often, systems need to serialize data in order to move it: from one memory location to another, onto a disk, or between machines in a distributed network.
Serialization involves transforming data structures such as classes, structs, and primitive types into actual bytes, which can later be stored on a disk or moved over a network.
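
To make the idea concrete, here is a minimal sketch of serializing a small object into bytes and back using Python's standard `struct` module. The `Order` class and its field layout are hypothetical, made up purely for illustration.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed layout: little-endian uint32 id, float64 price, uint16 quantity.
ORDER_LAYOUT = struct.Struct("<IdH")


@dataclass
class Order:
    order_id: int
    price: float
    quantity: int


def serialize(order: Order) -> bytes:
    """Turn the in-memory object into a flat byte string."""
    return ORDER_LAYOUT.pack(order.order_id, order.price, order.quantity)


def deserialize(payload: bytes) -> Order:
    """Rebuild a brand-new Order from the bytes (this allocates and copies)."""
    return Order(*ORDER_LAYOUT.unpack(payload))


order = Order(order_id=7, price=19.5, quantity=3)
payload = serialize(order)
print(len(payload))                   # 14 bytes: 4 + 8 + 2, "<" disables padding
print(deserialize(payload) == order)  # True
```

Note that `deserialize` builds a fresh object every time it is called. That per-object work is exactly the cost that zero-copy approaches try to avoid.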

This is where serialization formats come into the picture: the way data is serialized and stored on disk costs some CPU and memory. One may choose JSON, protobufs, FlatBuffers, or plain serialization without any format.

For instance, when a JSON API request is parsed into a Java POJO or a Go struct, the data bytes are read into memory and transformed into native structs. No matter how small the payload is, converting the bytes of a JSON string into a native object representation consumes some memory and CPU.
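
The same thing happens in any language. As a rough illustration in Python (the payload below is made up), every parse walks the bytes and allocates a new object graph:

```python
import json

# A small, made-up request body.
raw = b'{"user_id": 42, "name": "ada", "tags": ["a", "b"]}'

# json.loads scans every byte and allocates a fresh dict, list, str and int objects.
parsed = json.loads(raw)

# Parsing the same bytes again builds a second, independent object graph.
parsed_again = json.loads(raw)

print(parsed == parsed_again)                   # True: equal contents
print(parsed is parsed_again)                   # False: distinct dict objects
print(parsed["tags"] is parsed_again["tags"])   # False: distinct list objects
```

Nothing in `parsed` points back into `raw`; the parser has copied and converted all of it.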

Zero-copy serialization involves representing data in memory as raw bytes and operating on those bytes directly instead of transforming them. The same bytes can be used as-is both on disk and in memory. Additional wrappers may be written to facilitate various operations on this data.
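
Here is one way such a wrapper could look, sketched with Python's `memoryview` and `struct`. The `RecordView` class and the record layout are assumptions for illustration only, not how any particular library does it.

```python
import struct

# Hypothetical record layout: little-endian uint32 id followed by a float64 score.
RECORD = struct.Struct("<Id")


class RecordView:
    """Thin wrapper that reads fields straight out of a shared buffer."""

    def __init__(self, buffer, index: int):
        self._buf = memoryview(buffer)      # a view of the bytes, not a copy
        self._offset = index * RECORD.size

    @property
    def record_id(self) -> int:
        return struct.unpack_from("<I", self._buf, self._offset)[0]

    @property
    def score(self) -> float:
        return struct.unpack_from("<d", self._buf, self._offset + 4)[0]


# Build a buffer holding three records back to back.
data = bytearray(RECORD.size * 3)
for i, score in enumerate([0.5, 1.5, 2.5]):
    RECORD.pack_into(data, i * RECORD.size, i + 100, score)

view = RecordView(data, 1)
print(view.record_id, view.score)   # 101 1.5

# The view shares memory with `data`, so an in-place write is visible through it.
struct.pack_into("<d", data, 1 * RECORD.size + 4, 9.0)
print(view.score)                   # 9.0
```

Reading a field still produces a small Python number, but the record as a whole is never decoded into a separate object; only the fields you touch are read.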

Such a technique can save significant CPU and I/O in data-intensive applications. As the data size grows, the cost of serialization and transformation grows with it. Moreover, having data represented the same way both in memory and on disk is a huge benefit for performing operations without loading the entire dataset into memory. Imagine the benefits if one could process terabytes of data on disk without loading all of it into memory or deserializing it.
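
A small sketch of that idea using Python's standard `mmap` module: the file is mapped into the address space and a single value is read by offset. The file name and the flat array-of-doubles layout are assumptions for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout: a flat array of little-endian float64 values.
VALUE = struct.Struct("<d")


def write_values(path: str, count: int) -> None:
    """Write `count` doubles (0.0, 1.0, 2.0, ...) to `path`."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(VALUE.pack(float(i)))


def read_value(path: str, index: int) -> float:
    """Read one value by offset; the OS pages in only the region that is touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return VALUE.unpack_from(mapped, index * VALUE.size)[0]


path = os.path.join(tempfile.mkdtemp(), "values.bin")
write_values(path, 10_000)
print(read_value(path, 1234))   # 1234.0
```

Because the bytes on disk already have the layout the reader expects, there is no parsing step: finding the value is just arithmetic on the offset.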

Additionally, a significant benefit is interoperability. Such zero-copy structures allow bytes to be transferred and processed efficiently, irrespective of the application, language, or system that processes them.

A fantastic application of this concept that is worth checking out is [Apache Arrow](https://arrow.apache.org/overview/), an in-memory columnar format for fast data processing. No matter which language or framework processes the data, any program that uses an Arrow implementation can understand this format, because all of them share the same memory representation of the data.

As a reference, Wikipedia has a good list of serialization formats that support zero-copy operations. Do give a new serialization framework a try and learn more.

Originally published at https://shanmukhsista.com.

Shanmukh Sista
