Serialization Languages - How we communicate with networks and systems

When I started to explore how APIs work, I ran into both JSON and XML as tools for exchanging data with APIs. At first, I thought both were programming languages, but that was an incorrect assumption.

JSON and XML are not programming languages but communication tools: formats for exchanging data. They are mostly used for APIs, but there are other communication tools as well, and all have their pros and cons. In this blog, I explore these communication tools and explain in which situations each of them is most suitable.

Communication tools

When we ask a computer to process something, it first loads the data it needs into memory. This step is crucial because the CPU (Central Processing Unit) reads the data from the computer's memory. In memory, all data is stored as binary numbers: 0s and 1s. More details about how CPUs work can be found in this post.

There are countless systems and networks worldwide, and problems would arise if each of them required a different way of communicating with our computers. To use a metaphor: imagine that I am the computer and someone in China is the network. I need to speak Chinese so that this person will understand me. If I learn Chinese, that is not a problem, but if I then want to communicate with someone in Russia, I also have to learn Russian. In short: I would need to learn every language to be able to communicate with every person around the globe. The same applies to communication between all the systems across the globe.

To make things easier and prevent miscommunication, the systems across the globe need a common language: a universal translator. There are different universal translators available, and all have their pros and cons. The technical term for this translation is “data serialization”, and the shared formats used for it are called data serialization formats (which I also call serialization languages in this blog).

Serialization and deserialization

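Computer data is organized in data structures such as tables, trees, classes, and arrays. When I need to store a data structure or transmit it to another location, such as across a network, it needs to be serialized first. The in-memory layout of a data structure (including references to other memory locations) only makes sense inside the program that created it, so another system would not understand it otherwise.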

Serialization allows me to save the state of an object as a sequence of bytes. Once the object is serialized into bytes, I can transfer it to a database, memory, or a file. After the transfer, I can recreate the object from those bytes whenever needed. This reverse process is called deserialization. Together, serialization and deserialization make it possible to both store objects and exchange data.
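To make this round trip concrete, here is a minimal Python sketch. It assumes a hypothetical `Pokemon` class and uses UTF-8 encoded JSON text as the byte format; any of the serialization formats discussed in this blog could fill that role.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Pokemon:
    # Hypothetical example object; the fields are illustrative only.
    name: str
    number: int
    types: list


def serialize(pokemon: Pokemon) -> bytes:
    """Save the object's state as bytes (here: UTF-8 encoded JSON text)."""
    return json.dumps(asdict(pokemon)).encode("utf-8")


def deserialize(data: bytes) -> Pokemon:
    """Reverse the process: rebuild an equivalent object from the bytes."""
    return Pokemon(**json.loads(data.decode("utf-8")))


original = Pokemon(name="Pikachu", number=25, types=["Electric"])
payload = serialize(original)    # bytes that could go to a file, database, or network
restored = deserialize(payload)  # a new object with the same state
print(payload)
print(restored == original)      # True
```

The restored object is a new object with the same state as the original: the bytes carry the data, not the object itself.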

Serialization can be used for:

Persisting data to files. This mostly happens in language-neutral formats such as CSV or XML. However, most languages also allow objects to be serialized directly into binary using APIs such as the Serializable interface in Java, the fstream class in C++, or the pickle module in Python.
Storing data in a database. Program objects are converted into byte streams and then stored in databases, for example through Java's JDBC (Java Database Connectivity) API.
Transferring data over the network. For instance, web applications and mobile apps pass objects from client to server and vice versa.
Remote Method Invocation (RMI). RMI passes serialized objects as parameters to functions running on a remote machine, as if they were invoked on a local machine. This data can be transmitted across domains through firewalls.
Sharing data in a Distributed Object Model. This is needed when programs written in different programming languages (running on different platforms) need to share object data over a distributed network. SOAP and REST APIs can do this.
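As a small illustration of the first use case, persisting data to a file, here is a sketch using Python's built-in pickle module. The `team` dictionary and the file name `team.pickle` are hypothetical examples.

```python
import os
import pickle
import tempfile

# Illustrative data: a plain Python dict standing in for a program object.
team = {"trainer": "Ash", "pokemon": ["Pikachu", "Bulbasaur"]}

# Persist the object to a file in pickle's binary format.
path = os.path.join(tempfile.mkdtemp(), "team.pickle")
with open(path, "wb") as f:
    pickle.dump(team, f)

# Later (for example, in another run of the same program), load it back.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == team)  # True
```

pickle's format is specific to Python, so a program written in another language cannot easily read it. Only unpickle data you trust: loading a pickle can execute arbitrary code.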

Serialization formats

There is no one-size-fits-all serialization format. The best format depends on factors such as the type and amount of data being serialized and the software that will read it. I use Pokémon as an example: my son is crazy about them, and they make a very practical example. There are tons of Pokémon, and when my son wants to know more about them, he grabs his Pokémon book or checks a Pokémon database with an app on the tablet. To obtain data from a Pokémon database, the app needs to communicate in a format that the computer on the other end understands. To do so, the app will probably use one of the following commonly used communication tools (serialization formats):

CSV

CSV stands for Comma-Separated Values. It is a text-based serialization format that is well suited to storing large amounts of tabular data in a human-readable form. Unlike the other serialization formats I discuss in this blog, it is not suitable for storing objects or data structures such as hash tables.

The CSV format is well supported, with CSV libraries available for almost every popular programming language, such as C, C++, C#, Java, JavaScript, PHP, Python, Ruby, and Swift.

CSV is also one of the formats best supported by spreadsheet programs such as Excel, because CSV enforces a tabular structure. A CSV file looks like this:
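As a minimal sketch, assuming a few hypothetical sample rows (the column names are illustrative, not taken from a real Pokémon database), Python's standard csv module can write such a file and read it back. An in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"number": 1, "name": "Bulbasaur", "primary_type": "Grass"},
    {"number": 4, "name": "Charmander", "primary_type": "Fire"},
    {"number": 25, "name": "Pikachu", "primary_type": "Electric"},
]

# Write the rows as CSV text: one header line, then one line per row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["number", "name", "primary_type"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
# number,name,primary_type
# 1,Bulbasaur,Grass
# 4,Charmander,Fire
# 25,Pikachu,Electric

# Read it back: every value comes back as a string, because CSV has no types.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[2]["name"], parsed[2]["number"])  # Pikachu 25
```

Note that the number 25 comes back as the string "25": CSV stores only text, so the reading program has to know which columns to convert.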

JSON

JSON stands for JavaScript Object Notation. It is a ubiquitous, text-based, human-readable data serialization format that is supported by almost every popular programming language. JSON is used a lot for REST APIs (I’ll explain REST APIs in a future post).

JSON looks like this:
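Here is a minimal Python sketch using the standard json module. The Pokémon record is hypothetical sample data; the nested `evolution` object and the `types` array show the kind of structure that JSON can express and CSV cannot.

```python
import json

# Hypothetical Pokémon record with a nested object and an array.
pikachu = {
    "number": 25,
    "name": "Pikachu",
    "types": ["Electric"],
    "evolution": {"from": "Pichu", "to": "Raichu"},
}

text = json.dumps(pikachu, indent=2)  # serialize to human-readable text
print(text)
# {
#   "number": 25,
#   "name": "Pikachu",
#   "types": [
#     "Electric"
#   ],
#   "evolution": {
#     "from": "Pichu",
#     "to": "Raichu"
#   }
# }

restored = json.loads(text)  # deserialize back into Python dicts and lists
print(restored == pikachu)   # True
```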

JSON libraries are available for various programming languages such as C, C++, C#, Java, JavaScript, PHP, Python, Ruby, and Swift.

Protocol Buffers (Protobuf)

Protobuf is a binary serialization format developed by Google. Because it serializes to binary, it is not human-readable. The types of data that a Protobuf message can contain are defined up front in a schema (a .proto file) and include common types such as strings, integers, and floats, plus well-known types such as timestamps for dates. Below is an example. You can still pick out a few strings when viewing the file as ASCII text, as in this example:
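Real applications generate Protobuf code from a .proto schema with Google's `protoc` compiler and use the official libraries. To show what the binary data looks like without those tools, the sketch below hand-encodes one message following the Protobuf wire format: each field is written as a key (field number plus wire type) followed by its value. The `Pokemon` schema in the comment is a hypothetical assumption, not taken from a real app, and the encoder only handles non-negative integers.

```python
# Hypothetical schema this sketch assumes:
#
#   message Pokemon {
#     string name = 1;
#     int32 number = 2;
#   }

VARINT = 0            # wire type for int32, int64, bool, enums, ...
LENGTH_DELIMITED = 2  # wire type for strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F  # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Each field starts with a key combining its number and wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_pokemon(name: str, number: int) -> bytes:
    name_bytes = name.encode("utf-8")
    return (
        encode_key(1, LENGTH_DELIMITED) + encode_varint(len(name_bytes)) + name_bytes
        + encode_key(2, VARINT) + encode_varint(number)
    )


message = encode_pokemon("Pikachu", 25)
print(message)       # b'\n\x07Pikachu\x10\x19'
print(len(message))  # 11
```

The name "Pikachu" is still readable inside the bytes, while the field names `name` and `number` never appear: only their field numbers do. That is one reason Protobuf messages are compact, and why both sides need the same schema.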

Protobuf libraries are available for various programming languages such as C, C++, C#, Java, JavaScript, PHP, Python, Ruby, and Swift.

XML

XML stands for Extensible Markup Language and is a human-readable serialization format. HTML is a well-known XML-like format used to define the structure of web pages.

XML has the disadvantage of being very verbose. Its descriptive end tags repeat the name of the element being closed (for example, <name>Pikachu</name>), which adds to the byte count of XML data.

XML is very well standardized, with plenty of tooling available to generate XML and to validate it against schemas. Below is an example of what XML looks like:
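A minimal sketch with Python's standard xml.etree.ElementTree module. The element and attribute names (`pokemon`, `name`, `types`, `type`, `number`) are hypothetical choices, since XML lets you define your own tags.

```python
import xml.etree.ElementTree as ET


def pokemon_to_xml(name: str, number: int, types: list) -> str:
    # Element and attribute names are hypothetical; XML lets you choose them.
    root = ET.Element("pokemon", number=str(number))
    ET.SubElement(root, "name").text = name
    types_element = ET.SubElement(root, "types")
    for pokemon_type in types:
        ET.SubElement(types_element, "type").text = pokemon_type
    return ET.tostring(root, encoding="unicode")  # serialize to a str


xml_text = pokemon_to_xml("Pikachu", 25, ["Electric"])
print(xml_text)
# <pokemon number="25"><name>Pikachu</name><types><type>Electric</type></types></pokemon>

# Deserialize: parse the text back into an element tree.
parsed = ET.fromstring(xml_text)
print(parsed.get("number"), parsed.find("name").text)  # 25 Pikachu
```

Every value is wrapped in an opening and a closing tag, which illustrates the verbosity mentioned above.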

XML libraries are available for various programming languages such as C, C++, C#, Java, JavaScript, PHP, Python, Ruby, and Swift.

YAML

YAML stands for “YAML Ain’t Markup Language” and is a human-readable serialization format. The YAML specification is much larger than the JSON specification. YAML allows for relational data (references) using anchors (&) and aliases (*).

The YAML website (https://yaml.org/) is actually written in YAML format. Below is a simple example:
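Python's standard library has no YAML module (third-party libraries such as PyYAML fill that gap), so to show YAML's indentation-based structure without dependencies, the sketch below is a deliberately tiny, hypothetical emitter. It handles only nested dictionaries, lists of plain scalars, numbers, and strings without special characters; it does not quote strings or produce anchors and aliases, so treat it as an illustration rather than a real YAML library.

```python
def to_yaml(data: dict, indent: int = 0) -> str:
    """Render a dict as block-style YAML (hypothetical, minimal subset only)."""
    pad = "  " * indent  # YAML uses indentation (spaces, never tabs) for nesting
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))  # nested mapping, indented
        elif isinstance(value, list):
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {item}" for item in value)  # sequence items
        else:
            lines.append(f"{pad}{key}: {value}")  # plain scalar
    return "\n".join(lines)


# Hypothetical sample record.
pikachu = {
    "name": "Pikachu",
    "number": 25,
    "types": ["Electric"],
    "evolution": {"from": "Pichu", "to": "Raichu"},
}
print(to_yaml(pikachu))
# name: Pikachu
# number: 25
# types:
#   - Electric
# evolution:
#   from: Pichu
#   to: Raichu
```

Compared with the JSON form of the same record, YAML drops the braces, brackets, and most quotes, relying on indentation and dashes instead.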

YAML libraries are available for various programming languages such as C, C++, C#, Java, JavaScript, PHP, Python, Ruby, and Swift.

BSON

BSON is short for “Binary JSON” (Binary JavaScript Object Notation). As the name says, it is a binary serialization format, which means that BSON is not human-readable. It is a binary-encoded serialization of JSON-like documents. BSON extends JSON with some optional data types that JSON does not support natively, such as dates and binary data.

BSON can be compared with other binary formats, like Protocol Buffers (discussed above). The biggest difference is that BSON is schema-less, whereas Protocol Buffers relies on a predefined schema. This gives BSON more flexibility, at the slight cost of space efficiency, because the field names are stored in every document. Below is an example of BSON:
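As a hedged sketch of what BSON bytes look like, the Python code below hand-encodes a small document following the BSON specification for just two element types: UTF-8 strings (type 0x02) and 32-bit integers (type 0x10). Real projects would use an existing BSON library instead; the document contents are hypothetical sample data.

```python
import struct

# Element type codes from the BSON specification (only two are used here).
STRING = b"\x02"
INT32 = b"\x10"


def cstring(text: str) -> bytes:
    """Field names are stored as null-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"


def encode_element(key: str, value) -> bytes:
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # String values carry their byte length (including the trailing null).
        return STRING + cstring(key) + struct.pack("<i", len(data)) + data
    if isinstance(value, int) and not isinstance(value, bool) and -2**31 <= value < 2**31:
        return INT32 + cstring(key) + struct.pack("<i", value)  # little-endian int32
    raise TypeError(f"this sketch does not handle {type(value).__name__} values")


def encode_document(doc: dict) -> bytes:
    body = b"".join(encode_element(key, value) for key, value in doc.items())
    # Total size = 4-byte length prefix + elements + 1 terminating null byte.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


encoded = encode_document({"name": "Pikachu", "number": 25})
print(len(encoded))  # 35
print(encoded)
# b'#\x00\x00\x00\x02name\x00\x08\x00\x00\x00Pikachu\x00\x10number\x00\x19\x00\x00\x00\x00'
```

The field names `name` and `number` are stored inside the bytes, which is what makes BSON schema-less and also what costs space: for a small document like this one, the 35 bytes of BSON are even slightly more than the 33 characters of the equivalent JSON text `{"name": "Pikachu", "number": 25}`.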

Just like for JSON, BSON libraries are available for various programming languages such as C, C++, C#, Java, JavaScript, PHP, Python, Ruby, and Swift.

Final thoughts

After my “flaw” of not seeing the difference between a programming language and a serialization format, I got a clearer picture of the immense world behind networks and systems. I never realized there were so many different programming languages, and so many serialization formats used to let them communicate.

Feel free to contact me if you have any questions or any additional advice or tips about this subject. If you want to stay in the loop when I upload a new post, don’t forget to subscribe to receive a notification by e-mail.
