SeqProto: fast binary serialization in JavaScript
Tommaso Allevi · Algorithms · 4.5 min read · Dec 15, 2023
Data sits at the center of almost every application, so handling it efficiently is essential. Serialization and deserialization, specifically in JavaScript, can be challenging. JSON is friendly for humans, but it is a generic, text-based format, and that generality leads to performance issues.
SeqProto is a library designed to address these problems. It offers a fast and efficient method for serializing and deserializing JavaScript objects.
The reason behind SeqProto
At Orama, we process large amounts of data in a highly constrained environment: Cloudflare Workers. For example, a worker can only use a total of 128 MB of memory. This might seem substantial, but once strings and objects are taken into account, it is actually quite limited.
Distributed systems are generally slower than monoliths because every network interaction pays for serialization, deserialization, and transport time (latency). Since the network stack is unavoidable, we focused on reducing the time spent on serialization and deserialization and on shrinking the payload. Minimizing both increases the number of requests per second a service can handle.
In our context, we looked for serialization formats that are both faster and produce smaller payloads. Initially, we considered Avro, Protobuf, CBOR, and MsgPack. Compared with JSON, the differences were significant: Avro and Protobuf notably outperformed it. This can be attributed to the fact that Avro and Protobuf know the structure of the input in advance and use that knowledge to improve performance.
With our performance-oriented goals in mind, we wondered whether there was a way to outperform them. The answer was YES, and this led to the creation of SeqProto.
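To see why knowing the input structure helps, compare a self-describing JSON payload with a schema-driven binary one. The sketch below is only an illustration: the record shape and the fixed field order are assumptions, and it does not reproduce the actual Avro or Protobuf wire formats.

```javascript
// Hypothetical record, used only for illustration.
const record = { id: 44, userId: 33, completed: false };

// JSON is self-describing: every payload repeats the field names.
const jsonBytes = new TextEncoder().encode(JSON.stringify(record)).length;

// With a schema agreed on by both sides (id: uint32, userId: uint32,
// completed: uint8, in this order), only the values need to travel.
const buffer = new ArrayBuffer(9);
const view = new DataView(buffer);
view.setUint32(0, record.id, true); // true = little-endian
view.setUint32(4, record.userId, true);
view.setUint8(8, record.completed ? 1 : 0);

// The reader applies the same schema to recover the record.
const decoded = {
  id: view.getUint32(0, true),
  userId: view.getUint32(4, true),
  completed: view.getUint8(8) === 1,
};

console.log(jsonBytes, buffer.byteLength, decoded);
```

The binary payload carries no field names, so it is both smaller and cheaper to parse; the price is that writer and reader must agree on the schema beforehand.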
Orama implemented SeqProto to function effectively in constrained environments with limited memory and critical time constraints. It's particularly useful for internal APIs crucial in distributed systems, especially when handling large data volumes. Orama uses SeqProto for APIs that could return over 10,000 elements.
Talk is cheap, show me the code
For a concrete example, consider a Todo structure like the following one:
We would like to serialize an array of Todo, so we already know the output. Consider the todos below:
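A minimal sketch of the structure and the sample data, reconstructed from the values used in the byte-by-byte breakdown further down (id 44, userId 33, completed false, a 7-byte text). The field names and the text "foo bar" are inferred from that breakdown and should be read as assumptions.

```javascript
/**
 * @typedef {Object} Todo
 * @property {number} id         unsigned integer identifier
 * @property {number} userId     unsigned integer owner identifier
 * @property {boolean} completed whether the todo is done
 * @property {string} text       free-form description
 */

/** @type {Todo[]} */
const todos = [
  { id: 44, userId: 33, completed: false, text: "foo bar" },
];

// The text occupies 7 bytes once encoded as UTF-8.
const textByteLength = new TextEncoder().encode(todos[0].text).length;
console.log(todos.length, textByteLength); // 1 7
```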
We can serialize it in the following way:
The first row describes what is contained every 4 bytes. The second represents the array buffer reading 4 bytes at a time. The third row represents the data as bytes.
So, considering the second row:
1: the array length. We are serializing just one element in the array.
44: the id.
33: the userId.
0: the boolean completed, converted to an integer (1 means true, 0 means false).
7: the length of the text.
544173926: this is 102 * 256^0 + 111 * 256^1 + 111 * 256^2 + 32 * 256^3, where 102 is the UTF-8 representation of "f", 111 is "o", and 32 is the space " ".
7496034: this is 98 * 256^0 + 97 * 256^1 + 114 * 256^2 + 0 * 256^3, where 98 is "b", 97 is "a", and 114 is "r".
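The two text values can be checked directly. This sketch packs the UTF-8 bytes of "foo bar" into 32-bit words, least significant byte first, which is the arithmetic spelled out above. The helper name `packWord` is made up for this example.

```javascript
// Pack up to 4 bytes into one unsigned 32-bit word, least significant byte first:
// b0 * 256^0 + b1 * 256^1 + b2 * 256^2 + b3 * 256^3
function packWord(bytes) {
  let word = 0;
  for (let i = 0; i < 4; i++) {
    word += (bytes[i] ?? 0) * 256 ** i; // missing bytes count as 0 (padding)
  }
  return word;
}

const utf8 = new TextEncoder().encode("foo bar"); // 7 bytes
const first = packWord(utf8.subarray(0, 4)); // "f", "o", "o", " "
const second = packWord(utf8.subarray(4, 8)); // "b", "a", "r" plus one padding byte

console.log(first, second); // 544173926 7496034
```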
In code:
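As a rough, standalone approximation of that layout, the sketch below writes and reads the same buffer with a plain `DataView`. It is not the SeqProto API: the function names, the fixed 4-byte width of every field, the little-endian byte order, and the zero padding of the text to a 4-byte boundary are all assumptions drawn from the breakdown above.

```javascript
const WORD = 4; // every field occupies a multiple of 4 bytes in this sketch
const padded = (n) => Math.ceil(n / WORD) * WORD;

// Hypothetical helper: serialize an array of todos into an ArrayBuffer.
function serializeTodos(todos) {
  const encoder = new TextEncoder();
  const texts = todos.map((t) => encoder.encode(t.text));
  // 1 word for the array length, then 4 words plus the padded text per todo.
  const size = texts.reduce((sum, t) => sum + 4 * WORD + padded(t.length), WORD);
  const buffer = new ArrayBuffer(size);
  const view = new DataView(buffer);
  let offset = 0;
  const writeU32 = (value) => {
    view.setUint32(offset, value, true); // true = little-endian
    offset += WORD;
  };

  writeU32(todos.length);
  todos.forEach((todo, i) => {
    writeU32(todo.id);
    writeU32(todo.userId);
    writeU32(todo.completed ? 1 : 0);
    writeU32(texts[i].length);
    new Uint8Array(buffer, offset, texts[i].length).set(texts[i]);
    offset += padded(texts[i].length); // the remaining bytes stay 0
  });
  return buffer;
}

// Hypothetical helper: read the fields back in exactly the same order.
function deserializeTodos(buffer) {
  const view = new DataView(buffer);
  const decoder = new TextDecoder();
  let offset = 0;
  const readU32 = () => {
    const value = view.getUint32(offset, true);
    offset += WORD;
    return value;
  };

  const todos = [];
  const count = readU32();
  for (let i = 0; i < count; i++) {
    const id = readU32();
    const userId = readU32();
    const completed = readU32() === 1;
    const length = readU32();
    const text = decoder.decode(new Uint8Array(buffer, offset, length));
    offset += padded(length);
    todos.push({ id, userId, completed, text });
  }
  return todos;
}

const buffer = serializeTodos([{ id: 44, userId: 33, completed: false, text: "foo bar" }]);
const words = Array.from({ length: buffer.byteLength / WORD }, (_, i) =>
  new DataView(buffer).getUint32(i * WORD, true)
);
console.log(words); // [1, 44, 33, 0, 7, 544173926, 7496034]
console.log(deserializeTodos(buffer));
```

Writer and reader are plain procedures that touch the fields in the same order, so no field names or type tags are stored in the buffer.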
I put this SeqProto serialization code into a Fastify server to test its performance. Below is the result of the test:
Here, you can find more analysis on that.
Advantages of Using SeqProto
SeqProto excels in speed and efficiency, outperforming other libraries. It builds on native JavaScript structures, namely ArrayBuffer and its views (typed arrays and DataView), which makes serialization quicker than with other formats.
It also uses procedural serialization and deserialization, which makes it easy to handle special cases such as nullable properties, enumerations, and custom structures.
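Because each field is written by ordinary code, a special case is just a branch. The sketch below illustrates the idea for an enumeration and a nullable property. It is not SeqProto's actual API: the `Writer` and `Reader` helpers, the task shape, and the encoding choices are invented for this example.

```javascript
// Minimal hypothetical writer/reader over a fixed-size buffer.
class Writer {
  constructor(size = 64) {
    this.view = new DataView(new ArrayBuffer(size));
    this.offset = 0;
  }
  u32(value) {
    this.view.setUint32(this.offset, value, true); // little-endian
    this.offset += 4;
  }
  buffer() {
    return this.view.buffer.slice(0, this.offset); // only the bytes written
  }
}

class Reader {
  constructor(buffer) {
    this.view = new DataView(buffer);
    this.offset = 0;
  }
  u32() {
    const value = this.view.getUint32(this.offset, true);
    this.offset += 4;
    return value;
  }
}

// Enumeration: map each allowed value to a small integer.
const PRIORITIES = ["low", "medium", "high"];

function writeTask(w, task) {
  w.u32(PRIORITIES.indexOf(task.priority));
  // Nullable property: a presence flag, then the value only if present.
  if (task.assigneeId === null) {
    w.u32(0);
  } else {
    w.u32(1);
    w.u32(task.assigneeId);
  }
}

function readTask(r) {
  const priority = PRIORITIES[r.u32()];
  const assigneeId = r.u32() === 1 ? r.u32() : null;
  return { priority, assigneeId };
}

const w = new Writer();
writeTask(w, { priority: "high", assigneeId: null });
writeTask(w, { priority: "low", assigneeId: 7 });
const r = new Reader(w.buffer());
console.log(readTask(r), readTask(r));
```

A null value costs only the flag, and the reader mirrors the writer's branch to know whether a value follows.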
Additionally, its straightforward APIs ensure easy implementation.
Give it a star!
If you enjoyed this article on SeqProto, make sure to check out the source code and give it a star on GitHub at https://github.com/oramasearch/seqproto!
Conclusion
JSON is a good format for serialization and deserialization. It is human-readable, which counts for a lot at the beginning of a project, so it is a good starting point. However, JSON limits performance, so when performance counts, SeqProto is a powerful alternative.