Generate B64E (Base64-encoded) MessagePack compressed with Zlib


ASTER::EXPRESSION::JSON::Retrieve the serialized, Zlib-compressed binary (MessagePack) data.

Convert the JSON being edited in ASTER to MessagePack, compress the binary with Zlib, encode the result in Base64, and receive it as a string in CF25.

JSON is a text-based data format; MessagePack represents the same data in a compact binary form.

The conversion to MessagePack uses nlohmann/json’s to_msgpack.

Relevant URL:

This ASTER function converts the JSON currently being edited in memory to MessagePack. The converted binary is then compressed with Zlib. Finally, the compressed binary is encoded with Base64, and the serialized result is received.

The process follows the steps below, starting from this JSON example:

{
  "name":"ASTER",
  "version":1
}
---
title: Zlib Data Compression & Encoding Serialization Workflow
---
graph TD;
  subgraph JSON_Data
    A1["name: 'ASTER'"]
    A2["version: 1"]
  end

  JSON_Data -->|convert to MsgPack| B[Binary Data]
  B -->|Zlib compression| C[Compressed Binary]
  C -->|Base64 encoding| D["8AaCpG5hbWWlQVNURVKndmVyc2lvbgE="]
  D --> F[CF25 expression: String]
Example of the received string: 8AaCpG5hbWWlQVNURVKndmVyc2lvbgE=

Parameter: None


Notes.1

Specification of MessagePack and JSON from the Perspective of Data Size

Generally, even without compression, converting JSON to MessagePack yields a smaller data size than the original JSON text.

Data that contains a large number of numerical values benefits from binary representation, allowing for efficient reduction of data size.

For example, nlohmann/json has a default setting for storing numbers: all floating-point values are held as the double type (see NumberFloatType below) and serialized with double precision.

Relevant URLs
/// a class to store JSON values
/// @sa https://json.nlohmann.me/api/basic_json/
template<template<typename U, typename V, typename... Args> class ObjectType =
         std::map,
         template<typename U, typename... Args> class ArrayType = std::vector,
         class StringType = std::string, class BooleanType = bool,
         class NumberIntegerType = std::int64_t,
         class NumberUnsignedType = std::uint64_t,
         class NumberFloatType = double,// <-------------------------------------------:
         template<typename U> class AllocatorType = std::allocator,
         template<typename T, typename SFINAE = void> class JSONSerializer =
         adl_serializer,
         class BinaryType = std::vector<std::uint8_t>, // cppcheck-suppress syntaxError
         class CustomBaseClass = void>
class basic_json;

MessagePack stores numerical data in the smallest suitable format, whether integer or floating-point, when converting to binary. This alone helps reduce data size.

JSON inherently includes essential characters such as " (double quotes), : (colon), and {} (curly brackets) to define its data structure. MessagePack replaces these structural elements with binary identifiers and tags, effectively reducing redundant notation.
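As a concrete illustration of the numeric point, this small Python sketch compares the encoded width of a non-negative integer under MessagePack's integer formats (per the format specification) with the same value written as JSON decimal text:

```python
import json

def msgpack_uint_size(n):
    """Encoded size in bytes of a non-negative integer in MessagePack."""
    if n < 0x80:        return 1  # positive fixint: the tag byte is the value
    if n < 0x100:       return 2  # uint 8:  0xCC + 1 byte
    if n < 0x10000:     return 3  # uint 16: 0xCD + 2 bytes
    if n < 0x100000000: return 5  # uint 32: 0xCE + 4 bytes
    return 9                      # uint 64: 0xCF + 8 bytes

for n in (7, 255, 65535, 4294967295):
    print(n, msgpack_uint_size(n), len(json.dumps(n)))
```

For instance, 4294967295 occupies 5 bytes in MessagePack but 10 characters as JSON text, and the structural bytes (fixmap/fixstr tags) replace the quotes, colons, and braces of JSON.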


Notes.2

Data Size and Base64 Encoding

After converting to MessagePack, compressing the binary is more efficient than directly compressing text data.

  • Extremely simple data may increase in size when compressed.

Base64 encoding (B64E) is widely used for serializing binary data, but it increases the data size by about 33%. (Base64 converts 3 bytes of binary data into a 4-byte text representation.)

  • 100 KB of binary data → After B64E, about 133 KB

  • If LZ4 compression reduces the data size by 40%, it is still possible to achieve an overall reduction even after Base64 encoding.

  • Zlib achieves a higher compression ratio, but the compression and decompression processing overhead is significantly higher than LZ4.
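The 33% figure can be checked directly with Python's standard base64 module, which applies the same 3-bytes-to-4-characters mapping:

```python
import base64

payload = bytes(100_000)             # 100 KB of (zero-filled) binary data
encoded = base64.b64encode(payload)  # 4 output chars per 3 input bytes
print(len(encoded))                  # 133336: about a 33% increase
```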

  Compression Method   Original Size (bytes)   Compressed & B64E Size (bytes)
  LZ4                  12,386                  5,114
  Zlib                 12,386                  3,654

In the current ASTER configuration, the LZ4-compressed data serialized with B64E is about 40% larger than the B64E data compressed with Zlib (equivalently, the Zlib result is 28.55% smaller). This difference in data size later impacts the processing speed of Base64.
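The percentages follow directly from the table: 5,114 bytes (LZ4 + B64E) versus 3,654 bytes (Zlib + B64E):

```python
lz4_b64, zlib_b64 = 5_114, 3_654  # sizes from the table above, in bytes

larger = round((lz4_b64 / zlib_b64 - 1) * 100, 2)   # LZ4 output vs. Zlib output
smaller = round((1 - zlib_b64 / lz4_b64) * 100, 2)  # Zlib output vs. LZ4 output
print(larger)   # 39.96 -> the LZ4 output is ~40% larger
print(smaller)  # 28.55 -> the Zlib output is 28.55% smaller
```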


Notes.3

Log Data Analysis and Comparison Results

In the test environment, Zlib compression takes approximately 20 times (19.8x) longer than LZ4.

Since LZ4 has a lower compression ratio, its compressed output is larger than Zlib's. Due to that larger size, the processing time for B64E is approximately 35% longer than for Zlib-compressed data.

Base64 decoding (B64D) is also affected by data size: LZ4-compressed data requires approximately 42.68% more processing time.

However, in the subsequent decompression step, LZ4 is about three times faster than Zlib. Although Base64 processing becomes a bottleneck, the overall LZ4 pipeline is still clearly much faster than Zlib's.


Speed Comparison: Compression/Decompression (Cmp/Dcmp) and B64E/B64D

Target Data Size: 12,386 bytes
  • LZ4 + B64E : data size = 5,114 bytes (about 40% larger than Zlib + B64E)
  • Zlib + B64E : data size = 3,654 bytes
  Processing Method       Average Processing Time (ms)   Comparison vs. Zlib
  LZ4 Compression         11.57                          ~20x faster
  LZ4 Decompression       9.53                           ~3x faster
  Zlib Compression        232.70                         -
  Zlib Decompression      32.88                          -
  B64E of LZ4 Cmp data    9.42                           34.57% slower than B64E of Zlib data
  B64D of LZ4 Cmp data    10.30                          42.68% slower than B64D of Zlib data
  B64E of Zlib Cmp data   7.00                           -
  B64D of Zlib Cmp data   7.22                           -
  LZ4 Cmp + B64E          20.99                          ~11x faster
  LZ4 Dcmp + B64D         19.83                          ~2x faster
  Zlib Cmp + B64E         239.70                         -
  Zlib Dcmp + B64D        40.10                          -
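The "vs. Zlib" ratios in the table can be reproduced from the average times:

```python
# Average processing times in ms, copied from the table above.
times = {
    "lz4_cmp": 11.57,       "zlib_cmp": 232.70,
    "lz4_dcmp": 9.53,       "zlib_dcmp": 32.88,
    "lz4_cmp_b64e": 20.99,  "zlib_cmp_b64e": 239.70,
    "lz4_dcmp_b64d": 19.83, "zlib_dcmp_b64d": 40.10,
}

print(round(times["zlib_cmp"] / times["lz4_cmp"], 1))              # 20.1 -> ~20x
print(round(times["zlib_dcmp"] / times["lz4_dcmp"], 1))            # 3.5  -> ~3x
print(round(times["zlib_cmp_b64e"] / times["lz4_cmp_b64e"], 1))    # 11.4 -> ~11x
print(round(times["zlib_dcmp_b64d"] / times["lz4_dcmp_b64d"], 1))  # 2.0  -> ~2x
```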

Summary

The processing time of Base64 grows in proportion to the data size; for LZ4-compressed data, Base64 is the bottleneck.

  • Base64 will not become faster unless SIMD instructions are implemented.
  • LZ4 compression is overwhelmingly fast, but the impact of Base64 processing narrows the speed difference.
  • Zlib compression requires 20 times the processing time of LZ4 compression.
  • Zlib is not extremely slow; its usefulness depends on the application.