JSON vs Protocol Buffers — a performance comparison
JSON and Protobuf are two different formats used for data interchange. JSON, which stands for JavaScript Object Notation, is widely used and is the de facto default format for RESTful APIs.
Protobuf, or Protocol Buffers, is less common and is mainly used with the gRPC framework. This article will compare the two formats on payload size and encoding/decoding performance, and explore whether the nature of the payload affects these factors.
Methodology
We will compare the serialization of identical data structures in JSON (using the Jackson library) and in Protobuf. We will test two payload types, one string-heavy and one number-heavy, and we will also vary the length of the field names.
To conduct the test, we will use JMH, the Java Microbenchmark Harness. JMH performs multiple JVM warm-up runs before the actual benchmark and accounts for statistical anomalies in the measurements.
The serialized payload will consist of an array of 100 objects, each containing five fields of either strings or numbers. The data structures of the payloads we’ll be using for our test are displayed below.
Number-heavy payloads
public class ShortNamesNumbers {
    double sn1;
    double sn2;
    double sn3;
    double sn4;
    double sn5;

    public ShortNamesNumbers() {
    }

    public ShortNamesNumbers(double sn1, double sn2, double sn3, double sn4, double sn5) {
        this.sn1 = sn1;
        this.sn2 = sn2;
        this.sn3 = sn3;
        this.sn4 = sn4;
        this.sn5 = sn5;
    }

    // + all the getters and setters
}
public class LongNamesNumbers {
    double longName1;
    double longName2;
    double longName3;
    double longName4;
    double longName5;

    public LongNamesNumbers() {
    }

    public LongNamesNumbers(double longName1, double longName2, double longName3, double longName4, double longName5) {
        this.longName1 = longName1;
        this.longName2 = longName2;
        this.longName3 = longName3;
        this.longName4 = longName4;
        this.longName5 = longName5;
    }

    // + all the getters and setters
}
message ShortNamesNumbers {
  double sn1 = 1;
  double sn2 = 2;
  double sn3 = 3;
  double sn4 = 4;
  double sn5 = 5;
}

message ShortNamesNumbersList {
  repeated ShortNamesNumbers elements = 1;
}

message LongNamesNumbers {
  double longName1 = 1;
  double longName2 = 2;
  double longName3 = 3;
  double longName4 = 4;
  double longName5 = 5;
}

message LongNamesNumbersList {
  repeated LongNamesNumbers elements = 1;
}
String-heavy payloads
public class ShortNamesStrings {
    String sn1;
    String sn2;
    String sn3;
    String sn4;
    String sn5;

    public ShortNamesStrings() {
    }

    public ShortNamesStrings(String sn1, String sn2, String sn3, String sn4, String sn5) {
        this.sn1 = sn1;
        this.sn2 = sn2;
        this.sn3 = sn3;
        this.sn4 = sn4;
        this.sn5 = sn5;
    }

    // + all the getters and setters
}

public class LongNamesStrings {
    String longName1;
    String longName2;
    String longName3;
    String longName4;
    String longName5;

    public LongNamesStrings() {
    }

    public LongNamesStrings(String longName1, String longName2, String longName3, String longName4, String longName5) {
        this.longName1 = longName1;
        this.longName2 = longName2;
        this.longName3 = longName3;
        this.longName4 = longName4;
        this.longName5 = longName5;
    }

    // + all the getters and setters
}
message ShortNamesStrings {
  string sn1 = 1;
  string sn2 = 2;
  string sn3 = 3;
  string sn4 = 4;
  string sn5 = 5;
}

message ShortNamesStringsList {
  repeated ShortNamesStrings elements = 1;
}

message LongNamesStrings {
  string longName1 = 1;
  string longName2 = 2;
  string longName3 = 3;
  string longName4 = 4;
  string longName5 = 5;
}

message LongNamesStringsList {
  repeated LongNamesStrings elements = 1;
}
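To give an idea of how the Protobuf side is exercised from Java, here is a minimal round-trip sketch, assuming the messages above have been compiled with protoc into Java classes. The generated package name (com.example.generated) is an assumption, used here only so the generated classes do not clash with the POJOs of the same name.

import com.example.generated.ShortNamesNumbers;
import com.example.generated.ShortNamesNumbersList;

public class ProtoRoundTripSketch {
    public static void main(String[] args) throws Exception {
        // build the same 100-element, five-field payload used in the benchmark
        ShortNamesNumbersList.Builder list = ShortNamesNumbersList.newBuilder();
        for (int i = 0; i < 100; i++) {
            list.addElements(ShortNamesNumbers.newBuilder()
                    .setSn1(i).setSn2(i + 0.1).setSn3(i + 0.2).setSn4(i + 0.3).setSn5(i + 0.4)
                    .build());
        }
        // encode to and decode from an in-memory buffer; no I/O is involved
        byte[] wire = list.build().toByteArray();
        ShortNamesNumbersList decoded = ShortNamesNumbersList.parseFrom(wire);
        System.out.println(decoded.getElementsCount()); // 100
    }
}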
Protocol
To ensure that we measure only serialization and deserialization, we constructed the payloads in “State” classes, and we used a Blackhole instance to consume the serialization/deserialization results so that the JIT compiler does not eliminate code whose result is never used. The payloads are serialized and deserialized to and from memory buffers to avoid involving I/O operations. This is important because interacting with I/O devices may introduce unpredictable latency and skew the results.
We also took steps to eliminate other potential sources of noise during testing. We stopped all unnecessary processes running on the computer and configured power management to prevent CPU throttling and screen saver activation during the benchmarks.
To reduce the statistical margin of error in the results, we configured JMH with 10 warmup iterations and 20 run iterations for each experiment.
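As a rough sketch of how one benchmark pair looks with this setup, here is the JSON side only, using Jackson’s ObjectMapper; the Protobuf benchmarks follow the same State/Blackhole pattern. The class and method names below are illustrative, not the exact ones in the repository.

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.ArrayList;
import java.util.List;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 10)
@Measurement(iterations = 20)
public class JsonCodecBenchmarkSketch {
    private final ObjectMapper mapper = new ObjectMapper();
    private List<ShortNamesNumbers> payload; // built once, outside the measured code
    private byte[] jsonBytes;                // pre-serialized input for the decode benchmark

    @Setup
    public void setup() throws Exception {
        payload = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            payload.add(new ShortNamesNumbers(i, i + 0.1, i + 0.2, i + 0.3, i + 0.4));
        }
        jsonBytes = mapper.writeValueAsBytes(payload);
    }

    @Benchmark
    public void encodeShortNamesNumbers(Blackhole bh) throws Exception {
        // the Blackhole consumes the result so the JIT cannot eliminate the serialization
        bh.consume(mapper.writeValueAsBytes(payload));
    }

    @Benchmark
    public void decodeShortNamesNumbers(Blackhole bh) throws Exception {
        bh.consume(mapper.readValue(jsonBytes, new TypeReference<List<ShortNamesNumbers>>() {}));
    }
}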
The source code for this experiment is available on GitHub at https://github.com/entzik/json-proto-jmh-benchmark.
Results
The raw JMH run report below shows that Protocol Buffers is significantly faster than JSON:
Benchmark Mode Cnt Score Error Units
CodecBenchmark.benchmarkJsonDecodeLongNamesNumbers thrpt 20 16957.714 ± 35.877 ops/s
CodecBenchmark.benchmarkJsonDecodeLongNamesStrings thrpt 20 31819.991 ± 22.943 ops/s
CodecBenchmark.benchmarkJsonDecodeShortNamesNumbers thrpt 20 17277.170 ± 35.430 ops/s
CodecBenchmark.benchmarkJsonDecodeShortNamesStrings thrpt 20 32346.328 ± 130.468 ops/s
CodecBenchmark.benchmarkJsonEncodeLongNamesNumbers thrpt 20 23763.468 ± 40.468 ops/s
CodecBenchmark.benchmarkJsonEncodeLongNamesStrings thrpt 20 68919.743 ± 277.151 ops/s
CodecBenchmark.benchmarkJsonEncodeShortNamesNumbers thrpt 20 23987.768 ± 19.126 ops/s
CodecBenchmark.benchmarkJsonEncodeShortNamesStrings thrpt 20 69156.867 ± 140.704 ops/s
CodecBenchmark.benchmarkProtoDecodeLongNamesNumbers thrpt 20 639524.868 ± 752.181 ops/s
CodecBenchmark.benchmarkProtoDecodeLongNamesStrings thrpt 20 726800.119 ± 16395.402 ops/s
CodecBenchmark.benchmarkProtoDecodeShortNamesNumbers thrpt 20 640153.820 ± 655.052 ops/s
CodecBenchmark.benchmarkProtoDecodeShortNamesStrings thrpt 20 730835.859 ± 20201.826 ops/s
CodecBenchmark.benchmarkProtoEncodeLongNamesNumbers thrpt 20 817332.157 ± 413.905 ops/s
CodecBenchmark.benchmarkProtoEncodeLongNamesStrings thrpt 20 180108.769 ± 997.701 ops/s
CodecBenchmark.benchmarkProtoEncodeShortNamesNumbers thrpt 20 819602.488 ± 694.377 ops/s
CodecBenchmark.benchmarkProtoEncodeShortNamesStrings thrpt 20 185876.296 ± 918.987 ops/s
The difference is easier to visualize in the charts below.
Encoding
Let’s look at payload encoding first. Protocol Buffers is ~2.7 times faster than JSON when encoding string-heavy payloads, but it really shines at encoding numerical values: Protobuf is ~34 times faster than JSON when encoding number-heavy payloads.
The difference in performance between JSON and Protobuf is easily explained. For numbers, where the gap is largest, JSON has to convert each number into a string, which is expensive in terms of CPU cycles. Protobuf, on the other hand, simply copies the bytes that represent the number as they are, which is far cheaper. The only exception is big-endian architectures, where the bytes must be swapped into the little-endian order used by Protobuf, but this is not a major cost because it only involves changing the byte order.
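Here is a small standalone illustration of the two strategies (not taken from the benchmark code): formatting a double as JSON-style decimal text versus copying its raw 8-byte little-endian representation, as Protobuf’s double wire format does.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    public static void main(String[] args) {
        double value = 1234.5678;

        // JSON-style: format the number as decimal text, character by character
        byte[] asJsonText = Double.toString(value).getBytes(StandardCharsets.UTF_8);

        // Protobuf-style: copy the 64-bit IEEE 754 pattern as-is, in little-endian order
        byte[] asProtoFixed64 = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putDouble(value)
                .array();

        System.out.println("JSON text bytes:  " + asJsonText.length);     // 9 ("1234.5678")
        System.out.println("Protobuf fixed64: " + asProtoFixed64.length); // always 8, plus a 1-byte tag on the wire
    }
}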
For strings, the difference is less significant because both JSON and Protobuf copy the strings as they are. The remaining gap can be explained by the fact that JSON still has to repeat the field name in front of every value, which means more bytes to write and more CPU cycles.
We noticed that the difference in throughput between long and short field names was not significant in JSON, and negligible in Protobuf. The latter was expected, because Protobuf does not include field names in the payload; instead it uses the numerical tags specified in the schema.
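To illustrate what such a tag looks like on the wire, the sketch below computes it by hand: a single varint combines the field number and wire type, so the encoded size is the same whether the schema calls the field sn5 or longName5.

public class WireTagDemo {
    public static void main(String[] args) {
        int fieldNumber = 5;     // e.g. sn5 = 5 or longName5 = 5 in the schemas above
        int wireTypeFixed64 = 1; // doubles use wire type 1 (64-bit)
        int tag = (fieldNumber << 3) | wireTypeFixed64;
        // for field numbers 1 through 15 the tag fits in a single byte
        System.out.printf("tag byte: 0x%02X%n", tag); // 0x29, regardless of the field name length
    }
}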
Decoding
Now let’s have a look at decoding performance.
Protocol Buffers is ~23 times faster than JSON when decoding string-heavy payloads. Again, Protobuf shines with numerical values: it is ~38 times faster than JSON when decoding number-heavy payloads.
The performance difference for number-heavy payloads is due to the fact that JSON requires the numbers to be parsed, which is a resource-intensive operation, while Protobuf just copies the bytes around.
For string-heavy payloads, the performance difference is less pronounced, but Protobuf is still significantly faster than JSON. Although strings are copied as they are, JSON requires a fair amount of parsing for both field names and values.
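The decoding counterpart of the earlier number illustration (again standalone, not benchmark code): JSON must parse decimal text back into a double, while Protobuf simply reads the 8-byte pattern back from the buffer.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DoubleDecodingDemo {
    public static void main(String[] args) {
        // JSON-style: parse the textual representation character by character
        double fromJson = Double.parseDouble("1234.5678");

        // Protobuf-style: read the 64-bit pattern straight out of the wire buffer
        byte[] wireBytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(1234.5678).array();
        double fromProto = ByteBuffer.wrap(wireBytes).order(ByteOrder.LITTLE_ENDIAN).getDouble();

        System.out.println(fromJson == fromProto); // true
    }
}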
The difference in throughput between long and short field names is negligible for decoding.
Payload Size
As anticipated, JSON payloads are larger than the equivalent Protobuf payloads. This is mainly because JSON is a verbose, text-based format, while Protobuf is a compact binary format.
We also notice that the size of JSON payloads varies significantly with the length of field names. This is expected, since field names are repeated for every object in the JSON content.
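A quick way to see the effect, assuming the Jackson ObjectMapper and the number-heavy POJOs shown earlier (the field values are arbitrary):

import com.fasterxml.jackson.databind.ObjectMapper;

public class PayloadSizeDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        byte[] shortNames = mapper.writeValueAsBytes(new ShortNamesNumbers(1, 2, 3, 4, 5));
        byte[] longNames = mapper.writeValueAsBytes(new LongNamesNumbers(1, 2, 3, 4, 5));
        // every object spells out all five field names, e.g. {"sn1":1.0,...} vs {"longName1":1.0,...},
        // so each long-name object carries 5 * ("longName1".length() - "sn1".length()) = 30 extra bytes
        System.out.println(shortNames.length + " bytes vs " + longNames.length + " bytes");
    }
}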
Conclusions
After analyzing the test results, it’s evident that Protobuf outperforms JSON for machine-to-machine communication in every aspect we measured. It is faster at both serialization and deserialization, produces smaller payloads, and also provides type safety and greater expressiveness.
However, JSON has its advantages as a human-friendly format. Its text-based nature enables easy editing of payloads by humans, and it has native support in browser tooling.