JSON vs Protocol Buffers — a performance comparison
JSON and Protobuf are two different formats used for data interchange. JSON, which stands for JavaScript Object Notation, is widely used and is the de facto default format for RESTful APIs.
Protobuf, or Protocol Buffers, is less common and is mainly used with the gRPC framework. This article will compare the two formats on payload size and encoding/decoding performance, and explore whether the nature of the payload affects these factors.
Methodology
We will compare the serialization of identical data structures in JSON (using the Jackson library) and in Protobuf. We will test two payload types, one string-heavy and one number-heavy, and we will also vary the length of the field names.
To conduct the test, we will use JMH, the Java Microbenchmark Harness. JMH performs multiple JVM warm-up runs before the actual benchmark and accounts for statistical anomalies in the measurements.
The serialized payload will consist of an array of 100 objects, each containing five fields of either strings or numbers. The data structures of the payloads we’ll be using for our test are displayed below.
Number-heavy payloads
public class ShortNamesNumbers {
    double sn1;
    double sn2;
    double sn3;
    double sn4;
    double sn5;

    public ShortNamesNumbers() {
    }

    public ShortNamesNumbers(double sn1, double sn2, double sn3, double sn4, double sn5) {
        this.sn1 = sn1;
        this.sn2 = sn2;
        this.sn3 = sn3;
        this.sn4 = sn4;
        this.sn5 = sn5;
    }

    // + all the getters and setters
}
public class LongNamesNumbers {
    double longName1;
    double longName2;
    double longName3;
    double longName4;
    double longName5;

    public LongNamesNumbers() {
    }

    public LongNamesNumbers(double longName1, double longName2, double longName3, double longName4, double longName5) {
        this.longName1 = longName1;
        this.longName2 = longName2;
        this.longName3 = longName3;
        this.longName4 = longName4;
        this.longName5 = longName5;
    }

    // + all the getters and setters
}
message ShortNamesNumbers {
  double sn1 = 1;
  double sn2 = 2;
  double sn3 = 3;
  double sn4 = 4;
  double sn5 = 5;
}

message ShortNamesNumbersList {
  repeated ShortNamesNumbers elements = 1;
}

message LongNamesNumbers {
  double longName1 = 1;
  double longName2 = 2;
  double longName3 = 3;
  double longName4 = 4;
  double longName5 = 5;
}

message LongNamesNumbersList {
  repeated LongNamesNumbers elements = 1;
}
String-heavy payloads
public class ShortNamesStrings {
    String sn1;
    String sn2;
    String sn3;
    String sn4;
    String sn5;

    public ShortNamesStrings() {
    }

    public ShortNamesStrings(String sn1, String sn2, String sn3, String sn4, String sn5) {
        this.sn1 = sn1;
        this.sn2 = sn2;
        this.sn3 = sn3;
        this.sn4 = sn4;
        this.sn5 = sn5;
    }

    // + all the getters and setters
}

public class LongNamesStrings {
    String longName1;
    String longName2;
    String longName3;
    String longName4;
    String longName5;

    public LongNamesStrings() {
    }

    public LongNamesStrings(String longName1, String longName2, String longName3, String longName4, String longName5) {
        this.longName1 = longName1;
        this.longName2 = longName2;
        this.longName3 = longName3;
        this.longName4 = longName4;
        this.longName5 = longName5;
    }

    // + all the getters and setters
}
message ShortNamesStrings {
  string sn1 = 1;
  string sn2 = 2;
  string sn3 = 3;
  string sn4 = 4;
  string sn5 = 5;
}

message ShortNamesStringsList {
  repeated ShortNamesStrings elements = 1;
}

message LongNamesStrings {
  string longName1 = 1;
  string longName2 = 2;
  string longName3 = 3;
  string longName4 = 4;
  string longName5 = 5;
}

message LongNamesStringsList {
  repeated LongNamesStrings elements = 1;
}
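To give an idea of how the Protobuf side is exercised from Java, here is a minimal round-trip sketch, assuming the messages above have been compiled with protoc into Java classes. The generated package name (com.example.generated) is an assumption, used here only so the generated classes do not clash with the POJOs of the same name.

import com.example.generated.ShortNamesNumbers;
import com.example.generated.ShortNamesNumbersList;

public class ProtoRoundTripSketch {
    public static void main(String[] args) throws Exception {
        // build the same 100-element, five-field payload used in the benchmark
        ShortNamesNumbersList.Builder list = ShortNamesNumbersList.newBuilder();
        for (int i = 0; i < 100; i++) {
            list.addElements(ShortNamesNumbers.newBuilder()
                    .setSn1(i).setSn2(i + 0.1).setSn3(i + 0.2).setSn4(i + 0.3).setSn5(i + 0.4)
                    .build());
        }
        // encode to and decode from an in-memory buffer; no I/O is involved
        byte[] wire = list.build().toByteArray();
        ShortNamesNumbersList decoded = ShortNamesNumbersList.parseFrom(wire);
        System.out.println(decoded.getElementsCount()); // 100
    }
}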
Protocol
To ensure that we measure only serialization and deserialization, we constructed the payloads in “State” classes, and we used a Blackhole instance to consume the serialization/deserialization results so that the JIT compiler does not eliminate code whose result is never used. The payloads are serialized and deserialized to and from memory buffers to avoid involving I/O operations. This is important because interacting with I/O devices may introduce unpredictable latency and skew the results.
We also took steps to eliminate other potential sources of noise during testing. We stopped all unnecessary processes running on the computer and configured power management to prevent CPU throttling and screen saver activation during the benchmarks.
To reduce the statistical margin of error in the results, we configured JMH with 10 warmup iterations and 20 run iterations for each experiment.
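As a rough sketch of how one benchmark pair looks with this setup, here is the JSON side only, using Jackson’s ObjectMapper; the Protobuf benchmarks follow the same State/Blackhole pattern. The class and method names below are illustrative, not the exact ones in the repository.

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.ArrayList;
import java.util.List;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 10)
@Measurement(iterations = 20)
public class JsonCodecBenchmarkSketch {
    private final ObjectMapper mapper = new ObjectMapper();
    private List<ShortNamesNumbers> payload; // built once, outside the measured code
    private byte[] jsonBytes;                // pre-serialized input for the decode benchmark

    @Setup
    public void setup() throws Exception {
        payload = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            payload.add(new ShortNamesNumbers(i, i + 0.1, i + 0.2, i + 0.3, i + 0.4));
        }
        jsonBytes = mapper.writeValueAsBytes(payload);
    }

    @Benchmark
    public void encodeShortNamesNumbers(Blackhole bh) throws Exception {
        // the Blackhole consumes the result so the JIT cannot eliminate the serialization
        bh.consume(mapper.writeValueAsBytes(payload));
    }

    @Benchmark
    public void decodeShortNamesNumbers(Blackhole bh) throws Exception {
        bh.consume(mapper.readValue(jsonBytes, new TypeReference<List<ShortNamesNumbers>>() {}));
    }
}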
The source code for this experiment is available on GitHub at https://github.com/entzik/json-proto-jmh-benchmark.
Results
The raw JMH run report below shows that Protocol Buffers is significantly faster than JSON:
Benchmark Mode Cnt Score Error Units
CodecBenchmark.benchmarkJsonDecodeLongNamesNumbers thrpt 20 16957.714 ± 35.877 ops/s
CodecBenchmark.benchmarkJsonDecodeLongNamesStrings thrpt 20 31819.991 ± 22.943 ops/s
CodecBenchmark.benchmarkJsonDecodeShortNamesNumbers thrpt 20 17277.170 ± 35.430 ops/s
CodecBenchmark.benchmarkJsonDecodeShortNamesStrings thrpt 20 32346.328 ± 130.468 ops/s
CodecBenchmark.benchmarkJsonEncodeLongNamesNumbers thrpt 20 23763.468 ± 40.468 ops/s
CodecBenchmark.benchmarkJsonEncodeLongNamesStrings thrpt 20 68919.743 ± 277.151 ops/s
CodecBenchmark.benchmarkJsonEncodeShortNamesNumbers thrpt 20 23987.768 ± 19.126 ops/s
CodecBenchmark.benchmarkJsonEncodeShortNamesStrings thrpt 20 69156.867 ± 140.704 ops/s
CodecBenchmark.benchmarkProtoDecodeLongNamesNumbers thrpt 20 639524.868 ± 752.181 ops/s
CodecBenchmark.benchmarkProtoDecodeLongNamesStrings thrpt 20 726800.119 ± 16395.402 ops/s
CodecBenchmark.benchmarkProtoDecodeShortNamesNumbers thrpt 20 640153.820 ± 655.052 ops/s
CodecBenchmark.benchmarkProtoDecodeShortNamesStrings thrpt 20 730835.859 ± 20201.826 ops/s
CodecBenchmark.benchmarkProtoEncodeLongNamesNumbers thrpt 20 817332.157 ± 413.905 ops/s
CodecBenchmark.benchmarkProtoEncodeLongNamesStrings thrpt 20 180108.769 ± 997.701 ops/s
CodecBenchmark.benchmarkProtoEncodeShortNamesNumbers thrpt 20 819602.488 ± 694.377 ops/s
CodecBenchmark.benchmarkProtoEncodeShortNamesStrings thrpt 20 185876.296 ± 918.987 ops/s
The difference is easier to visualize in the charts below.
Encoding
Let’s look at payload encoding first. Protocol Buffers is ~2.7 times faster than JSON when encoding string-heavy payloads, but it really shines at encoding numerical values: Protobuf is ~34 times faster than JSON when encoding number-heavy payloads.
The difference in performance between JSON and Protobuf is easily explained. For numbers, where the gap is largest, JSON has to convert each number into a string, which is expensive in terms of CPU cycles. Protobuf, on the other hand, simply copies the bytes that represent the number as they are, which is far cheaper. The only exception is big-endian architectures, where the bytes must be swapped into the little-endian order used by Protobuf, but this is not a major cost because it only involves changing the byte order.
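Here is a small standalone illustration of the two strategies (not taken from the benchmark code): formatting a double as JSON-style decimal text versus copying its raw 8-byte little-endian representation, as Protobuf’s double wire format does.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    public static void main(String[] args) {
        double value = 1234.5678;

        // JSON-style: format the number as decimal text, character by character
        byte[] asJsonText = Double.toString(value).getBytes(StandardCharsets.UTF_8);

        // Protobuf-style: copy the 64-bit IEEE 754 pattern as-is, in little-endian order
        byte[] asProtoFixed64 = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putDouble(value)
                .array();

        System.out.println("JSON text bytes:  " + asJsonText.length);     // 9 ("1234.5678")
        System.out.println("Protobuf fixed64: " + asProtoFixed64.length); // always 8, plus a 1-byte tag on the wire
    }
}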
For strings, the difference is less significant because both JSON and Protobuf copy the strings as they are. The remaining gap can be explained by the fact that JSON still has to repeat the field name in front of every value, which means more bytes to write and more CPU cycles.
We noticed that the difference in throughput between long and short field names was not significant in JSON, and negligible in Protobuf. The latter was expected, because Protobuf does not include field names in the payload; instead it uses the numerical tags specified in the schema.
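To illustrate what such a tag looks like on the wire, the sketch below computes it by hand: a single varint combines the field number and wire type, so the encoded size is the same whether the schema calls the field sn5 or longName5.

public class WireTagDemo {
    public static void main(String[] args) {
        int fieldNumber = 5;     // e.g. sn5 = 5 or longName5 = 5 in the schemas above
        int wireTypeFixed64 = 1; // doubles use wire type 1 (64-bit)
        int tag = (fieldNumber << 3) | wireTypeFixed64;
        // for field numbers 1 through 15 the tag fits in a single byte
        System.out.printf("tag byte: 0x%02X%n", tag); // 0x29, regardless of the field name length
    }
}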
Decoding
Now let’s have a look at decoding performance.
Protocol Buffers is ~23 times faster than JSON when decoding string-heavy payloads. Again, Protobuf shines with numerical values: it is ~38 times faster than JSON when decoding number-heavy payloads.
The performance difference for number-heavy payloads is due to the fact that JSON requires the numbers to be parsed, which is a resource-intensive operation, while Protobuf just copies the bytes around.
For string-heavy payloads, the performance difference is less pronounced, but Protobuf is still significantly faster than JSON. Although strings are copied as they are, JSON requires a fair amount of parsing for both field names and values.
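The decoding counterpart of the earlier number illustration (again standalone, not benchmark code): JSON must parse decimal text back into a double, while Protobuf simply reads the 8-byte pattern back from the buffer.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DoubleDecodingDemo {
    public static void main(String[] args) {
        // JSON-style: parse the textual representation character by character
        double fromJson = Double.parseDouble("1234.5678");

        // Protobuf-style: read the 64-bit pattern straight out of the wire buffer
        byte[] wireBytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(1234.5678).array();
        double fromProto = ByteBuffer.wrap(wireBytes).order(ByteOrder.LITTLE_ENDIAN).getDouble();

        System.out.println(fromJson == fromProto); // true
    }
}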
The difference in throughput between long and short field names is negligible for decoding.
Payload Size
As anticipated, JSON payloads are larger than the equivalent Protobuf payloads. This is mainly because JSON is a verbose, text-based format, while Protobuf is a compact binary format.
We also notice that the size of JSON payloads varies significantly with the length of field names. This is expected, since field names are repeated for every object in the JSON content.
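A quick way to see the effect, assuming the Jackson ObjectMapper and the number-heavy POJOs shown earlier (the field values are arbitrary):

import com.fasterxml.jackson.databind.ObjectMapper;

public class PayloadSizeDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        byte[] shortNames = mapper.writeValueAsBytes(new ShortNamesNumbers(1, 2, 3, 4, 5));
        byte[] longNames = mapper.writeValueAsBytes(new LongNamesNumbers(1, 2, 3, 4, 5));
        // every object spells out all five field names, e.g. {"sn1":1.0,...} vs {"longName1":1.0,...},
        // so each long-name object carries 5 * ("longName1".length() - "sn1".length()) = 30 extra bytes
        System.out.println(shortNames.length + " bytes vs " + longNames.length + " bytes");
    }
}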
Conclusions
After analyzing the test results, it’s evident that Protobuf outperforms JSON for machine-to-machine communication in every aspect we measured. It is faster at both serialization and deserialization, produces smaller payloads, and also provides type safety and greater expressiveness.
However, JSON has its advantages as a human-friendly format. Its text-based nature enables easy editing of payloads by humans, and it has native support in browser tooling.