Use ByteBuffers for serialization/deserialization. #356


jameshilliard
Contributor

Some more groundwork for #347.

@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch 6 times, most recently from 3b188fa to 17088a6 on November 7, 2020 00:02
@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch 4 times, most recently from 47678a3 to 7f7c74a on November 8, 2020 04:16
static final ByteBuffer OP_PING = ByteBuffer.wrap(new byte[] { 'P', 'I', 'N', 'G' });
static final ByteBuffer OP_PONG = ByteBuffer.wrap(new byte[] { 'P', 'O', 'N', 'G' });
static final ByteBuffer OP_OK = ByteBuffer.wrap(new byte[] { '+', 'O', 'K' });
static final ByteBuffer OP_ERR = ByteBuffer.wrap(new byte[] { '-', 'E', 'R', 'R' });
Contributor

@scottf scottf Nov 8, 2020

I don't know how important it is to squeeze every bit of performance out of initialization, and I don't know how much difference there would be, if any, but it might be more readable to just do:

static final ByteBuffer OP_ERR = ByteBuffer.wrap("-ERR".getBytes());

Contributor Author

I did it this way mostly since it's pretty obvious from the variable names what they are, and since I wasn't sure there wouldn't be unexpected performance issues from allocation weirdness when calling getBytes(). I've seen instances where ByteBuffers get sized oddly when converting from strings, since Java strings are essentially UTF-16 rather than the ASCII/UTF-8 we need here. I figured this version was more obviously correct: it entirely avoids Java's UTF-16 strings, so it isn't prone to those side effects.
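
That said, a middle-ground sketch (illustrative only, not code from this branch) would be to keep the string literal but pin the charset explicitly, which at least removes the platform-default-charset variable from the no-arg getBytes():

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical alternative: an explicit charset makes the byte layout
// deterministic regardless of the JVM's default charset.
static final ByteBuffer OP_ERR = ByteBuffer.wrap("-ERR".getBytes(StandardCharsets.US_ASCII));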

Contributor

@scottf scottf Nov 8, 2020

I see. Better safe than sorry!

Contributor Author

Yeah, I've tried to aggressively minimize the use of strings wherever feasible, especially in the lower-level serialization/deserialization code, since string conversions are relatively expensive operations. Strings also make accurate buffer sizing difficult in low-level Java networking code, which is why I'm trying to normalize everything to ByteBuffers internally as soon as feasible.
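
As a rough sketch of that boundary (illustrative only, not the PR's actual code), the conversion would happen once at the public API edge and everything below it would see only ByteBuffers:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferBoundary {
    // Hypothetical helper: encode a String once at the API edge.
    // Charset.encode() sizes the resulting buffer itself, so no manual
    // length math against the UTF-16 String is needed.
    static ByteBuffer toBuffer(String s) {
        return StandardCharsets.UTF_8.encode(s);
    }
}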

@@ -34,36 +33,25 @@
NatsMessage next; // for linked list

// Create a message to publish
-NatsMessage(String subject, String replyTo, ByteBuffer data, boolean utf8mode) {
+NatsMessage(ByteBuffer subject, ByteBuffer replyTo, ByteBuffer data, boolean utf8mode) {
Contributor

Since everything is already in buffers by this point, can the utf8mode flag be completely removed from NatsMessage?

Contributor Author

Probably. I left it there since I wasn't sure whether this was a public API or not.

@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch 3 times, most recently from 083d95c to 9195448 on November 10, 2020 04:39
@scottf
Contributor

scottf commented Nov 10, 2020

So while I was waiting for this to be resolved, since my work directly depends on it, I took the time to build a rudimentary benchmark. My concern was that the buffer change adds complexity that outweighs the benefits. I believe the test supports that opinion.

The test was new-ing 3 NatsMessage objects, which basically just does a little string or buffer manipulation. I don't do anything with the messages except create them. The 3 creates were 1) a protocol-only message, 2) a publish message without utf8 mode, and 3) a publish message with utf8mode. I do that because the String version has a slightly different pathway between the publish modes; the buffer version does an encoding in both modes. The string version uses builders, as I had already written that code.

I ran those tests 2 billion times each, on my laptop, making sure nothing else was running. I tried to write the code to time only the create calls. It's not perfect, but I think it supports my point. In this case, buffers actually seem slower. The raw numbers were:

String version:  801327 millis = 13.35545 minutes
Buffers version: 983311 millis = 16.38857 minutes

The buffers version of the benchmark:

public static void main(String[] args) {
    int stop = 2_000_000_000;
    int print = 500_000;
    long total = 0;
    for (int x = 0; x < stop; x++) {
        if ((x % print) == 0) System.out.println(x);
        // build fresh inputs each round so nothing is cached between creates
        String data = "" + x;
        byte[] body = data.getBytes();
        String subject = "subj" + x;
        String replyTo = "reply" + x;

        // time only the three constructor calls
        long start = System.currentTimeMillis();
        new NatsMessage(CharBuffer.wrap(data));        // protocol-only
        new NatsMessage(subject, replyTo, body, true); // publish, utf8mode on
        new NatsMessage(subject, replyTo, body, false); // publish, utf8mode off
        total += (System.currentTimeMillis() - start);
    }
    System.out.println("\n" + total);
}

The string/builders version:

public static void main(String[] args) {
    int stop = 2_000_000_000;
    int print = 500_000;
    long total = 0;
    for (int x = 0; x < stop; x++) {
        if ((x % print) == 0) System.out.println(x);
        // build fresh inputs each round so nothing is cached between creates
        String data = "" + x;
        byte[] body = data.getBytes();
        String subject = "subj" + x;
        String replyTo = "reply" + x;

        // time only the three build() calls
        long start = System.currentTimeMillis();
        new NatsMessage.Builder().protocol(data).build(); // protocol-only
        new NatsMessage.Builder().subject(subject).replyTo(replyTo).data(body).utf8mode(true).build();
        new NatsMessage.Builder().subject(subject).replyTo(replyTo).data(body).build();
        total += (System.currentTimeMillis() - start);
    }
    System.out.println("\n" + total);
}

@jameshilliard
Contributor Author

I tried to write the code to time only the create calls. It's not perfect, but I think it supports my point. In this case, buffers actually seem slower.

That's expected, because I moved a lot of serialization/format conversions out of NatsConnectionReader and into NatsMessage, so it's not at all surprising that your benchmark of NatsMessage is slower than before. NatsConnectionReader and many other parts of the code should be significantly faster, however, since NatsMessage is now essentially where the string encoding/decoding happens.

@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch from 9195448 to 3fe57fe on November 10, 2020 04:58
@scottf
Contributor

scottf commented Nov 10, 2020

Yes, agreed. The test is far from perfect and does not take all your changes into account, so maybe we should build a better benchmark. But it still seems to support not adding complexity until we know it's really worth it.

@jameshilliard
Contributor Author

But it still seems to support not adding complexity until we know it's really worth it.

This, while somewhat more verbose, does make it significantly easier to size buffers properly, which is rather important for optimizing a lot of the low-level message handling code. It also allows us to eliminate redundant codepaths in NatsConnectionReader, since we can now operate directly on raw UTF-8/ASCII bytes instead of UTF-16 strings (which are really annoying to deal with when serializing to UTF-8/ASCII, since there's no efficient way to determine the appropriate buffer sizes from a UTF-16 string).
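
To make the sizing point concrete, here is a minimal sketch (illustrative, not this PR's actual serializer): once the inputs are buffers, the exact length of a PUB control line is simple arithmetic over remaining(), with no encoding pass required.

import java.nio.ByteBuffer;

class SizingSketch {
    // Hypothetical helper for "PUB <subject> <replyTo> <len>\r\n".
    // Every variable-width part contributes exactly remaining() bytes,
    // so the exact allocation size needs no encoding pass.
    static int pubLineLength(ByteBuffer subject, ByteBuffer replyTo, int payloadLen) {
        int lenDigits = Integer.toString(payloadLen).length();
        return 4                           // "PUB "
                + subject.remaining() + 1  // subject + ' '
                + replyTo.remaining() + 1  // replyTo + ' '
                + lenDigits + 2;           // payload length digits + "\r\n"
    }
}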

this.gatherProtocol(bytesRead);
} else {
this.gatherMessageProtocol(bytesRead);
}
Contributor Author

This is eliminated.

CharBuffer buff = StandardCharsets.UTF_8.decode(protocolBuffer);
protocolLength = buff.remaining();
buff.get(this.msgLineChars, 0, protocolLength);
}
Contributor Author

This as well.

}
} else {
return UNKNOWN_OP;
}
Contributor Author

I was also able to eliminate this scary function.

@scottf
Contributor

scottf commented Nov 10, 2020

So let's pretend we fix the benchmark I wrote so that it considers the new code better. And let's say that instead of being 3 minutes slower, the buffer version now runs X minutes faster.
That's X * 60 * 1000 milliseconds faster. Remember the test did 3 messages in each of 2 billion rounds, so that's 6 billion messages. If we look at what happens if the new test runs 10/100/1000 minutes faster, we can see how much gain per message we are getting. For our use case, is even a 1_000 nanosecond improvement per message worth the complexity?

Minutes = Millis     / Total         = Gain Per  |
Faster  = Faster     / Messages      = Message   | Fraction    | Nanos
------- = ---------- / ------------- = --------- | ----------- | ------------
   10   =    600_000 / 6_000_000_000 = 0.0001 ms | 1/10_000 ms |    100 nanos
  100   =  6_000_000 / 6_000_000_000 = 0.001 ms  | 1/1_000 ms  |  1_000 nanos
 1000   = 60_000_000 / 6_000_000_000 = 0.01 ms   | 1/100 ms    | 10_000 nanos

@jameshilliard
Contributor Author

jameshilliard commented Nov 10, 2020

For our use case, is even a 1_000 nanosecond improvement per message worth the complexity?

I don't really consider this an overall complexity increase, but more of a complexity shift: it moves a large amount of the string message serialization/deserialization complexity out of the lower-level layers and into the higher ones. The idea is that this significantly decreases complexity in certain areas, making further refactoring and optimizations, such as using AsynchronousSocketChannels, more straightforward.

Fundamentally, at their lowest level, network protocols are byte oriented, not string oriented. I can see why this change may look more complex, since a lot of the ByteBuffer APIs are very low level and result in more verbosity (this is a problem with Java in general, and why a lot of the higher-level code we write is actually in Kotlin). However, the message processing logic itself IMO ends up significantly simpler and easier to reason about when you can avoid Java's UTF-16 strings entirely: we can split message segments and do buffer allocations directly on the raw socket bytes/ByteBuffers without any string conversions, since the parts we need to interpret are all single-byte ASCII characters.
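
To illustrate that last point (a sketch, not the PR's actual parser), splitting a protocol line on its single-byte delimiters can be done directly against the buffer, handing out zero-copy slices instead of decoded Strings:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class TokenSketch {
    // Hypothetical tokenizer: because NATS protocol delimiters are single
    // ASCII bytes, slices of the raw buffer can stand in for Strings.
    static List<ByteBuffer> splitOnSpace(ByteBuffer line) {
        List<ByteBuffer> tokens = new ArrayList<>();
        int start = line.position();
        for (int i = line.position(); i < line.limit(); i++) {
            if (line.get(i) == ' ') {
                tokens.add(slice(line, start, i));
                start = i + 1;
            }
        }
        tokens.add(slice(line, start, line.limit()));
        return tokens;
    }

    private static ByteBuffer slice(ByteBuffer b, int from, int to) {
        ByteBuffer d = b.duplicate();
        d.position(from);
        d.limit(to);
        return d.slice(); // zero-copy view of the original bytes
    }
}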

I should also add that one of the goals of this change is to make it possible to add public APIs that allow the nats library to be used without any string encoding/decoding operations whatsoever on certain hot paths. To make that possible without a large amount of code duplication, we need to move all string encoding/decoding out of the low-level connection reader/writer code.

In terms of use case, a lot of what I intend to use nats for is load-balanced services that act as protocol converters/proxies: they ingest a nats message, do some conversion/filtering, and forward the request on to other services that may not speak nats protocol directly. For example, this would be used for high-performance legacy protocol conversion gateway proxies that wrap legacy backend services which in many cases cannot easily be made to speak nats protocol directly. To interface with these legacy network protocols in a low-latency manner, we want to avoid unnecessary string encoding operations and pass bytes/buffers directly from one network socket to the other, with both protocols using non-blocking IO and without encoding/decoding any part of the nats message to or from strings. A buffer-passing approach should make it significantly easier to optimize these performance-critical codepaths, since Java's async non-blocking IO support is largely built around AsynchronousSocketChannel, which natively operates on ByteBuffers.
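
For context, the target read path would look roughly like this (a minimal sketch around the standard AsynchronousSocketChannel API; the parser hand-off is hypothetical):

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

class AsyncReadSketch {
    // Hypothetical read loop: the buffer filled by the socket is the same
    // object the parser consumes -- no intermediate String stage.
    static void readLoop(AsynchronousSocketChannel channel, ByteBuffer buffer) {
        channel.read(buffer, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void att) {
                if (bytesRead < 0) return; // peer closed the connection
                buffer.flip();
                // hand the raw bytes to the parser here (hypothetical step)
                buffer.clear();
                channel.read(buffer, null, this); // re-arm the read
            }

            @Override
            public void failed(Throwable exc, Void att) {
                exc.printStackTrace();
            }
        });
    }
}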

So let's pretend we fix the benchmark I wrote so that it considers the new code better.

This set of changes is not really intended to significantly improve performance by itself, but rather to act as groundwork for further refactoring that will. It's probably a bit early to be micro-benchmarking, as further refactoring will be needed to see a significant benefit. That refactoring can be done more incrementally, I think, while this set of changes was much more difficult to split up: intermixing ByteBuffer-based message processing with string-based message processing is quite difficult, so I pretty much had to convert a large portion of the internals all at once.

@jameshilliard
Contributor Author

I now have SSL working in #357, which is based on this.

@ColinSullivan1
Member

I wanted to give an update on this PR. There are many PRs that will need to be merged at about the same time, so we'd like to take a methodical approach.

#362 and #354 contain missing functionality that is delaying POCs of JetStream, so these should be merged first. At that point I'd like to cut a beta release to unblock those users and then look at byte buffers (this PR and #364).

Optimization is very important to us, but it's more important to unblock our users waiting to use JetStream, with a minimum of risk. This PR will get reviewed and merged; we just need to get the headers and the JetStream functionality in first. I apologize in advance for the rebase that will need to happen, but from a product perspective we need to prioritize this way.

Thanks again for all of this work and please be patient as we move forward with merging these PRs.

@jameshilliard
Contributor Author

we just need to get the headers and the JetStream functionality in first

Yeah, I'm going over and refactoring the headers PR a bit at the moment to make the rebase easier.

@ColinSullivan1
Member

FYI - #354 will likely change quite a bit.

@jameshilliard
Contributor Author

At that point I'd like to cut a beta release to unblock those users and then look at byte buffers (this PR and #364).

Well, #364 should apply cleanly on top of #362 and also fixes some tests/bugs in #362; I would recommend just importing my patches from #364 into the #362 PR branch.

@scottf
Contributor

scottf commented Apr 8, 2021

I have done quite a bit of work on the client, used ByteBuffers in some places, wrapped with ByteArrayBuilders. I have also done some profiling. We have some places in the client where we need to improve performance, but string manipulation is extremely low on the list of places that profile poorly. Closing this PR.

@scottf scottf closed this Apr 8, 2021