Use ByteBuffers for serialization/deserialization. #356


jameshilliard
Contributor

Some more groundwork for #347.

@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch 6 times, most recently from 3b188fa to 17088a6 on November 7, 2020 00:02
@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch 4 times, most recently from 47678a3 to 7f7c74a on November 8, 2020 04:16
static final ByteBuffer OP_PING = ByteBuffer.wrap(new byte[] { 'P', 'I', 'N', 'G' });
static final ByteBuffer OP_PONG = ByteBuffer.wrap(new byte[] { 'P', 'O', 'N', 'G' });
static final ByteBuffer OP_OK = ByteBuffer.wrap(new byte[] { '+', 'O', 'K' });
static final ByteBuffer OP_ERR = ByteBuffer.wrap(new byte[] { '-', 'E', 'R', 'R' });
Contributor

@scottf scottf Nov 8, 2020

I don't know how important it is to squeeze every bit of performance out of initialization, and I don't know how much difference there would be, if any, but it might be more readable to just do:

static final ByteBuffer OP_ERR = ByteBuffer.wrap("-ERR".getBytes());

Contributor Author

I did it this way mostly since it's pretty obvious from the variable names what they are, and since I wasn't sure there wouldn't be unexpected performance issues from allocation weirdness when calling getBytes(). I've seen instances where ByteBuffers get sized oddly when converting from strings, since Java strings are essentially UTF-16 rather than the ASCII/UTF-8 we need here. I figured this version was more obviously correct: it entirely avoids Java's UTF-16 strings, so it isn't prone to those side effects.
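
That said, a middle-ground sketch (illustrative only, not code from this branch) would be to keep the string literal but pin the charset explicitly, which at least removes the platform-default-charset variable from the no-arg getBytes():

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical alternative: an explicit charset makes the byte layout
// deterministic regardless of the JVM's default charset.
static final ByteBuffer OP_ERR = ByteBuffer.wrap("-ERR".getBytes(StandardCharsets.US_ASCII));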

Contributor

@scottf scottf Nov 8, 2020

I see. Better safe than sorry!

Contributor Author

Yeah, I've tried to aggressively minimize the use of strings wherever feasible, especially in the lower-level serialization/deserialization code, since string conversions are relatively expensive operations. Strings also make accurate buffer sizing difficult in low-level Java networking code, which is why I'm trying to normalize everything to ByteBuffers internally as soon as feasible.
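
As a rough sketch of that boundary (illustrative only, not the PR's actual code), the conversion would happen once at the public API edge and everything below it would see only ByteBuffers:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferBoundary {
    // Hypothetical helper: encode a String once at the API edge.
    // Charset.encode() sizes the resulting buffer itself, so no manual
    // length math against the UTF-16 String is needed.
    static ByteBuffer toBuffer(String s) {
        return StandardCharsets.UTF_8.encode(s);
    }
}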

@@ -34,36 +33,25 @@
NatsMessage next; // for linked list

// Create a message to publish
-NatsMessage(String subject, String replyTo, ByteBuffer data, boolean utf8mode) {
+NatsMessage(ByteBuffer subject, ByteBuffer replyTo, ByteBuffer data, boolean utf8mode) {
Contributor

Since everything is already in buffers by this point, can the utf8mode flag be completely removed from NatsMessage?

Contributor Author

Probably. I left it there since I wasn't sure whether this was a public API or not.

@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch 3 times, most recently from 083d95c to 9195448 on November 10, 2020 04:39
@scottf
Contributor

scottf commented Nov 10, 2020

So while I was waiting for this to be resolved, since my work directly depends on it, I took the time to build a rudimentary benchmark. My concern was that the buffer change adds complexity that outweighs the benefits. I believe the test supports that opinion.

The test was new-ing 3 NatsMessage objects, which basically just does a little string or buffer manipulation. I don't do anything with the messages except create them. The 3 creates were 1) a protocol-only message, 2) a publish message without utf8 mode, and 3) a publish message with utf8mode. I do that because the String version has a slightly different pathway between the publish modes; the buffer version does an encoding in both modes. The string version uses builders, as I had already written that code.

I ran those tests 2 billion times each, on my laptop, making sure nothing else was running. I tried to write the code to time only the create calls. It's not perfect, but I think it supports my point. In this case, buffers actually seem slower. The raw numbers were:

String version:  801327 millis = 13.35545 minutes
Buffers version: 983311 millis = 16.38857 minutes

The buffers version of the benchmark:

public static void main(String[] args) {
    int stop = 2_000_000_000;
    int print = 500_000;
    long total = 0;
    for (int x = 0; x < stop; x++) {
        if ((x % print) == 0) System.out.println(x);
        // build fresh inputs each round so nothing is cached between creates
        String data = "" + x;
        byte[] body = data.getBytes();
        String subject = "subj" + x;
        String replyTo = "reply" + x;

        // time only the three constructor calls
        long start = System.currentTimeMillis();
        new NatsMessage(CharBuffer.wrap(data));        // protocol-only
        new NatsMessage(subject, replyTo, body, true); // publish, utf8mode on
        new NatsMessage(subject, replyTo, body, false); // publish, utf8mode off
        total += (System.currentTimeMillis() - start);
    }
    System.out.println("\n" + total);
}

The string/builders version:

public static void main(String[] args) {
    int stop = 2_000_000_000;
    int print = 500_000;
    long total = 0;
    for (int x = 0; x < stop; x++) {
        if ((x % print) == 0) System.out.println(x);
        // build fresh inputs each round so nothing is cached between creates
        String data = "" + x;
        byte[] body = data.getBytes();
        String subject = "subj" + x;
        String replyTo = "reply" + x;

        // time only the three build() calls
        long start = System.currentTimeMillis();
        new NatsMessage.Builder().protocol(data).build(); // protocol-only
        new NatsMessage.Builder().subject(subject).replyTo(replyTo).data(body).utf8mode(true).build();
        new NatsMessage.Builder().subject(subject).replyTo(replyTo).data(body).build();
        total += (System.currentTimeMillis() - start);
    }
    System.out.println("\n" + total);
}

@jameshilliard
Contributor Author

I tried to write the code to time only the create calls. It's not perfect, but I think it supports my point. In this case, buffers actually seem slower.

That's expected, because I moved a lot of serialization/format conversions out of NatsConnectionReader and into NatsMessage, so it's not at all surprising that your benchmark of NatsMessage is slower than before. NatsConnectionReader and many other parts of the code should be significantly faster, however, since NatsMessage is now essentially where the string encoding/decoding happens.

@jameshilliard jameshilliard force-pushed the bytebuffer-serialization branch from 9195448 to 3fe57fe on November 10, 2020 04:58
@scottf
Contributor

scottf commented Nov 10, 2020

Yes, agreed. The test is far from perfect and does not take all your changes into account, so maybe we should build a better benchmark. But it still seems to support not adding complexity until we know it's really worth it.

@jameshilliard
Contributor Author

But it still seems to support not adding complexity until we know it's really worth it.

This, while somewhat more verbose, does make it significantly easier to size buffers properly, which is rather important for optimizing a lot of the low-level message handling code. It also allows us to eliminate redundant codepaths in NatsConnectionReader, since we can now operate directly on raw UTF-8/ASCII bytes instead of UTF-16 strings (which are really annoying to deal with when serializing to UTF-8/ASCII, since there's no efficient way to determine the appropriate buffer sizes from a UTF-16 string).
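
To make the sizing point concrete, here is a minimal sketch (illustrative, not this PR's actual serializer): once the inputs are buffers, the exact length of a PUB control line is simple arithmetic over remaining(), with no encoding pass required.

import java.nio.ByteBuffer;

class SizingSketch {
    // Hypothetical helper for "PUB <subject> <replyTo> <len>\r\n".
    // Every variable-width part contributes exactly remaining() bytes,
    // so the exact allocation size needs no encoding pass.
    static int pubLineLength(ByteBuffer subject, ByteBuffer replyTo, int payloadLen) {
        int lenDigits = Integer.toString(payloadLen).length();
        return 4                           // "PUB "
                + subject.remaining() + 1  // subject + ' '
                + replyTo.remaining() + 1  // replyTo + ' '
                + lenDigits + 2;           // payload length digits + "\r\n"
    }
}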

this.gatherProtocol(bytesRead);
} else {
this.gatherMessageProtocol(bytesRead);
}
Contributor Author

This is eliminated.

CharBuffer buff = StandardCharsets.UTF_8.decode(protocolBuffer);
protocolLength = buff.remaining();
buff.get(this.msgLineChars, 0, protocolLength);
}
Contributor Author

This as well.

}
} else {
return UNKNOWN_OP;
}
Contributor Author

I was also able to eliminate this scary function.

@scottf
Contributor

scottf commented Nov 10, 2020

So let's pretend we fix the benchmark I wrote so that it considers the new code better. And let's say that instead of being 3 minutes slower, the buffer version now runs X minutes faster.
That's X * 60 * 1000 milliseconds faster. Remember the test did 3 messages in each of 2 billion rounds, so that's 6 billion messages. If we look at what happens if the new test runs 10/100/1000 minutes faster, we can see how much gain per message we are getting. For our use case, is even a 1_000 nanosecond improvement per message worth the complexity?

Minutes = Millis     / Total         = Gain Per  |
Faster  = Faster     / Messages      = Message   | Fraction    | Nanos
------- = ---------- / ------------- = --------- | ----------- | ------------
   10   =    600_000 / 6_000_000_000 = 0.0001 ms | 1/10_000 ms |    100 nanos
  100   =  6_000_000 / 6_000_000_000 = 0.001 ms  | 1/1_000 ms  |  1_000 nanos
 1000   = 60_000_000 / 6_000_000_000 = 0.01 ms   | 1/100 ms    | 10_000 nanos

@jameshilliard
Contributor Author

jameshilliard commented Nov 10, 2020

For our use case, is even a 1_000 nanosecond improvement per message worth the complexity?

I don't really consider this an overall complexity increase, but more of a complexity shift: it moves a large amount of the string message serialization/deserialization complexity out of the lower-level layers and into the higher ones. The idea is that this significantly decreases complexity in certain areas, making further refactoring and optimizations, such as using AsynchronousSocketChannels, more straightforward.

Fundamentally, at their lowest level, network protocols are byte oriented, not string oriented. I can see why this change may look more complex, since a lot of the ByteBuffer APIs are very low level and result in more verbosity (this is a problem with Java in general, and why a lot of the higher-level code we write is actually in Kotlin). However, the message processing logic itself IMO ends up significantly simpler and easier to reason about when you can avoid Java's UTF-16 strings entirely: we can split message segments and do buffer allocations directly on the raw socket bytes/ByteBuffers without any string conversions, since the parts we need to interpret are all single-byte ASCII characters.
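
To illustrate that last point (a sketch, not the PR's actual parser), splitting a protocol line on its single-byte delimiters can be done directly against the buffer, handing out zero-copy slices instead of decoded Strings:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class TokenSketch {
    // Hypothetical tokenizer: because NATS protocol delimiters are single
    // ASCII bytes, slices of the raw buffer can stand in for Strings.
    static List<ByteBuffer> splitOnSpace(ByteBuffer line) {
        List<ByteBuffer> tokens = new ArrayList<>();
        int start = line.position();
        for (int i = line.position(); i < line.limit(); i++) {
            if (line.get(i) == ' ') {
                tokens.add(slice(line, start, i));
                start = i + 1;
            }
        }
        tokens.add(slice(line, start, line.limit()));
        return tokens;
    }

    private static ByteBuffer slice(ByteBuffer b, int from, int to) {
        ByteBuffer d = b.duplicate();
        d.position(from);
        d.limit(to);
        return d.slice(); // zero-copy view of the original bytes
    }
}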

I should also add that one of the goals of this change is to make it possible to add public APIs that allow the nats library to be used without any string encoding/decoding operations whatsoever on certain hot paths. To make that possible without a large amount of code duplication, we need to move all string encoding/decoding out of the low-level connection reader/writer code.

In terms of use case, a lot of what I intend to use nats for is load-balanced services that act as protocol converters/proxies: they ingest a nats message, do some conversion/filtering, and forward the request on to other services that may not speak nats protocol directly. For example, this would be used for high-performance legacy protocol conversion gateway proxies that wrap legacy backend services which in many cases cannot easily be made to speak nats protocol directly. To interface with these legacy network protocols in a low-latency manner, we want to avoid unnecessary string encoding operations and pass bytes/buffers directly from one network socket to the other, with both protocols using non-blocking IO and without encoding/decoding any part of the nats message to or from strings. A buffer-passing approach should make it significantly easier to optimize these performance-critical codepaths, since Java's async non-blocking IO support is largely built around AsynchronousSocketChannel, which natively operates on ByteBuffers.
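
For context, the target read path would look roughly like this (a minimal sketch around the standard AsynchronousSocketChannel API; the parser hand-off is hypothetical):

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

class AsyncReadSketch {
    // Hypothetical read loop: the buffer filled by the socket is the same
    // object the parser consumes -- no intermediate String stage.
    static void readLoop(AsynchronousSocketChannel channel, ByteBuffer buffer) {
        channel.read(buffer, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void att) {
                if (bytesRead < 0) return; // peer closed the connection
                buffer.flip();
                // hand the raw bytes to the parser here (hypothetical step)
                buffer.clear();
                channel.read(buffer, null, this); // re-arm the read
            }

            @Override
            public void failed(Throwable exc, Void att) {
                exc.printStackTrace();
            }
        });
    }
}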

So let's pretend we fix the benchmark I wrote so that it considers the new code better.

This set of changes is not really intended to significantly improve performance by itself, but rather to act as groundwork for further refactoring that will. It's probably a bit early to be micro-benchmarking, as further refactoring will be needed to see a significant benefit. That refactoring can be done more incrementally, I think, while this set of changes was much more difficult to split up: intermixing ByteBuffer-based message processing with string-based message processing is quite difficult, so I pretty much had to convert a large portion of the internals all at once.

@jameshilliard
Contributor Author

I now have SSL working in #357, which is based on this.

@ColinSullivan1
Member

I wanted to give an update on this PR. There are many PRs that will need to be merged at about the same time, so we'd like to take a methodical approach.

#362 and #354 contain missing functionality that is delaying POCs of JetStream, so these should be merged first. At that point I'd like to cut a beta release to unblock those users and then look at byte buffers (this PR and #364).

Optimization is very important to us, but it's more important to unblock our users waiting to use JetStream, with a minimum of risk. This PR will get reviewed and merged; we just need to get the headers and the JetStream functionality in first. I apologize in advance for the rebase that will need to happen, but from a product perspective we need to prioritize this way.

Thanks again for all of this work and please be patient as we move forward with merging these PRs.

@jameshilliard
Contributor Author

we just need to get the headers and the JetStream functionality in first

Yeah, I'm going over and refactoring the headers PR a bit at the moment to make the rebase easier.

@ColinSullivan1
Member

FYI - #354 will likely change quite a bit.

@jameshilliard
Contributor Author

At that point I'd like to cut a beta release to unblock those users and then look at byte buffers (this PR and #364).

Well, #364 should apply cleanly on top of #362 and also fixes some tests/bugs in #362; I would recommend just importing my patches from #364 into the #362 PR branch.

@scottf
Contributor

scottf commented Apr 8, 2021

I have done quite a bit of work on the client, used ByteBuffers in some places, wrapped with ByteArrayBuilders. I have also done some profiling. We have some places in the client where we need to improve performance, but string manipulation is extremely low on the list of places that profile poorly. Closing this PR.

@scottf scottf closed this Apr 8, 2021