-
Notifications
You must be signed in to change notification settings - Fork 60
GelfEncoder.convertToNumberIfNeeded Performance and exception as control-flow #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for bringing this up! About the history of this:
As you pointed out, the performance penalty unfortunately is quite heavy. Maybe you want to analyze the performance impact of the following? private Object convertToNumberIfNeeded(final Object value) {
if (numbersAsString || !(value instanceof String)) {
return value;
}
// Simple check if the string could be a number to avoid the performance overhead of exception handling
final char[] ca = ((String) value).toCharArray();
for (char c : ca) {
if (!isBigDecimalChar(c)) {
return value;
}
}
try {
return new BigDecimal(ca, 0, ca.length);
} catch (final NumberFormatException e) {
return value;
}
}
private static boolean isBigDecimalChar(final char c) {
return (c >= '0' && c <= '9')
|| c == '.'
|| c == '+' || c == '-'
|| c == 'E' || c == 'e';
} Excluding fields like |
Very rough initial test: Added a @Test
void millionEncodeLoop() {
encoder.start();
final LoggerContext lc = (LoggerContext) LoggerFactory.getILoggerFactory();
final Logger logger = lc.getLogger(LOGGER_NAME);
long startTime = System.nanoTime();
for (int i = 0; i < 1_000_000; i++) {
assertThatJson(encodeToStr(simpleLoggingEvent(logger, null)))
.node("full_message").isEqualTo("message 1\\n");
}
System.out.printf("Time for 1M encodes: %,d ns.\n", System.nanoTime() - startTime);
} Added both version with/without numbersAsString=true, ran about 8 times on JDK 21, minimal variation between runs. Before changes:
after changes exactly as you described:
So roughly from more than 2X slower to only 10-13% slower. Note: Test time includes the JSON assert time on output but that should be constant between code versions. (And avoids JVM optimizing away any operations since output is fully consumed) Then I thought about other factors: stack depth and number of fields. Stack depth in test runner due to Gradle and JUnit was already 86 elements deep so I don't think there's a need to simulate a deeper stack. Number of fields is significant since all are processed, so I made another test where I just A bit later I'll run and post results also on JDK11 with all combinations:
|
It just came to my mind, that maybe it's sufficient (and more efficient) to only check if the first character is a digit. private Object convertToNumberIfNeeded(final Object value) {
if (numbersAsString || !(value instanceof String)
|| !isBigDecimalChar(((String) value).charAt(0))) {
return value;
}
try {
return new BigDecimal((String) value);
} catch (final NumberFormatException e) {
return value;
}
}
private static boolean isBigDecimalChar(final char c) {
return (c >= '0' && c <= '9')
|| c == '.'
|| c == '+' || c == '-';
} I also did a quick test, and it seems to further improve the performance. But that, of course, highly depends on the values passed to the method... |
Regarding just checking first character: Maybe more efficient but also more unstable? As you said clearly depends on the data, see I can also imagine very common log values that will trick this and go to the very slow path of exception:
Plus the first version of the code seems more clear since it's checking the full value. |
Finished test runs and attached results below including .patch with tests I used for benchmarking Sample measurements: Before:
After (with your original code proposal):
Summary of approximate times in seconds (there's some variation so picked random samples, full times attached):
All cases are faster. What's funny is '1M encodes with 10 key-value pairs' is faster without numbersAsString=true now. But it's a very minor variation and maybe it's just better pre-fetching since we're always reading the string characters fully.... GelfEncoderTest_additional.patch |
Thanks for analyzing! I will release 6.0.2 shortly. |
Describe the bug
GelfEncoder.convertToNumberIfNeeded
attempts conversion to a number usingnew BigDecimal
constructor and control-flow is basically handled via exception.As far as I can see this is attempted on all fields including standard fields like 'logger_name', 'thread_name', 'level_name' so basically guaranteeing at least 3 exceptions for any kind of log message even without custom/additional fields.
This can add up to a lot given that stack trace population for the exception can be quite expensive.
Below an async-profiler partial flame graph, environment:
Logger.info
convertToNumberIfNeeded
just overwhelms everything compared to file appender or actual compression and UDP sending.Thoughts:
BigInteger
for conversion if field value is definitely not a number+-0123456789.eE
(and char array BigInteger constructor could be used then if iterating)logger_name
could be excluded?numbersAsString
configuration option and its default false setting?I can probably attempt a PR for the first numeric check sometime in the next couple of days.
Additional context
java -version
)The text was updated successfully, but these errors were encountered: