GelfEncoder.convertToNumberIfNeeded Performance and exception as control-flow #108

deathy · 2024-08-22T20:48:17Z

Describe the bug

GelfEncoder.convertToNumberIfNeeded attempts conversion to a number using new BigDecimal constructor and control-flow is basically handled via exception.

As far as I can see this is attempted on all fields including standard fields like 'logger_name', 'thread_name', 'level_name' so basically guaranteeing at least 3 exceptions for any kind of log message even without custom/additional fields.

This can add up to a lot given that stack trace population for the exception can be quite expensive.

Below an async-profiler partial flame graph, environment:

logback FILE appender and GELF UDP appender configured for root appender
logging multiple INFO messages
flame graph only partial, rooted at one instance of Logger.info

convertToNumberIfNeeded just overwhelms everything compared to file appender or actual compression and UDP sending.

Thoughts:

should not call BigInteger for conversion if field value is definitely not a number
- simple character iteration and fail on anything outside +-0123456789.eE (and char array BigInteger constructor could be used then if iterating)
maybe some obvious fields like logger_name could be excluded?
any history on numbersAsString configuration option and its default false setting?
- I guess it may be hard to change the default now but maybe possible performance impact could be documented? Not it's not completely clear what it does (that it tries for all fields)

I can probably attempt a PR for the first numeric check sometime in the next couple of days.

Additional context

Java distribution and version in use (output of java -version)
- OpenJDK Runtime Environment Temurin-21.0.2+13
SLF4J and Logback version in use
- slf4j-api 2.0.13, logback 1.5.6
Graylog version in use
- Graylog 6.0.3

The text was updated successfully, but these errors were encountered:

osiegmar · 2024-08-23T09:07:02Z

Thanks for bringing this up!

About the history of this:

Number conversion was added in version 2.1.0 with log configured values as integer #30. To maintain backwards compatibility it was disabled by default.
To fix scientific notation (which is not supported by the GELF protocol) the change 18d3fa2 was made in version 3.0.0.
Also in version 3.0.0 the default for numbersAsString was changed from true to false as it seemed always desirable to have numbers as numbers in the search index to allow proper search capabilities.

As you pointed out, the performance penalty unfortunately is quite heavy. Maybe you want to analyze the performance impact of the following?

private Object convertToNumberIfNeeded(final Object value) {
    if (numbersAsString || !(value instanceof String)) {
        return value;
    }

    // Simple check if the string could be a number to avoid the performance overhead of exception handling
    final char[] ca = ((String) value).toCharArray();
    for (char c : ca) {
        if (!isBigDecimalChar(c)) {
            return value;
        }
    }

    try {
        return new BigDecimal(ca, 0, ca.length);
    } catch (final NumberFormatException e) {
        return value;
    }
}

private static boolean isBigDecimalChar(final char c) {
    return (c >= '0' && c <= '9')
           || c == '.'
           || c == '+' || c == '-'
           || c == 'E' || c == 'e';
}

Excluding fields like logger_name probably required a bit more refactoring. I'm not sure if it's worth the effort. Let's wait for the results of the performance analysis.

deathy · 2024-08-23T14:31:11Z

Very rough initial test:

Added a GelfEncoderTest test based on simple one:

@Test
void millionEncodeLoop() {
    encoder.start();
    final LoggerContext lc = (LoggerContext) LoggerFactory.getILoggerFactory();
    final Logger logger = lc.getLogger(LOGGER_NAME);

    long startTime = System.nanoTime();
    for (int i = 0; i < 1_000_000; i++) {
        assertThatJson(encodeToStr(simpleLoggingEvent(logger, null)))
                .node("full_message").isEqualTo("message 1\\n");
    }
    System.out.printf("Time for 1M encodes: %,d ns.\n", System.nanoTime() - startTime);
}

Added both version with/without numbersAsString=true, ran about 8 times on JDK 21, minimal variation between runs.

Before changes:

Time for 1M encodes: 20,478,114,200 ns.
Time for 1M encodes with numbersAsString=true: 8,702,857,900 ns.

Time for 1M encodes: 20,071,830,100 ns.
Time for 1M encodes with numbersAsString=true: 8,328,012,600 ns.

after changes exactly as you described:

Time for 1M encodes: 9,502,125,300 ns.
Time for 1M encodes with numbersAsString=true: 8,516,555,000 ns.

Time for 1M encodes: 9,480,593,100 ns.
Time for 1M encodes with numbersAsString=true: 8,577,866,600 ns.

So roughly from more than 2X slower to only 10-13% slower.

Note: Test time includes the JSON assert time on output but that should be constant between code versions. (And avoids JVM optimizing away any operations since output is fully consumed)

Then I thought about other factors: stack depth and number of fields.

Stack depth in test runner due to Gradle and JUnit was already 86 elements deep so I don't think there's a need to simulate a deeper stack.

Number of fields is significant since all are processed, so I made another test where I just event.addKeyValuePair(new KeyValuePair("field1","value1")); adding a bunch of key value fields.

A bit later I'll run and post results also on JDK11 with all combinations:

before/after
with/without numbersAsString
with/without additional field pairs

osiegmar · 2024-08-23T15:02:44Z

It just came to my mind, that maybe it's sufficient (and more efficient) to only check if the first character is a digit.

private Object convertToNumberIfNeeded(final Object value) {
    if (numbersAsString || !(value instanceof String)
        || !isBigDecimalChar(((String) value).charAt(0))) {
        return value;
    }

    try {
        return new BigDecimal((String) value);
    } catch (final NumberFormatException e) {
        return value;
    }
}

private static boolean isBigDecimalChar(final char c) {
    return (c >= '0' && c <= '9')
           || c == '.'
           || c == '+' || c == '-';
}

I also did a quick test, and it seems to further improve the performance. But that, of course, highly depends on the values passed to the method...

deathy · 2024-08-23T20:17:55Z

Regarding just checking first character:

Maybe more efficient but also more unstable? As you said clearly depends on the data, see value1 vs 1stvalue

I can also imagine very common log values that will trick this and go to the very slow path of exception:

IP addresses 10.0.0.1
- well this will be slow even in the first version, but I don't think reinventing a number parser would be wanted here
IP address and port pair 127.0.0.1:8080
- will not trigger exception in first code version, would trigger it with only checking first char
IPv6 address such as 2001:db8::8a2e:370:7334
- no exception in first case because of ":", but first character is a digit so very slow exception path with second code variant
certain kinds of UUIDs
- these will be all over the place depending on first char
- actually for any kind of well-distributed hash in hex you'd have 10/16 chance of first character being a digit but then going the very slow path through exception

Plus the first version of the code seems more clear since it's checking the full value.

deathy · 2024-08-23T20:30:32Z

Finished test runs and attached results below including .patch with tests I used for benchmarking

Sample measurements:

Before:

Time for 1M encodes: 19,000,709,000 ns.
Time for 1M encodes with numbersAsString=true: 8,218,149,800 ns.
Time for 1M encodes with 5 key-value pairs: 46,252,780,700 ns.
Time for 1M encodes with 5 key-value pairs and numbersAsString=true: 9,615,291,900 ns.
Time for 1M encodes with 10 key-value pairs: 72,548,223,100 ns.
Time for 1M encodes with 10 key-value pairs and numbersAsString=true: 12,112,090,100 ns.


Time for 1M encodes: 18,903,051,500 ns.
Time for 1M encodes with numbersAsString=true: 8,555,006,800 ns.
Time for 1M encodes with 5 key-value pairs: 46,891,360,500 ns.
Time for 1M encodes with 5 key-value pairs and numbersAsString=true: 9,971,852,000 ns.
Time for 1M encodes with 10 key-value pairs: 72,963,828,400 ns.
Time for 1M encodes with 10 key-value pairs and numbersAsString=true: 12,080,028,700 ns.

After (with your original code proposal):

Time for 1M encodes: 8,306,923,000 ns.
Time for 1M encodes with numbersAsString=true: 8,230,593,600 ns.
Time for 1M encodes with 5 key-value pairs: 9,995,377,600 ns.
Time for 1M encodes with 5 key-value pairs and numbersAsString=true: 9,614,381,200 ns.
Time for 1M encodes with 10 key-value pairs: 11,114,532,800 ns.
Time for 1M encodes with 10 key-value pairs and numbersAsString=true: 12,274,562,900 ns.


Time for 1M encodes: 8,281,705,100 ns.
Time for 1M encodes with numbersAsString=true: 8,373,099,100 ns.
Time for 1M encodes with 5 key-value pairs: 10,040,063,400 ns.
Time for 1M encodes with 5 key-value pairs and numbersAsString=true: 9,837,812,200 ns.
Time for 1M encodes with 10 key-value pairs: 11,176,946,700 ns.
Time for 1M encodes with 10 key-value pairs and numbersAsString=true: 12,162,474,000 ns.

Summary of approximate times in seconds (there's some variation so picked random samples, full times attached):

Test	Before	After
1M encodes	19 s	8.2 s
1M encodes with numbersAsString=true	8.2 s	8.2 s
1M encodes with 5 key-value pairs	46.3 s	9.9 s
1M encodes with 5 key-value pairs and numbersAsString=true	9.5 s	9.5 s
1M encodes with 10 key-value pairs	72.6 s	11.1 s
1M encodes with 10 key-value pairs and numbersAsString=true	12.1 s	12.1 s

All cases are faster.
You can also see impact of number of fields.

What's funny is '1M encodes with 10 key-value pairs' is faster without numbersAsString=true now. But it's a very minor variation and maybe it's just better pre-fetching since we're always reading the string characters fully....

GelfEncoderTest_additional.patch
GelfEncoderTest_measurements_after.txt
GelfEncoderTest_measurements_before.txt

osiegmar · 2024-08-24T11:13:12Z

Thanks for analyzing! I will release 6.0.2 shortly.

deathy · 2024-08-24T11:36:01Z

Thanks!

Last note, minor amusement, I profiled also my test and it turns out ~30% of time is spent in JsonUnit assert because it also does check for number via exception using Double.parseDouble ... (meanwhile convertToNumberIfNeeded is almost invisible, 0.16% )

osiegmar added a commit that referenced this issue Aug 24, 2024

Improve performance of number conversion in GelfEncoder #108

1d861cf

osiegmar closed this as completed Aug 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GelfEncoder.convertToNumberIfNeeded Performance and exception as control-flow #108

GelfEncoder.convertToNumberIfNeeded Performance and exception as control-flow #108

deathy commented Aug 22, 2024

osiegmar commented Aug 23, 2024

deathy commented Aug 23, 2024

osiegmar commented Aug 23, 2024 •

edited

Loading

deathy commented Aug 23, 2024

deathy commented Aug 23, 2024

osiegmar commented Aug 24, 2024

deathy commented Aug 24, 2024

GelfEncoder.convertToNumberIfNeeded Performance and exception as control-flow #108

GelfEncoder.convertToNumberIfNeeded Performance and exception as control-flow #108

Comments

deathy commented Aug 22, 2024

osiegmar commented Aug 23, 2024

deathy commented Aug 23, 2024

osiegmar commented Aug 23, 2024 • edited Loading

deathy commented Aug 23, 2024

deathy commented Aug 23, 2024

osiegmar commented Aug 24, 2024

deathy commented Aug 24, 2024

osiegmar commented Aug 23, 2024 •

edited

Loading