Don't cache sanitization results for large sql statements #13353

Merged
8 commits merged on Apr 10, 2025

@laurit (Contributor) commented Feb 19, 2025

Hopefully resolves #13180
Since we keep the statement as the key in the sanitization cache, large statements can cause the cache to grow to several hundred MB. This PR disables caching for statements larger than 10 KB. There isn't any particular reason why 10 KB was chosen, so feel free to suggest a different size.

Besides disabling the cache, this PR introduces a thread-local context for sharing computed values between the span name extractor and the attribute extractor for SQL client calls. This allows us to sanitize each statement only once and reuse the result for both span name and attribute extraction.
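The cache bypass described above can be illustrated with a minimal sketch. This is not the code from this PR: `CachingSanitizer`, `cachedEntries`, and the plain `ConcurrentHashMap` are hypothetical stand-ins, and the threshold is assumed to be counted in characters via `String.length()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch; not the actual opentelemetry-java-instrumentation code.
final class CachingSanitizer {
  // Assumption: "10kb" is interpreted as 10 * 1024 characters.
  static final int LARGE_STATEMENT_THRESHOLD = 10 * 1024;

  // Assumption: an unbounded map stands in for the real (bounded) cache.
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> sanitizer;

  CachingSanitizer(Function<String, String> sanitizer) {
    this.sanitizer = sanitizer;
  }

  String sanitize(String statement) {
    // Large statements bypass the cache so they are never retained as keys.
    if (statement.length() > LARGE_STATEMENT_THRESHOLD) {
      return sanitizer.apply(statement);
    }
    // Smaller statements are sanitized once and reused on later calls.
    return cache.computeIfAbsent(statement, sanitizer);
  }

  // Exposed only so the sketch can show that large statements are not stored.
  int cachedEntries() {
    return cache.size();
  }
}
```

With this shape, the memory held by the cache is bounded by the entry count times the threshold, rather than by the size of the largest statement ever seen.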

@laurit laurit requested a review from a team as a code owner February 19, 2025 14:57
@@ -24,7 +24,9 @@ default String getDbSystem(REQUEST request) {

@Deprecated
@Nullable
- String getUser(REQUEST request);
+ default String getUser(REQUEST request) {
Contributor Author

Since these are removed in the stable semantic conventions, we don't need to force users to implement them.
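As a rough illustration of that change, turning an abstract getter into a default method lets implementers omit it. The interface name `ExampleDbGetter`, the `getDbName` method, and the `return null` body below are assumptions made for this sketch; the diff does not show the method body, and the `@Nullable` annotation is left out to keep the sketch free of external dependencies.

```java
// Hypothetical getter interface; only getUser mirrors the diff above.
interface ExampleDbGetter<REQUEST> {
  // Hypothetical abstract method that implementers still have to provide.
  String getDbName(REQUEST request);

  // Deprecated attribute: the default body (assumed to return null here)
  // means implementers no longer need to override it.
  @Deprecated
  default String getUser(REQUEST request) {
    return null;
  }
}
```

An implementation can then be as small as a lambda for the remaining abstract method, for example `ExampleDbGetter<String> getter = request -> "orders";`.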

// sanitization result will not be cached for statements larger than the threshold to avoid
// cache growing too large
// https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/13180
if (statement.length() > LARGE_STATEMENT_THRESHOLD) {
Member

I was thinking we could hash these larger statements instead of using the whole statement as the key, but that might be more computationally expensive, so this seems reasonable to me.

Contributor Author

Actually, my first attempt was to use hashing. Computing a hash for a very large statement can be more expensive than applying the sanitizer, as the sanitizer also applies a size limit. My guess is that many of these very large statements are dynamically generated, so a given statement is likely executed only once and would not benefit from caching anyway.
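The cost argument can be sketched with a toy comparison. Everything below is hypothetical: `BoundedSanitizerSketch` is not the project's sanitizer, and the digit masking only stands in for real sanitization. The point is that a size-limited sanitizer stops after `limit` characters, while a hash has to visit every character of the statement.

```java
// Hypothetical toy code; it only illustrates bounded versus unbounded work.
final class BoundedSanitizerSketch {
  // Toy sanitizer: masks each digit and stops scanning after `limit` characters.
  static String sanitize(String statement, int limit) {
    int end = Math.min(statement.length(), limit);
    StringBuilder out = new StringBuilder(end);
    for (int i = 0; i < end; i++) {
      char c = statement.charAt(i);
      out.append(Character.isDigit(c) ? '?' : c);
    }
    return out.toString();
  }

  // Polynomial hash using the same recurrence as String.hashCode:
  // the loop always runs over the full statement.
  static int hash(String statement) {
    int h = 0;
    for (int i = 0; i < statement.length(); i++) {
      h = 31 * h + statement.charAt(i);
    }
    return h;
  }
}
```

For a statement far longer than the limit, `sanitize` does work proportional to the limit, whereas `hash` does work proportional to the statement length.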

@laurit laurit added this to the v2.14.0 milestone Mar 7, 2025
@@ -164,6 +165,10 @@ Context startAndEnd(
}

private Context doStart(Context parentContext, REQUEST request, @Nullable Instant startTime) {
return InstrumenterContext.withContext(() -> doStartImpl(parentContext, request, startTime));
Member

Sorry for the delayed feedback. I wish this (relatively small) overhead didn't affect all instrumentations just for this edge case.

Contributor Author

I reworked this to only create the thread-local instrumentation context when needed. The downside is that if these classes are used without the Instrumenter, the thread-local context may leak.
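A minimal sketch of that trade-off, assuming a design where the thread-local map is allocated on first use and cleared by an enclosing scope. `LazyThreadLocalContext`, `computeIfAbsent`, and `isActive` are hypothetical names for this illustration and are not the actual `InstrumenterContext` API; nested scopes are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of a lazily created thread-local context.
final class LazyThreadLocalContext {
  private static final ThreadLocal<Map<String, Object>> CONTEXT = new ThreadLocal<>();

  // Runs the action and clears whatever context was created while it ran.
  // Nothing is allocated here, so callers that never share values pay very little.
  static <T> T withContext(Supplier<T> action) {
    try {
      return action.get();
    } finally {
      CONTEXT.remove();
    }
  }

  // Computes a value once per scope; the backing map is created on first use.
  @SuppressWarnings("unchecked")
  static <T> T computeIfAbsent(String key, Function<String, T> compute) {
    Map<String, Object> map = CONTEXT.get();
    if (map == null) {
      map = new HashMap<>();
      CONTEXT.set(map);
    }
    return (T) map.computeIfAbsent(key, compute);
  }

  // True while this thread still holds a context map.
  static boolean isActive() {
    return CONTEXT.get() != null;
  }
}
```

If `computeIfAbsent` is called outside `withContext`, nothing ever calls `remove()`, so the map stays attached to the thread. That is the kind of leak the comment above warns about.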

@trask trask removed this from the v2.14.0 milestone Mar 13, 2025
@laurit laurit added this to the v2.15.0 milestone Apr 4, 2025
@trask trask merged commit 8cd11e4 into open-telemetry:main Apr 10, 2025
86 checks passed
Development

Successfully merging this pull request may close these issues.

DB statement sanitization causes memory leaks
3 participants