
Allow skip cache factor to be updated dynamically #14412


Merged: 8 commits into apache:main, Mar 30, 2025

Conversation

@sgup432 (Contributor) commented Mar 27, 2025

Description

Related issue - #14183

This change allows the skip cache factor to be updated dynamically within LRUQueryCache. This can be done via a getter/setter.
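The getter/setter approach can be sketched as follows. This is a hedged, self-contained illustration rather than Lucene's actual LRUQueryCache code; the class name, field, and the `>= 1` validation are assumptions made for the example:

```java
// Hypothetical sketch: a query cache whose skip-cache factor can be
// tuned at runtime without rebuilding the cache.
class TunableQueryCache {
  // volatile so that concurrent searcher threads observe updates
  // without additional locking
  private volatile float skipCacheFactor;

  TunableQueryCache(float skipCacheFactor) {
    setSkipCacheFactor(skipCacheFactor);
  }

  float getSkipCacheFactor() {
    return skipCacheFactor;
  }

  void setSkipCacheFactor(float skipCacheFactor) {
    // assumed invariant for this sketch: the factor must be >= 1
    if (skipCacheFactor < 1f) {
      throw new IllegalArgumentException(
          "skipCacheFactor must be >= 1, got: " + skipCacheFactor);
    }
    this.skipCacheFactor = skipCacheFactor;
  }
}
```

An operator could then raise or lower the factor on a live cache instance instead of recreating the cache with a new configuration.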

@sgup432 (Contributor, Author) commented Mar 27, 2025

@jpountz Might need your review as discussed in #14183

@jpountz (Contributor) left a comment


I left some comments, can you add a CHANGES entry too so that I can merge?

@sgup432 (Contributor, Author) commented Mar 29, 2025

@jpountz Added a CHANGES entry.

@kkewwei (Contributor) commented Mar 29, 2025

I am wondering if we should make maxRamBytesUsed dynamic as well.

@jpountz jpountz merged commit eb37a44 into apache:main Mar 30, 2025
7 checks passed
@jpountz (Contributor) commented Mar 30, 2025

> I am wondering if we should also make maxRamBytesUsed dynamic as well.

What is your use-case for tuning it? Related to my comment on the issue linked to this PR, I worry a bit about optimizing too much for query caching vs. making the query cache less relevant.

To expand a bit more on this: for "good" queries, like term queries or FieldExistsQuery, that produce good iterators with a low up-front cost, memory would be better spent on the filesystem cache than on the query cache. Boolean queries recently had big improvements that make the query cache less relevant (annotations HS, HW and HX at https://benchmarks.mikemccandless.com/CountAndHighHigh.html and https://benchmarks.mikemccandless.com/CountOrHighHigh.html). So that leaves the query cache mostly relevant for queries that have a high up-front cost, like point queries and multi-term queries. Rather than investing in the query cache for these queries, I would rather invest in making them behave better, e.g. recent work on vectorizing this up-front cost for PointRangeQuery so that it's less likely to be the bottleneck of query evaluation, or IndexOrDocValuesQuery to skip this up-front cost entirely when it's hurting more than helping.

@kkewwei (Contributor) commented Mar 31, 2025

Thank you for your reply. I understand your perspective: you aim to optimize execution efficiency rather than focus on cache optimization.

In certain rare scenarios, the query cache memory becomes full. To recover from that situation, we are currently compelled to restart the JVM, which differs from your line of thought.

I'm wondering if it would be feasible to expose all query cache APIs and eliminate any cache strategies within Lucene itself. If Lucene already has a query cache, then upper-level services like Elasticsearch or OpenSearch could refrain from implementing their own.

Please ignore this suggestion if it doesn't seem viable.

@jpountz (Contributor) commented Mar 31, 2025

Sorry, I'm not sure I get your suggestion.

@sgup432 (Contributor, Author) commented Mar 31, 2025

@jpountz Thanks for expanding on the reasoning above w.r.t. query cache usage. I was working on refactoring the query cache so that it is not a bottleneck itself when utilized well. Maybe we can add an additional caching policy that uses up-front cost or query latency to decide what to cache, so that we only cache queries that will give us a benefit. Let me know if it's still worth a shot; meanwhile we can work on optimizing the queries themselves. I can open a PR for that soon.
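The caching-policy idea above could be sketched roughly like this. All names here are hypothetical (this is not a real Lucene QueryCachingPolicy implementation); it only illustrates gating cache admission on a measured up-front cost:

```java
// Hypothetical sketch: admit a query into the cache only when its
// measured up-front cost (e.g. nanoseconds spent building its iterator)
// crosses a threshold, so that cheap queries are never cached.
class CostBasedCachingPolicy {
  private final long minUpFrontCostNanos;

  CostBasedCachingPolicy(long minUpFrontCostNanos) {
    this.minUpFrontCostNanos = minUpFrontCostNanos;
  }

  // Returns true when caching is expected to pay off.
  boolean shouldCache(long measuredUpFrontCostNanos) {
    return measuredUpFrontCostNanos >= minUpFrontCostNanos;
  }
}
```

Under such a policy, a point or multi-term query with a large measured setup cost would be admitted to the cache, while a cheap term query would not.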
