-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Introducing MergedSegmentWarmerFactory to support the extension of IndexWriter.IndexReaderWarmer #17881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing MergedSegmentWarmerFactory to support the extension of IndexWriter.IndexReaderWarmer #17881
Conversation
Signed-off-by: guojialiang <[email protected]>
…pre copy Signed-off-by: guojialiang <[email protected]>
❌ Gradle check result for e012362: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
The replication lag improvement looks promising. There appears to be up to 5-6% higher CPU utilization with pre-copy. Is this reproducible each time, or can this be considered as general variability between multiple runs? |
Hi, @ashking94
Yeah, the improvementt is very obvious.
Some explanation is needed here. Because the current value fluctuates, we generally care about the max and avg CPU utilization. Taking node At the same time, we repeated the test of enabling pre-copy. It can be seen that the average CPU utilization of In summary, merged segments pre copy hardly introduces additional CPU overhead. |
server/src/main/java/org/opensearch/index/engine/InternalEngine.java
Outdated
Show resolved
Hide resolved
Signed-off-by: guojialiang <[email protected]>
97fa12b
to
a6157f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, do address the unresolved comments if any.
Thanks, @ashking94
All comments have been resolved, looking forward to merging the code. |
@guojialiang92 can you resolve the conflicts? |
…gment_warmer # Conflicts: # CHANGELOG.md # server/src/main/java/org/opensearch/common/util/FeatureFlags.java
@ashking94, It has been resolved. |
…rWarmer (opensearch-project#17881) * support merged segment warmer Signed-off-by: guojialiang <[email protected]> * Introduce MergedSegmentWarmerFactory for Local/Remote merged segment pre copy Signed-off-by: guojialiang <[email protected]> * add FeatureFlags.MERGED_SEGMENT_WARMER_EXPERIMENTAL_FLAG Signed-off-by: guojialiang <[email protected]> * add an entry in CHANGELOG Signed-off-by: guojialiang <[email protected]> * add validation Signed-off-by: guojialiang <[email protected]> * modify changelog entry Signed-off-by: guojialiang <[email protected]> * add comment Signed-off-by: guojialiang <[email protected]> * add comment Signed-off-by: guojialiang <[email protected]> --------- Signed-off-by: guojialiang <[email protected]> Signed-off-by: Harsh Kothari <[email protected]>
…rWarmer (opensearch-project#17881) * support merged segment warmer Signed-off-by: guojialiang <[email protected]> * Introduce MergedSegmentWarmerFactory for Local/Remote merged segment pre copy Signed-off-by: guojialiang <[email protected]> * add FeatureFlags.MERGED_SEGMENT_WARMER_EXPERIMENTAL_FLAG Signed-off-by: guojialiang <[email protected]> * add an entry in CHANGELOG Signed-off-by: guojialiang <[email protected]> * add validation Signed-off-by: guojialiang <[email protected]> * modify changelog entry Signed-off-by: guojialiang <[email protected]> * add comment Signed-off-by: guojialiang <[email protected]> * add comment Signed-off-by: guojialiang <[email protected]> --------- Signed-off-by: guojialiang <[email protected]> Signed-off-by: Harsh Kothari <[email protected]>
* | ||
* @opensearch.internal | ||
*/ | ||
public class LocalMergedSegmentWarmer implements IndexWriter.IndexReaderWarmer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would have preferred this in a follow-up PR :)
*/ | ||
|
||
/* | ||
* Licensed to Elasticsearch under one or more contributor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very late to this pr, but we need to remove the es part of these licenses.
Description
In order to reduce the visibility delay of replicas in Segment Replication, we propose the idea of merged segment pre-copy in 17528. After discussion, we believe that it can produce good results in both local segment replication and remote store situations.
This PR is the beginning of supporting merged segments pre-copy. We introduce
MergedSegmentWarmerFactory
for Local/Remote merged segment pre copy.Related Issues
Resolves #[17528]
Experiment
Test1: Replica Lag Time
We enable pre-copy in local on-disk segment replication to test the optimization effect.
We used two 16C64GB nodes,
nyc_taxis
workload,2
primary shard and1
replica shard for testing. The test command is as follows:We use command
_cat/segment_replication
to collect indicatorcurrent_lag
every5
seconds and compare the replica lag time when closing and opening pre-copy feature.Result1: segment replication without pre-copy
Result2: segment replication with pre-copy
The X-axis represents logical time, collected every
5
seconds. The Y-axis represents the indicatorcurrent_lag
collected through command_cat/segment_replication
. We can see that the maximum lag time has been reduced from66s
to2.9s
, with more than20
times optimization.Test2: CPU util and P99 Read Latency
At the same time, we conducted a write-search test and focused on CPU util and P99 Read Latency. We can see that there is no significant change in query latency and CPU util.
Result1: segment replication without pre-copy
Result2: segment replication with pre-copy
The analysis of the reasons is as follows. CPU-intensive tasks are performed before pre-copy. There will be no significant CPU overhead due to the introduction of pre-copy. At the same time, the time cost of merging in phase
SegmentMerger#merge
should be much greater than the time spent in phaseIndexReaderWarmer#warm
. The additional synchronization time introduced by pre-copy should not account for a high proportion in the entire merge process.Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.