Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issue where we are not returning all the available ingesters in the ring #3977

Merged
merged 3 commits into from
Mar 23, 2021

Conversation

roystchiang
Copy link
Contributor

@roystchiang roystchiang commented Mar 19, 2021

What this PR does:

Fix an issue where extended write does not return all the available instances.

If we checked ingester-1 that's in az-A, and it's not active. We'll add az-A to distinct zones in the subsequent loops, we'll never check ingesters in az-A again.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@alvinlin123
Copy link
Contributor

alvinlin123 commented Mar 19, 2021

With this change, we should see less CPU usage during deployment too, as you don't have to loop as many times when 1 ingester went unhealthy when distributor.extended-writte = true

@roystchiang roystchiang force-pushed the fix-ring-issue branch 2 times, most recently from 70294d9 to 843d4f4 Compare March 19, 2021 04:11
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this issue! It's definitely an issue but I'm unsure about the fix. Please take a look at my comment.

pkg/ring/ring.go Outdated
// this instance.
shouldExtendReplicaSet := op.ShouldExtendReplicaSetOnState(instance.State)
if shouldExtendReplicaSet {
n++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should be extended here. Reason is that at this point you still don't know if the instance belongs to a zone which has already been included and so the following check on distinctZones will discard it.

Keeping the logic in master, I a fix could be just moving distinctZones = append(distinctZones, info.Zone) to an else branch of the check on ShouldExtendReplicaSetOnState():

		// Check whether the replica set should be extended given we're including
		// this instance.
		if op.ShouldExtendReplicaSetOnState(instance.State) {
			n++
		} else if r.cfg.ZoneAwarenessEnabled && info.Zone != "" {
			// TODO add a comment here to explain the rationale
			distinctZones = append(distinctZones, info.Zone)
		}

Comment on lines 140 to 145
"instance-1": {Addr: "127.0.0.1", Zone: "zone-a", Tokens: GenerateTokens(128, nil), State: ACTIVE},
"instance-2": {Addr: "127.0.0.2", Zone: "zone-a", Tokens: GenerateTokens(128, nil), State: ACTIVE},
"instance-3": {Addr: "127.0.0.3", Zone: "zone-b", Tokens: GenerateTokens(128, nil), State: ACTIVE},
"instance-4": {Addr: "127.0.0.4", Zone: "zone-b", Tokens: GenerateTokens(128, nil), State: ACTIVE},
"instance-5": {Addr: "127.0.0.5", Zone: "zone-c", Tokens: GenerateTokens(128, nil), State: LEAVING},
"instance-6": {Addr: "127.0.0.6", Zone: "zone-c", Tokens: GenerateTokens(128, nil), State: ACTIVE},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you're generating Tokens() but then they're overwritten below in the for loop. You can remove them from here.

Comment on lines 186 to 187
assert.Equal(t, len(set.Instances), 3)
assert.Equal(t, len(distinctZones), 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Ordering of arguments is (t, expected, actual), so they should be flipped. However, to check length you can use assert.Len().

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@pracucci
Copy link
Contributor

@roystchiang Please make sure to sign your commits (see here) otherwise we can't merge.

Co-authored-by: Marco Pracucci <[email protected]>
Signed-off-by: Roy Chiang <[email protected]>
@roystchiang
Copy link
Contributor Author

@roystchiang Please make sure to sign your commits (see here) otherwise we can't merge.

oops. Sorry. I will sign.

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

@pracucci pracucci merged commit d45e65a into cortexproject:master Mar 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants