Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix panic when using cassandra in multiple periodic configs or as a store for both index and delete requests #2774

Merged

Conversation

sandeepsukhani
Copy link
Contributor

@sandeepsukhani sandeepsukhani commented Jun 24, 2020

What this PR does:
We create Cassandra clients for object store, index store and table manager. We sometimes need multiple instances of each of them at different places based on how Cortex is configured. While creating a single instance for each type of usage is not a problem, creating multiple of same types of client instances causes the code to panic due to duplicate metric registration. See #2772. The issue can be reproduced easily by using Cassandra as index store with multiple periodic schemas.

This code adds a new label called purpose which gets sets based on the purpose of creating the client.
When Cassandra is used in multiple periodic config, purpose is set to start date of that periodic config.

Note: Using the same date for multiple periodic configs with Cassandra can still cause the code to panic since it would try to register the metric with the same label values. Considering the chances of someone having a need for such config is negligible we should be good here.

Which issue(s) this PR fixes:
Fixes #2772

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@sandeepsukhani sandeepsukhani force-pushed the cassandra-clients-singleton branch from 7093efa to 9d6a2d4 Compare June 24, 2020 11:05
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jun 24, 2020
@sandeepsukhani sandeepsukhani force-pushed the cassandra-clients-singleton branch from 9d6a2d4 to 0f25693 Compare June 24, 2020 12:05
@tomwilkie tomwilkie self-requested a review June 24, 2020 12:30
@tomwilkie
Copy link
Contributor

Globals are bad (especially like this - I can forgive them for metrics).

Can we have the name of the schema (perhaps based on its start and end date) passed to the client so it can name it metrics appropriately and prevent duplicates? I should be enough to add this as an extra label in the registererer.

@tomwilkie
Copy link
Contributor

Also, lets add a test that reproduces this issue first (ie starts Cortex with two Cassandra schemas)

@sandeepsukhani sandeepsukhani force-pushed the cassandra-clients-singleton branch from 0f25693 to 6614f0e Compare June 24, 2020 13:15
@pull-request-size pull-request-size bot added size/M and removed size/L labels Jun 24, 2020

var (
sharedSessions map[string]*sharedSession
sessionsMtx sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are still globals. If we give easy session its own name, we won't get registrations error, and we won't need globals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I saw your review after pushing the code. I thought we didn't want to create multiple sessions per schema config where Cassandra is use so I went this way which I accept is ugly.
I didn't get a chance to work on this yet. Will fix this.

Copy link
Contributor

@tomwilkie tomwilkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can solve this problem without globals.

@sandeepsukhani
Copy link
Contributor Author

@jtlisi Thanks! Will check it out.

@sandeepsukhani sandeepsukhani force-pushed the cassandra-clients-singleton branch from 6614f0e to 925d0ec Compare June 28, 2020 12:17
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jun 28, 2020
@sandeepsukhani sandeepsukhani changed the title make all clients for cassandra singleton fix panic when using cassandra in multiple periodic configs or as a store for both index and delete requests Jun 28, 2020
@sandeepsukhani
Copy link
Contributor Author

Globals are bad (especially like this - I can forgive them for metrics).

Can we have the name of the schema (perhaps based on its start and end date) passed to the client so it can name it metrics appropriately and prevent duplicates? I should be enough to add this as an extra label in the registererer.

@tomwilkie I have just added the start date of the schema to the label to keep things simple.
Also please note that when using the same date for multiple periodic configs with Cassandra can still cause the code to panic since it would try to register the metric with the same label values. Considering the chances of someone having a need for such config is negligible we should be good here. We can always add an end date as well to the label if someone faces an issue.
Please let me know if you think we should have it now.

@@ -89,7 +89,7 @@ func (cfg *Config) Validate() error {
return nil
}

func (cfg *Config) session(name string) (*gocql.Session, error) {
func (cfg *Config) session(name, purpose string, registerer prometheus.Registerer) (*gocql.Session, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not add purpose here, but rather pass in prometheus.DefaultRegisterer(prometheus.Labels{"schema_start": ...}) to NewStorageClient

@@ -180,7 +180,7 @@ func NewStore(cfg Config, storeCfg chunk.StoreConfig, schemaCfg chunk.SchemaConf
}

// NewIndexClient makes a new index client of the desired type.
func NewIndexClient(name string, cfg Config, schemaCfg chunk.SchemaConfig) (chunk.IndexClient, error) {
func NewIndexClient(name string, cfg Config, schemaCfg chunk.SchemaConfig, purpose string, registerer prometheus.Registerer) (chunk.IndexClient, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not pass a string here but the DayTime. And lets not pass the registerer here, but wrap a new registerer next to the instantiation of the Cassandra client. Otherwise I feel it sets expectation that the registerer will be used for all client.s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is NewIndexClient is also used to create IndexClient for storing delete requests here, which is not tied to start date.
I added the purpose label to track the sessions based on the purpose of usage which could be for storing series index(using Daytime for this since there are multiple schemas) or for indexing delete requests or any other use case that we may have in future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case leave the registerer plumbed through but don't plumb reason though - just wrap the registerer before each instantiation of NewIndexClient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create the registerer with the purpose or something label for all the clients if you think that is more appropriate and clearer. I think there is no harm in having an extra-label.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind having the extra label (its needed) but there is no point in plumbing it through like that. We should add it close to where it semantically makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomwilkie I have done these changes and also made changes to use the same registerer for metrics exported by dynamodb. Thanks for the review!

Copy link
Contributor

@tomwilkie tomwilkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better, thank you Sandeep. A few nits - lets not call the label purpose, its pretty meaningless. Call it "schema_start_date" or something.

@sandeepsukhani sandeepsukhani force-pushed the cassandra-clients-singleton branch 2 times, most recently from dc69eb9 to 51ac653 Compare July 1, 2020 08:02
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sandeepsukhani ! I appreciate the refactoring you did. I left few comments, minor things, but I would be glad if you could take a look. Thanks!

Also, is there any existing unit test we can enrich passing the registerer and asserting on the collected metrics?

@sandeepsukhani
Copy link
Contributor Author

Thanks @sandeepsukhani ! I appreciate the refactoring you did. I left few comments, minor things, but I would be glad if you could take a look. Thanks!

Thanks for the detailed feedback @pracucci!

Also, is there any existing unit test we can enrich passing the registerer and asserting on the collected metrics?

Unfortunately, I don't think we have any such test. None of the index clients was accepting a registerer before.
Also, different index clients have different metrics(if any) so we would only have tests specific to the index clients.

@sandeepsukhani sandeepsukhani force-pushed the cassandra-clients-singleton branch from 3609819 to 7c0f7a2 Compare July 8, 2020 07:48
@sandeepsukhani sandeepsukhani force-pushed the cassandra-clients-singleton branch from 7c0f7a2 to 8a3e8ac Compare July 8, 2020 07:50
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sandeepsukhani for addressing my feedback, LGTM (modulo a couple of final nits)!

@pracucci pracucci requested a review from tomwilkie July 8, 2020 15:58
Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

However, I'm a bit concerned if NewTableClient or NewIndexClient is called without a wrapped registerer. Could that lead to a registration error since the labels for a metric vary? We recently had this problem in the KV client #2807. If that is the case it might make sense to wrap the registerer within the functions themselves and add a name as a function parameter.

@pracucci pracucci requested review from tomwilkie and removed request for tomwilkie July 9, 2020 15:30
@pracucci pracucci dismissed tomwilkie’s stale review July 23, 2020 16:40

Sandeep has changed the PR after your initial review and I think the current state is good to go

@pracucci
Copy link
Contributor

@sandeepsukhani Sorry for the veeeeeery long wait. I just learned how to dismiss a review and unblock the merge. Thanks for your work here and... 🚀 let's go!

@pracucci pracucci merged commit a20a386 into cortexproject:master Jul 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panics when using Purger with Cassandra
4 participants