-
Notifications
You must be signed in to change notification settings - Fork 814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix panic when using cassandra in multiple periodic configs or as a store for both index and delete requests #2774
fix panic when using cassandra in multiple periodic configs or as a store for both index and delete requests #2774
Conversation
7093efa
to
9d6a2d4
Compare
9d6a2d4
to
0f25693
Compare
Globals are bad (especially like this - I can forgive them for metrics). Can we have the name of the schema (perhaps based on its start and end date) passed to the client so it can name it metrics appropriately and prevent duplicates? I should be enough to add this as an extra label in the registererer. |
Also, lets add a test that reproduces this issue first (ie starts Cortex with two Cassandra schemas) |
0f25693
to
6614f0e
Compare
pkg/chunk/cassandra/sessions.go
Outdated
|
||
var ( | ||
sharedSessions map[string]*sharedSession | ||
sessionsMtx sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are still globals. If we give easy session its own name, we won't get registrations error, and we won't need globals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I saw your review after pushing the code. I thought we didn't want to create multiple sessions per schema config where Cassandra is use so I went this way which I accept is ugly.
I didn't get a chance to work on this yet. Will fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can solve this problem without globals.
@jtlisi Thanks! Will check it out. |
6614f0e
to
925d0ec
Compare
@tomwilkie I have just added the start date of the schema to the label to keep things simple. |
@@ -89,7 +89,7 @@ func (cfg *Config) Validate() error { | |||
return nil | |||
} | |||
|
|||
func (cfg *Config) session(name string) (*gocql.Session, error) { | |||
func (cfg *Config) session(name, purpose string, registerer prometheus.Registerer) (*gocql.Session, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not add purpose here, but rather pass in prometheus.DefaultRegisterer(prometheus.Labels{"schema_start": ...})
to NewStorageClient
pkg/chunk/storage/factory.go
Outdated
@@ -180,7 +180,7 @@ func NewStore(cfg Config, storeCfg chunk.StoreConfig, schemaCfg chunk.SchemaConf | |||
} | |||
|
|||
// NewIndexClient makes a new index client of the desired type. | |||
func NewIndexClient(name string, cfg Config, schemaCfg chunk.SchemaConfig) (chunk.IndexClient, error) { | |||
func NewIndexClient(name string, cfg Config, schemaCfg chunk.SchemaConfig, purpose string, registerer prometheus.Registerer) (chunk.IndexClient, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not pass a string here but the DayTime. And lets not pass the registerer here, but wrap a new registerer next to the instantiation of the Cassandra client. Otherwise I feel it sets expectation that the registerer will be used for all client.s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is NewIndexClient
is also used to create IndexClient
for storing delete requests here, which is not tied to start date.
I added the purpose
label to track the sessions based on the purpose of usage which could be for storing series index(using Daytime for this since there are multiple schemas) or for indexing delete requests or any other use case that we may have in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case leave the registerer plumbed through but don't plumb reason though - just wrap the registerer before each instantiation of NewIndexClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can create the registerer with the purpose
or something label for all the clients if you think that is more appropriate and clearer. I think there is no harm in having an extra-label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind having the extra label (its needed) but there is no point in plumbing it through like that. We should add it close to where it semantically makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomwilkie I have done these changes and also made changes to use the same registerer for metrics exported by dynamodb. Thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better, thank you Sandeep. A few nits - lets not call the label purpose, its pretty meaningless. Call it "schema_start_date" or something.
dc69eb9
to
51ac653
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sandeepsukhani ! I appreciate the refactoring you did. I left few comments, minor things, but I would be glad if you could take a look. Thanks!
Also, is there any existing unit test we can enrich passing the registerer and asserting on the collected metrics?
Thanks for the detailed feedback @pracucci!
Unfortunately, I don't think we have any such test. None of the index clients was accepting a registerer before. |
…and relevant tests Signed-off-by: Sandeep Sukhani <[email protected]>
3609819
to
7c0f7a2
Compare
Signed-off-by: Sandeep Sukhani <[email protected]>
…ions Signed-off-by: Sandeep Sukhani <[email protected]>
Signed-off-by: Sandeep Sukhani <[email protected]>
Signed-off-by: Sandeep Sukhani <[email protected]>
7c0f7a2
to
8a3e8ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sandeepsukhani for addressing my feedback, LGTM (modulo a couple of final nits)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
However, I'm a bit concerned if NewTableClient
or NewIndexClient
is called without a wrapped registerer. Could that lead to a registration error since the labels for a metric vary? We recently had this problem in the KV client #2807. If that is the case it might make sense to wrap the registerer within the functions themselves and add a name as a function parameter.
Signed-off-by: Sandeep Sukhani <[email protected]>
Signed-off-by: Sandeep Sukhani <[email protected]>
Sandeep has changed the PR after your initial review and I think the current state is good to go
@sandeepsukhani Sorry for the veeeeeery long wait. I just learned how to dismiss a review and unblock the merge. Thanks for your work here and... 🚀 let's go! |
What this PR does:
We create Cassandra clients for object store, index store and table manager. We sometimes need multiple instances of each of them at different places based on how Cortex is configured. While creating a single instance for each type of usage is not a problem, creating multiple of same types of client instances causes the code to panic due to duplicate metric registration. See #2772. The issue can be reproduced easily by using Cassandra as index store with multiple periodic schemas.
This code adds a new label called
purpose
which gets sets based on the purpose of creating the client.When Cassandra is used in multiple periodic config,
purpose
is set to start date of that periodic config.Note: Using the same date for multiple periodic configs with Cassandra can still cause the code to panic since it would try to register the metric with the same label values. Considering the chances of someone having a need for such config is negligible we should be good here.
Which issue(s) this PR fixes:
Fixes #2772
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]