-
Notifications
You must be signed in to change notification settings - Fork 251
[YUNIKORN-656] Add LDAP resolver for group resolution #1021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
68948ab
to
9e4dd49
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1021 +/- ##
==========================================
+ Coverage 82.69% 83.10% +0.41%
==========================================
Files 98 100 +2
Lines 15682 16149 +467
==========================================
+ Hits 12968 13421 +453
- Misses 2439 2448 +9
- Partials 275 280 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
zap.Error(err)) | ||
continue | ||
} | ||
secretValue := strings.TrimSpace(string(secretValueBytes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to accept these values as is? Do we need to validate against defined template/format?
interval: cleanerInterval * time.Second, | ||
lookup: LdapLookupUser, | ||
lookupGroupID: LdapLookupGroupID, | ||
groupIds: LDAPLookupGroupIds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to initialize stop
channel as well? We might to add test related based on cache clean up, interval set up etc for this new resolver
var unknownResolver = configs.UserGroupResolver{ | ||
Type: "unknown", | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see earlier comment. We need to run these tests even for this new resolver.
func LDAPLookupGroupIds(osUser *user.User) ([]string, error) { | ||
sr, err := LDAPConn_Bind(osUser.Username) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug log would help here.
l, err := ldap.DialURL(ldapaddr, | ||
ldap.DialWithTLSConfig(&tls.Config{InsecureSkipVerify: ldapConf.Insecure})) // #nosec G402 | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug log would help here.
|
||
err = l.Bind(ldapConf.BindUser, ldapConf.BindPassword) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug log would help here.
} | ||
|
||
func GetUserGroupCacheLdap() *UserGroupCache { | ||
readSecrets() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we go forward if there are no secrets? Could happen because of LDAP secret mount setup issues. Since we use default values, warning log would help here with values as well because some values might be available or some not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mitdesai see question from Mani. I'd like to know the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With regards to configuration and secrets management, I'm pretty strongly against having everything in the secret file. At most username and password should be there; everything else ought to be handled via the normal configuration framework the rest of YuniKorn uses. Otherwise the LDAP resolver becomes this weird "add-on" that doesn't integrate well with the rest of YuniKorn. This would also solve the question Mani is asking -- if the feature is disabled, we skip loading the secrets entirely. If enabled, then we load secrets. If not present or empty we assume anonymous access (yes, people still do this). I think we should also use the K8s secret API directly rather than file access as we do elsewhere in the code. The secrets can then be watched via an Informer so that credentials can be hot reloaded (as should the rest of the config like we do elsewhere in YuniKorn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted! Let me check how we are handling the secrets elsewhere. Rest of the discussion about the secrets I had with @pbacsko is in a separate thread few comments below.
Failing on startup if secrets cannot be loaded is now part of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the informer + secrets API. Not sure why I haven't thought of it, but that seems to be the optimal solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@craigcondit @pbacsko : to read the secrets using the informers + secrets API, core will get dependent on k8s libraries. I believe, this is something we are trying to avoid. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration is read in the shim and passed into the core. This will be no different. Updates will be needed to both.
var groups []string | ||
for _, entry := range sr.Entries { | ||
a := entry.GetAttributeValues("memberOf") | ||
println(a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug statement, not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a second round later on, now I'm adding comments based on my first impression.
Thanks for the review @pbacsko and @manirajv06. Valid concerns. I will update the PR today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second round - gave some more comments.
|
||
// read secrets from the secrets directory | ||
// returns true if at least one secret was loaded and the configuration is valid, false otherwise | ||
var readSecrets = func() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like in case of LdapAccess
, I propose an interface for this, eg. SecretReader
with two impementations: SecretReaderImpl
and SecretReaderMock
.
return secretCount > 0 && isValid && len(missingFields) == 0 | ||
} | ||
|
||
func GetUserGroupCacheLdap() *UserGroupCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should take two arguments: one for LdapAccess
and one for SecretReader
. That way, we eliminate global state which can affect unit tests. See below for details.
|
||
if !secretsLoaded { | ||
// Log a FATAL level message - this is very prominent and will typically cause the application to exit | ||
log.Log(log.Security).Fatal("LDAP configuration not found or invalid. No secrets were loaded from the secrets directory.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we truly need this to be fatal? I'm wondering if we have to be that strict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial implementation did not have it. But while addressing the review comments I received from you and Mani, I through that if LDAP is actually a requirement, and there is something wrong with the secrets, logging a warning and moving with defaults could result in situations which could be undesirable especially in prod like environments.
The resolver initialization happens in the very beginning of scheduler startup and we would never come back to read the secrets. So it is very easy to have that one warning message to get lost among all the other logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was thinking about the same thing, ie. undesired side-effects. I'm OK with leaving it in the code, but we should definitely document it.
Two questions:
- Is the Ldap mount always expected to exist? Are we always supposed to have this directory with files inside?
- By taking a deeper look at
readSecrets()
, it looks like we're reading configuration entries, not really secrets. Is this proper naming in the context of LDAP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW sorry for the slightly fragmented feedback, it takes some time to understand this PR, it's a slightly bigger one +3500 LOC :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. I actually appreciate you taking interest in reviewing such PRs :-)
For your questions:
- Yes, because the the secret that I am expecting will be created like this:
kubectl -n yunikorn create secret generic ldap-secret \
--from-literal=Host=$host \
--from-literal=Port=$port \
--from-literal=BaseDN=$baseDN \
--from-literal=Filter=$filter \
--from-literal=GroupAttr=$groupAttr \
--from-literal=ReturnAttr=$returnAttr \
--from-literal=BindUser=$bindUser \
--from-literal=BindPassword=$bindPassword \
--from-literal=Insecure=$insecure \
--from-literal=SSL=$ssl \
--dry-run=client -o yaml | kubectl apply -f -
When ldap-secret is mounted, I get as many files as there are keys in the secret.
- It is called
readSecrets()
because we actually attempt to load and read the kubernetes secret. You are right that most of these are configurations. except theBindPassword
. In order to keep all the LDAP related settings together, I put them inside the ldap-secret.
Part of the reason is also that I had moved on from reading the ldap details from configmap to reading it from secret. At that time, it was a lift and shift effort in order to minimize code changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, got it. We definitely need to extend the documentation with this.
type ldapAccessFactory func(config *LdapResolverConfig) LdapAccess | ||
|
||
// defaultLdapAccessFactory is the default factory function that creates real LdapAccessImpl instances | ||
var defaultLdapAccessFactory ldapAccessFactory = func(config *LdapResolverConfig) LdapAccess { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment below, store the LdapAccess
as a variable in the struct, so such global variables are not needed.
// newLdapAccessImpl creates a new LdapAccess instance using the current factory | ||
// This can be replaced in tests to return mock implementations | ||
var newLdapAccessImpl = defaultLdapAccessFactory | ||
|
||
// resetLdapAccessFactory resets the factory to the default implementation | ||
// This is used in tests to ensure the global state is restored | ||
func resetLdapAccessFactory() { | ||
newLdapAccessImpl = defaultLdapAccessFactory | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed if we store LdapAccess
as struct member.
lookup: LdapLookupUser, | ||
lookupGroupID: LdapLookupGroupID, | ||
groupIds: LDAPLookupGroupIds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions should be part of a type called LdapLookup
. So this
return &UserGroupCache{
ugs: map[string]*UserGroup{},
interval: cleanerInterval * time.Second,
lookup: LdapLookupUser,
lookupGroupID: LdapLookupGroupID,
groupIds: LDAPLookupGroupIds,
stop: make(chan struct{}),
}
becomes
ldapLookup: &LdapLookup{secretReader: secretReader, ldapAccess: ldapAccess}
return &UserGroupCache{
ugs: map[string]*UserGroup{},
interval: cleanerInterval * time.Second,
lookup: ldapLookup.LdapLookupUser,
lookupGroupID: ldapLookup.LdapLookupGroupID,
groupIds: ldapLookup.LDAPLookupGroupIds,
stop: make(chan struct{}),
}
Which is better because we can store both secretReader
and ldapAccess
inside ldapLookup
. That way, there's no need to always create an LdapAccessImpl
every time we call LDAPLookupGroupIds()
. Also makes unit testing completely stateless - there will be no need to setup/teardown stuff, defer calls, etc.
return &group, nil | ||
} | ||
|
||
func LDAPLookupGroupIds(osUser *user.User) ([]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: LDAP vs Ldap, use consistent capitalization ("Ldap" is preferred)
BindUser string | ||
BindPassword string | ||
Insecure bool | ||
SSL bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "UseSSL" instead of "SSL" to make it less ambiguous
// Helper function to set up the mock LDAP implementation for testing | ||
func setupMockLdap() { | ||
// Save the original newLdapAccessImpl function | ||
originalLdapAccessImpl := newLdapAccessImpl | ||
|
||
// Replace with mock implementation | ||
newLdapAccessImpl = func(config *LdapResolverConfig) LdapAccess { | ||
// Use the mockLdapSearchResult function from usergroup_ldap_resolver_mock.go | ||
return &LdapAccessMock{ | ||
SearchFunc: func(conn *ldap.Conn, searchRequest *ldap.SearchRequest) (*ldap.SearchResult, error) { | ||
// Extract username from the search filter | ||
username := "" | ||
if searchRequest != nil && searchRequest.Filter != "" { | ||
// Simple extraction - this assumes the filter format is consistent | ||
parts := strings.Split(searchRequest.Filter, "=") | ||
if len(parts) > 1 { | ||
username = strings.TrimRight(parts[len(parts)-1], ")") | ||
} | ||
} | ||
return mockLdapSearchResult(username) | ||
}, | ||
} | ||
} | ||
|
||
// Mock readSecrets to return true (successful configuration) | ||
originalReadSecrets := readSecrets | ||
readSecrets = func() bool { | ||
return true | ||
} | ||
|
||
// Store the original functions to be restored in teardown | ||
originalFunctions["newLdapAccessImpl"] = originalLdapAccessImpl | ||
originalFunctions["readSecrets"] = originalReadSecrets | ||
} | ||
|
||
// Helper function to tear down the mock LDAP implementation after testing | ||
func teardownMockLdap() { | ||
// Restore the original functions | ||
if originalImpl, ok := originalFunctions["newLdapAccessImpl"]; ok { | ||
if factory, ok := originalImpl.(ldapAccessFactory); ok { | ||
newLdapAccessImpl = factory | ||
} | ||
} | ||
|
||
if originalRead, ok := originalFunctions["readSecrets"]; ok { | ||
if readFunc, ok := originalRead.(func() bool); ok { | ||
readSecrets = readFunc | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the proposed changes I outlined in usergroup_ldap_resolver.go
, this can be all eliminated.
var ldapConf = LdapResolverConfig{ | ||
Host: common.DefaultLdapHost, | ||
Port: common.DefaultLdapPort, | ||
BaseDN: common.DefaultLdapBaseDN, | ||
Filter: common.DefaultLdapFilter, | ||
GroupAttr: common.DefaultLdapGroupAttr, | ||
ReturnAttr: common.DefaultLdapReturnAttr, | ||
BindUser: common.DefaultLdapBindUser, | ||
BindPassword: common.DefaultLdapBindPassword, | ||
Insecure: common.DefaultLdapInsecure, | ||
SSL: common.DefaultLdapSSL, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider moving this into the suggested new type LdapLookup
. This is yet another global variable we should try to get rid of.
What is this PR for?
This PR adds LDAP resolver functionality for user group resolution in YuniKorn. It enables the scheduler to resolve user groups using LDAP, which is essential for enterprise environments where user and group information is stored in LDAP directories.
What type of PR is it?
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-656
How should this be tested?
Screenshots (if appropriate)
Questions: