Skip to content

[YUNIKORN-656] Add LDAP resolver for group resolution #1021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mitdesai
Copy link
Contributor

@mitdesai mitdesai commented May 21, 2025

What is this PR for?

This PR adds LDAP resolver functionality for user group resolution in YuniKorn. It enables the scheduler to resolve user groups using LDAP, which is essential for enterprise environments where user and group information is stored in LDAP directories.

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-656

How should this be tested?

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@mitdesai mitdesai force-pushed the YUNIKORN-656 branch 2 times, most recently from 68948ab to 9e4dd49 Compare May 23, 2025 21:07
Copy link

codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 94.05520% with 28 lines in your changes missing coverage. Please review.

Project coverage is 83.10%. Comparing base (bf04bd4) to head (9bc614e).

Files with missing lines Patch % Lines
pkg/common/security/usergroup_ldap_resolver.go 89.13% 21 Missing and 4 partials ⚠️
pkg/common/security/ldap_validator.go 98.68% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1021      +/-   ##
==========================================
+ Coverage   82.69%   83.10%   +0.41%     
==========================================
  Files          98      100       +2     
  Lines       15682    16149     +467     
==========================================
+ Hits        12968    13421     +453     
- Misses       2439     2448       +9     
- Partials      275      280       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@manirajv06 manirajv06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

zap.Error(err))
continue
}
secretValue := strings.TrimSpace(string(secretValueBytes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to accept these values as is? Do we need to validate against defined template/format?

interval: cleanerInterval * time.Second,
lookup: LdapLookupUser,
lookupGroupID: LdapLookupGroupID,
groupIds: LDAPLookupGroupIds,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to initialize stop channel as well? We might to add test related based on cache clean up, interval set up etc for this new resolver

var unknownResolver = configs.UserGroupResolver{
Type: "unknown",
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see earlier comment. We need to run these tests even for this new resolver.

func LDAPLookupGroupIds(osUser *user.User) ([]string, error) {
sr, err := LDAPConn_Bind(osUser.Username)
if err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug log would help here.

l, err := ldap.DialURL(ldapaddr,
ldap.DialWithTLSConfig(&tls.Config{InsecureSkipVerify: ldapConf.Insecure})) // #nosec G402
if err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug log would help here.


err = l.Bind(ldapConf.BindUser, ldapConf.BindPassword)
if err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug log would help here.

}

func GetUserGroupCacheLdap() *UserGroupCache {
readSecrets()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we go forward if there are no secrets? Could happen because of LDAP secret mount setup issues. Since we use default values, warning log would help here with values as well because some values might be available or some not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mitdesai see question from Mani. I'd like to know the same thing.

Copy link
Contributor

@craigcondit craigcondit Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regards to configuration and secrets management, I'm pretty strongly against having everything in the secret file. At most username and password should be there; everything else ought to be handled via the normal configuration framework the rest of YuniKorn uses. Otherwise the LDAP resolver becomes this weird "add-on" that doesn't integrate well with the rest of YuniKorn. This would also solve the question Mani is asking -- if the feature is disabled, we skip loading the secrets entirely. If enabled, then we load secrets. If not present or empty we assume anonymous access (yes, people still do this). I think we should also use the K8s secret API directly rather than file access as we do elsewhere in the code. The secrets can then be watched via an Informer so that credentials can be hot reloaded (as should the rest of the config like we do elsewhere in YuniKorn.

Copy link
Contributor Author

@mitdesai mitdesai Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted! Let me check how we are handling the secrets elsewhere. Rest of the discussion about the secrets I had with @pbacsko is in a separate thread few comments below.

Failing on startup if secrets cannot be loaded is now part of the PR.

Copy link
Contributor

@pbacsko pbacsko Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the informer + secrets API. Not sure why I haven't thought of it, but that seems to be the optimal solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@craigcondit @pbacsko : to read the secrets using the informers + secrets API, core will get dependent on k8s libraries. I believe, this is something we are trying to avoid. Is that correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration is read in the shim and passed into the core. This will be no different. Updates will be needed to both.

var groups []string
for _, entry := range sr.Entries {
a := entry.GetAttributeValues("memberOf")
println(a)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug statement, not required.

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a second round later on, now I'm adding comments based on my first impression.

@mitdesai
Copy link
Contributor Author

mitdesai commented Jun 5, 2025

Thanks for the review @pbacsko and @manirajv06. Valid concerns. I will update the PR today

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second round - gave some more comments.


// read secrets from the secrets directory
// returns true if at least one secret was loaded and the configuration is valid, false otherwise
var readSecrets = func() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like in case of LdapAccess, I propose an interface for this, eg. SecretReader with two impementations: SecretReaderImpl and SecretReaderMock.

return secretCount > 0 && isValid && len(missingFields) == 0
}

func GetUserGroupCacheLdap() *UserGroupCache {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should take two arguments: one for LdapAccess and one for SecretReader. That way, we eliminate global state which can affect unit tests. See below for details.


if !secretsLoaded {
// Log a FATAL level message - this is very prominent and will typically cause the application to exit
log.Log(log.Security).Fatal("LDAP configuration not found or invalid. No secrets were loaded from the secrets directory.",
Copy link
Contributor

@pbacsko pbacsko Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we truly need this to be fatal? I'm wondering if we have to be that strict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial implementation did not have it. But while addressing the review comments I received from you and Mani, I through that if LDAP is actually a requirement, and there is something wrong with the secrets, logging a warning and moving with defaults could result in situations which could be undesirable especially in prod like environments.
The resolver initialization happens in the very beginning of scheduler startup and we would never come back to read the secrets. So it is very easy to have that one warning message to get lost among all the other logs.

Copy link
Contributor

@pbacsko pbacsko Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking about the same thing, ie. undesired side-effects. I'm OK with leaving it in the code, but we should definitely document it.

Two questions:

  1. Is the Ldap mount always expected to exist? Are we always supposed to have this directory with files inside?
  2. By taking a deeper look at readSecrets(), it looks like we're reading configuration entries, not really secrets. Is this proper naming in the context of LDAP?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW sorry for the slightly fragmented feedback, it takes some time to understand this PR, it's a slightly bigger one +3500 LOC :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. I actually appreciate you taking interest in reviewing such PRs :-)

For your questions:

  1. Yes, because the the secret that I am expecting will be created like this:
kubectl -n yunikorn create secret generic ldap-secret \
    --from-literal=Host=$host \
    --from-literal=Port=$port \
    --from-literal=BaseDN=$baseDN \
    --from-literal=Filter=$filter \
    --from-literal=GroupAttr=$groupAttr \
    --from-literal=ReturnAttr=$returnAttr \
    --from-literal=BindUser=$bindUser \
    --from-literal=BindPassword=$bindPassword \
    --from-literal=Insecure=$insecure \
    --from-literal=SSL=$ssl \
    --dry-run=client -o yaml | kubectl apply -f -

When ldap-secret is mounted, I get as many files as there are keys in the secret.

  1. It is called readSecrets() because we actually attempt to load and read the kubernetes secret. You are right that most of these are configurations. except the BindPassword. In order to keep all the LDAP related settings together, I put them inside the ldap-secret.

Part of the reason is also that I had moved on from reading the ldap details from configmap to reading it from secret. At that time, it was a lift and shift effort in order to minimize code changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, got it. We definitely need to extend the documentation with this.

type ldapAccessFactory func(config *LdapResolverConfig) LdapAccess

// defaultLdapAccessFactory is the default factory function that creates real LdapAccessImpl instances
var defaultLdapAccessFactory ldapAccessFactory = func(config *LdapResolverConfig) LdapAccess {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below, store the LdapAccess as a variable in the struct, so such global variables are not needed.

Comment on lines +82 to +90
// newLdapAccessImpl creates a new LdapAccess instance using the current factory
// This can be replaced in tests to return mock implementations
var newLdapAccessImpl = defaultLdapAccessFactory

// resetLdapAccessFactory resets the factory to the default implementation
// This is used in tests to ensure the global state is restored
func resetLdapAccessFactory() {
newLdapAccessImpl = defaultLdapAccessFactory
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed if we store LdapAccess as struct member.

Comment on lines +261 to +263
lookup: LdapLookupUser,
lookupGroupID: LdapLookupGroupID,
groupIds: LDAPLookupGroupIds,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions should be part of a type called LdapLookup. So this

return &UserGroupCache{
		ugs:           map[string]*UserGroup{},
		interval:      cleanerInterval * time.Second,
		lookup:        LdapLookupUser,
		lookupGroupID: LdapLookupGroupID,
		groupIds:      LDAPLookupGroupIds,
                stop:          make(chan struct{}),
	}

becomes

ldapLookup: &LdapLookup{secretReader: secretReader, ldapAccess: ldapAccess}

return &UserGroupCache{
		ugs:           map[string]*UserGroup{},
		interval:      cleanerInterval * time.Second,
		lookup:        ldapLookup.LdapLookupUser,
		lookupGroupID: ldapLookup.LdapLookupGroupID,
		groupIds:      ldapLookup.LDAPLookupGroupIds,
                stop:          make(chan struct{}),
	}

Which is better because we can store both secretReader and ldapAccess inside ldapLookup. That way, there's no need to always create an LdapAccessImpl every time we call LDAPLookupGroupIds(). Also makes unit testing completely stateless - there will be no need to setup/teardown stuff, defer calls, etc.

return &group, nil
}

func LDAPLookupGroupIds(osUser *user.User) ([]string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: LDAP vs Ldap, use consistent capitalization ("Ldap" is preferred)

BindUser string
BindPassword string
Insecure bool
SSL bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "UseSSL" instead of "SSL" to make it less ambiguous

Comment on lines +37 to +86
// Helper function to set up the mock LDAP implementation for testing
func setupMockLdap() {
// Save the original newLdapAccessImpl function
originalLdapAccessImpl := newLdapAccessImpl

// Replace with mock implementation
newLdapAccessImpl = func(config *LdapResolverConfig) LdapAccess {
// Use the mockLdapSearchResult function from usergroup_ldap_resolver_mock.go
return &LdapAccessMock{
SearchFunc: func(conn *ldap.Conn, searchRequest *ldap.SearchRequest) (*ldap.SearchResult, error) {
// Extract username from the search filter
username := ""
if searchRequest != nil && searchRequest.Filter != "" {
// Simple extraction - this assumes the filter format is consistent
parts := strings.Split(searchRequest.Filter, "=")
if len(parts) > 1 {
username = strings.TrimRight(parts[len(parts)-1], ")")
}
}
return mockLdapSearchResult(username)
},
}
}

// Mock readSecrets to return true (successful configuration)
originalReadSecrets := readSecrets
readSecrets = func() bool {
return true
}

// Store the original functions to be restored in teardown
originalFunctions["newLdapAccessImpl"] = originalLdapAccessImpl
originalFunctions["readSecrets"] = originalReadSecrets
}

// Helper function to tear down the mock LDAP implementation after testing
func teardownMockLdap() {
// Restore the original functions
if originalImpl, ok := originalFunctions["newLdapAccessImpl"]; ok {
if factory, ok := originalImpl.(ldapAccessFactory); ok {
newLdapAccessImpl = factory
}
}

if originalRead, ok := originalFunctions["readSecrets"]; ok {
if readFunc, ok := originalRead.(func() bool); ok {
readSecrets = readFunc
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the proposed changes I outlined in usergroup_ldap_resolver.go, this can be all eliminated.

Comment on lines +107 to +118
var ldapConf = LdapResolverConfig{
Host: common.DefaultLdapHost,
Port: common.DefaultLdapPort,
BaseDN: common.DefaultLdapBaseDN,
Filter: common.DefaultLdapFilter,
GroupAttr: common.DefaultLdapGroupAttr,
ReturnAttr: common.DefaultLdapReturnAttr,
BindUser: common.DefaultLdapBindUser,
BindPassword: common.DefaultLdapBindPassword,
Insecure: common.DefaultLdapInsecure,
SSL: common.DefaultLdapSSL,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider moving this into the suggested new type LdapLookup. This is yet another global variable we should try to get rid of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants