[YUNIKORN-656] Add LDAP resolver for group resolution #1021

mitdesai · 2025-05-21T19:20:41Z

What is this PR for?

This PR adds LDAP resolver functionality for user group resolution in YuniKorn. It enables the scheduler to resolve user groups using LDAP, which is essential for enterprise environments where user and group information is stored in LDAP directories.

What type of PR is it?

Todos

- Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-656

How should this be tested?

Screenshots (if appropriate)

Questions:

- The license files need an update.
- There are breaking changes for older versions.
- It needs documentation.

codecov · 2025-05-28T05:46:57Z

Codecov Report

Attention: Patch coverage is 94.05520% with 28 lines in your changes missing coverage. Please review.

Project coverage is 83.10%. Comparing base (bf04bd4) to head (9bc614e).

Files with missing lines	Patch %	Lines
pkg/common/security/usergroup_ldap_resolver.go	89.13%	21 Missing and 4 partials ⚠️
pkg/common/security/ldap_validator.go	98.68%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1021      +/-   ##
==========================================
+ Coverage   82.69%   83.10%   +0.41%     
==========================================
  Files          98      100       +2     
  Lines       15682    16149     +467     
==========================================
+ Hits        12968    13421     +453     
- Misses       2439     2448       +9     
- Partials      275      280       +5


manirajv06

Overall looks good.

manirajv06 · 2025-05-31T11:05:43Z

pkg/common/security/usergroup_ldap_resolver.go

+				zap.Error(err))
+			continue
+		}
+		secretValue := strings.TrimSpace(string(secretValueBytes))


Is it OK to accept these values as is? Do we need to validate them against a defined template/format?
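One possible answer to this question is a per-key format check before a value is accepted. The following is a minimal sketch, not code from the PR: `validateSecretValue` is a hypothetical helper, and the key names (`Host`, `Port`, `Insecure`, `SSL`) are assumed to match the fields of the resolver configuration discussed in this review.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validateSecretValue is a hypothetical helper (not part of the PR). It checks
// one already-trimmed secret value against the format expected for its key.
// Keys without a specific rule are only required to be non-empty.
func validateSecretValue(key, value string) error {
	if value == "" {
		return fmt.Errorf("%s: value is empty", key)
	}
	switch key {
	case "Port":
		// A port must be an integer in the valid TCP range.
		port, err := strconv.Atoi(value)
		if err != nil || port < 1 || port > 65535 {
			return fmt.Errorf("%s: %q is not a valid TCP port", key, value)
		}
	case "Insecure", "SSL":
		// Boolean flags must parse with strconv.ParseBool.
		if _, err := strconv.ParseBool(value); err != nil {
			return fmt.Errorf("%s: %q is not a boolean", key, value)
		}
	case "Host":
		// A host name must not contain whitespace or a path separator.
		if strings.ContainsAny(value, " \t\r\n/") {
			return fmt.Errorf("%s: %q contains illegal characters", key, value)
		}
	}
	return nil
}

func main() {
	samples := [][2]string{{"Port", "389"}, {"Port", "ldap"}, {"Insecure", "maybe"}}
	for _, kv := range samples {
		fmt.Printf("%s=%q -> %v\n", kv[0], kv[1], validateSecretValue(kv[0], kv[1]))
	}
}
```

Rejected values could then be logged and skipped, in the same way the surrounding loop already skips unreadable files.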

manirajv06 · 2025-05-31T11:07:48Z

pkg/common/security/usergroup_ldap_resolver.go

+		interval:      cleanerInterval * time.Second,
+		lookup:        LdapLookupUser,
+		lookupGroupID: LdapLookupGroupID,
+		groupIds:      LDAPLookupGroupIds,


Do we need to initialize the stop channel as well? We might need to add tests for cache cleanup, interval setup, etc. for this new resolver.

manirajv06 · 2025-05-31T11:18:56Z

pkg/common/security/usergroup_test.go

+var unknownResolver = configs.UserGroupResolver{
+	Type: "unknown",
+}
+


Please see earlier comment. We need to run these tests even for this new resolver.

manirajv06 · 2025-05-31T11:26:53Z

pkg/common/security/usergroup_ldap_resolver.go

+func LDAPLookupGroupIds(osUser *user.User) ([]string, error) {
+	sr, err := LDAPConn_Bind(osUser.Username)
+	if err != nil {
+		return nil, err


Debug log would help here.

manirajv06 · 2025-05-31T11:27:01Z

pkg/common/security/usergroup_ldap_resolver.go

+	l, err := ldap.DialURL(ldapaddr,
+		ldap.DialWithTLSConfig(&tls.Config{InsecureSkipVerify: ldapConf.Insecure})) // #nosec G402
+	if err != nil {
+		return nil, err


Debug log would help here.

manirajv06 · 2025-05-31T11:27:13Z

pkg/common/security/usergroup_ldap_resolver.go

+
+	err = l.Bind(ldapConf.BindUser, ldapConf.BindPassword)
+	if err != nil {
+		return nil, err


Debug log would help here.

manirajv06 · 2025-05-31T11:30:18Z

pkg/common/security/usergroup_ldap_resolver.go

+}
+
+func GetUserGroupCacheLdap() *UserGroupCache {
+	readSecrets()


Should we proceed if there are no secrets? This could happen because of LDAP secret mount setup issues. Since we fall back to default values, a warning log that includes the values would help here, because some values might be available and others not.

@mitdesai see question from Mani. I'd like to know the same thing.

With regard to configuration and secrets management, I'm pretty strongly against having everything in the secret file. At most username and password should be there; everything else ought to be handled via the normal configuration framework the rest of YuniKorn uses. Otherwise the LDAP resolver becomes this weird "add-on" that doesn't integrate well with the rest of YuniKorn. This would also solve the question Mani is asking -- if the feature is disabled, we skip loading the secrets entirely. If enabled, then we load secrets. If they are not present or empty, we assume anonymous access (yes, people still do this). I think we should also use the K8s secret API directly rather than file access, as we do elsewhere in the code. The secrets can then be watched via an Informer so that credentials can be hot reloaded (as should the rest of the config, like we do elsewhere in YuniKorn).

Noted! Let me check how we are handling secrets elsewhere. The rest of the discussion about the secrets that I had with @pbacsko is in a separate thread a few comments below.

Failing on startup if secrets cannot be loaded is now part of the PR.

+1 for the informer + secrets API. Not sure why I haven't thought of it, but that seems to be the optimal solution.

@craigcondit @pbacsko: to read the secrets using the informers + secrets API, the core will become dependent on k8s libraries. I believe this is something we are trying to avoid. Is that correct?

The configuration is read in the shim and passed into the core. This will be no different. Updates will be needed to both.
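The decision flow proposed in this thread (disabled: skip secrets entirely; enabled without credentials: anonymous access; enabled with credentials: authenticated bind) can be sketched as follows. All names here (`bindMode`, `chooseBindMode`, the `loadSecret` callback) are hypothetical, and treating a missing user or a missing password as "anonymous" is an assumption of this sketch rather than something the thread settles.

```go
package main

import "fmt"

// bindMode is a hypothetical type describing how the resolver should bind.
type bindMode int

const (
	bindSkipped       bindMode = iota // resolver disabled: secrets are never loaded
	bindAnonymous                     // resolver enabled, no usable credentials
	bindAuthenticated                 // resolver enabled, user and password present
)

func (m bindMode) String() string {
	switch m {
	case bindSkipped:
		return "skipped"
	case bindAnonymous:
		return "anonymous"
	case bindAuthenticated:
		return "authenticated"
	}
	return "unknown"
}

// chooseBindMode calls loadSecret only when the resolver is enabled.
// loadSecret stands in for whatever mechanism delivers the credentials
// (for example a value passed from the shim to the core).
func chooseBindMode(enabled bool, loadSecret func() map[string]string) bindMode {
	if !enabled {
		return bindSkipped
	}
	secret := loadSecret()
	// Reading from a nil map is safe in Go and yields the empty string.
	if secret["BindUser"] == "" || secret["BindPassword"] == "" {
		return bindAnonymous
	}
	return bindAuthenticated
}

func main() {
	empty := func() map[string]string { return nil }
	full := func() map[string]string {
		return map[string]string{"BindUser": "cn=reader", "BindPassword": "example"}
	}
	fmt.Println(chooseBindMode(false, full))
	fmt.Println(chooseBindMode(true, empty))
	fmt.Println(chooseBindMode(true, full))
}
```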

manirajv06 · 2025-05-31T11:33:59Z

pkg/common/security/usergroup_ldap_resolver.go

+	var groups []string
+	for _, entry := range sr.Entries {
+		a := entry.GetAttributeValues("memberOf")
+		println(a)


debug statement, not required.

pbacsko

I'll have a second round later on, now I'm adding comments based on my first impression.

pkg/common/security/usergroup_ldap_resolver.go

pkg/common/security/usergroup_ldap_resolver_test.go

mitdesai · 2025-06-05T22:51:54Z

Thanks for the review @pbacsko and @manirajv06. Valid concerns. I will update the PR today

pbacsko

Second round - gave some more comments.

pbacsko · 2025-06-11T13:10:35Z

pkg/common/security/usergroup_ldap_resolver.go

+
+// read secrets from the secrets directory
+// returns true if at least one secret was loaded and the configuration is valid, false otherwise
+var readSecrets = func() bool {


Just like in the case of LdapAccess, I propose an interface for this, e.g. SecretReader with two implementations: SecretReaderImpl and SecretReaderMock.

pbacsko · 2025-06-11T13:11:29Z

pkg/common/security/usergroup_ldap_resolver.go

+	return secretCount > 0 && isValid && len(missingFields) == 0
+}
+
+func GetUserGroupCacheLdap() *UserGroupCache {


This function should take two arguments: one for LdapAccess and one for SecretReader. That way, we eliminate global state which can affect unit tests. See below for details.

pbacsko · 2025-06-11T13:13:41Z

pkg/common/security/usergroup_ldap_resolver.go

+
+	if !secretsLoaded {
+		// Log a FATAL level message - this is very prominent and will typically cause the application to exit
+		log.Log(log.Security).Fatal("LDAP configuration not found or invalid. No secrets were loaded from the secrets directory.",


Do we truly need this to be fatal? I'm wondering if we have to be that strict.

My initial implementation did not have it. But while addressing the review comments I received from you and Mani, I thought that if LDAP is actually a requirement and there is something wrong with the secrets, logging a warning and moving on with defaults could lead to undesirable situations, especially in prod-like environments.
The resolver initialization happens at the very beginning of scheduler startup and we never come back to read the secrets. So it is very easy for that one warning message to get lost among all the other logs.

Yes, I was thinking about the same thing, i.e. undesired side effects. I'm OK with leaving it in the code, but we should definitely document it.

Two questions:

Is the Ldap mount always expected to exist? Are we always supposed to have this directory with files inside?

By taking a deeper look at readSecrets(), it looks like we're reading configuration entries, not really secrets. Is this proper naming in the context of LDAP?

BTW sorry for the slightly fragmented feedback, it takes some time to understand this PR, it's a fairly big one (+3500 LOC) :)

No problem. I actually appreciate you taking interest in reviewing such PRs :-)

For your questions:

Yes, because the secret that I am expecting will be created like this:

kubectl -n yunikorn create secret generic ldap-secret \
  --from-literal=Host=$host \
  --from-literal=Port=$port \
  --from-literal=BaseDN=$baseDN \
  --from-literal=Filter=$filter \
  --from-literal=GroupAttr=$groupAttr \
  --from-literal=ReturnAttr=$returnAttr \
  --from-literal=BindUser=$bindUser \
  --from-literal=BindPassword=$bindPassword \
  --from-literal=Insecure=$insecure \
  --from-literal=SSL=$ssl \
  --dry-run=client -o yaml | kubectl apply -f -

When ldap-secret is mounted, I get as many files as there are keys in the secret.

It is called readSecrets() because we actually attempt to load and read the Kubernetes secret. You are right that most of these are configuration values, except for BindPassword. In order to keep all the LDAP-related settings together, I put them inside the ldap-secret.

Part of the reason is also that I had moved from reading the LDAP details from a ConfigMap to reading them from a Secret. At that time, it was a lift-and-shift effort in order to minimize code changes.

Alright, got it. We definitely need to extend the documentation with this.

pbacsko · 2025-06-11T13:14:20Z

pkg/common/security/usergroup_ldap_resolver.go

+type ldapAccessFactory func(config *LdapResolverConfig) LdapAccess
+
+// defaultLdapAccessFactory is the default factory function that creates real LdapAccessImpl instances
+var defaultLdapAccessFactory ldapAccessFactory = func(config *LdapResolverConfig) LdapAccess {


See comment below, store the LdapAccess as a variable in the struct, so such global variables are not needed.

pbacsko · 2025-06-11T13:15:51Z

pkg/common/security/usergroup_ldap_resolver.go

+// newLdapAccessImpl creates a new LdapAccess instance using the current factory
+// This can be replaced in tests to return mock implementations
+var newLdapAccessImpl = defaultLdapAccessFactory
+
+// resetLdapAccessFactory resets the factory to the default implementation
+// This is used in tests to ensure the global state is restored
+func resetLdapAccessFactory() {
+	newLdapAccessImpl = defaultLdapAccessFactory
+}


Can be removed if we store LdapAccess as struct member.

pbacsko · 2025-06-11T13:23:06Z

pkg/common/security/usergroup_ldap_resolver.go

+		lookup:        LdapLookupUser,
+		lookupGroupID: LdapLookupGroupID,
+		groupIds:      LDAPLookupGroupIds,


These functions should be part of a type called LdapLookup. So this

return &UserGroupCache{
	ugs:           map[string]*UserGroup{},
	interval:      cleanerInterval * time.Second,
	lookup:        LdapLookupUser,
	lookupGroupID: LdapLookupGroupID,
	groupIds:      LDAPLookupGroupIds,
	stop:          make(chan struct{}),
}

becomes

ldapLookup := &LdapLookup{secretReader: secretReader, ldapAccess: ldapAccess}
return &UserGroupCache{
	ugs:           map[string]*UserGroup{},
	interval:      cleanerInterval * time.Second,
	lookup:        ldapLookup.LdapLookupUser,
	lookupGroupID: ldapLookup.LdapLookupGroupID,
	groupIds:      ldapLookup.LDAPLookupGroupIds,
	stop:          make(chan struct{}),
}

This is better because we can store both secretReader and ldapAccess inside ldapLookup. That way, there's no need to create an LdapAccessImpl every time we call LDAPLookupGroupIds(). It also makes unit testing completely stateless: there will be no need for setup/teardown code, defer calls, etc.

pbacsko · 2025-06-11T13:23:54Z

pkg/common/security/usergroup_ldap_resolver.go

+	return &group, nil
+}
+
+func LDAPLookupGroupIds(osUser *user.User) ([]string, error) {


nit: LDAP vs Ldap, use consistent capitalization ("Ldap" is preferred)

pbacsko · 2025-06-11T13:25:04Z

pkg/common/security/usergroup_ldap_resolver.go

+	BindUser     string
+	BindPassword string
+	Insecure     bool
+	SSL          bool


nit: "UseSSL" instead of "SSL" to make it less ambiguous

pbacsko · 2025-06-11T13:27:47Z

pkg/common/security/usergroup_test.go

+// Helper function to set up the mock LDAP implementation for testing
+func setupMockLdap() {
+	// Save the original newLdapAccessImpl function
+	originalLdapAccessImpl := newLdapAccessImpl
+
+	// Replace with mock implementation
+	newLdapAccessImpl = func(config *LdapResolverConfig) LdapAccess {
+		// Use the mockLdapSearchResult function from usergroup_ldap_resolver_mock.go
+		return &LdapAccessMock{
+			SearchFunc: func(conn *ldap.Conn, searchRequest *ldap.SearchRequest) (*ldap.SearchResult, error) {
+				// Extract username from the search filter
+				username := ""
+				if searchRequest != nil && searchRequest.Filter != "" {
+					// Simple extraction - this assumes the filter format is consistent
+					parts := strings.Split(searchRequest.Filter, "=")
+					if len(parts) > 1 {
+						username = strings.TrimRight(parts[len(parts)-1], ")")
+					}
+				}
+				return mockLdapSearchResult(username)
+			},
+		}
+	}
+
+	// Mock readSecrets to return true (successful configuration)
+	originalReadSecrets := readSecrets
+	readSecrets = func() bool {
+		return true
+	}
+
+	// Store the original functions to be restored in teardown
+	originalFunctions["newLdapAccessImpl"] = originalLdapAccessImpl
+	originalFunctions["readSecrets"] = originalReadSecrets
+}
+
+// Helper function to tear down the mock LDAP implementation after testing
+func teardownMockLdap() {
+	// Restore the original functions
+	if originalImpl, ok := originalFunctions["newLdapAccessImpl"]; ok {
+		if factory, ok := originalImpl.(ldapAccessFactory); ok {
+			newLdapAccessImpl = factory
+		}
+	}
+
+	if originalRead, ok := originalFunctions["readSecrets"]; ok {
+		if readFunc, ok := originalRead.(func() bool); ok {
+			readSecrets = readFunc
+		}
+	}
+}


With the changes I proposed in usergroup_ldap_resolver.go, all of this can be eliminated.

pbacsko · 2025-06-12T10:47:22Z

pkg/common/security/usergroup_ldap_resolver.go

+var ldapConf = LdapResolverConfig{
+	Host:         common.DefaultLdapHost,
+	Port:         common.DefaultLdapPort,
+	BaseDN:       common.DefaultLdapBaseDN,
+	Filter:       common.DefaultLdapFilter,
+	GroupAttr:    common.DefaultLdapGroupAttr,
+	ReturnAttr:   common.DefaultLdapReturnAttr,
+	BindUser:     common.DefaultLdapBindUser,
+	BindPassword: common.DefaultLdapBindPassword,
+	Insecure:     common.DefaultLdapInsecure,
+	SSL:          common.DefaultLdapSSL,
+}


We should consider moving this into the suggested new type LdapLookup. This is yet another global variable we should try to get rid of.
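A sketch of how the configuration could live inside the suggested type instead of a package-level variable. Everything below is illustrative: the struct is trimmed to three fields, the field name `UseSSL` follows the naming nit raised earlier in this review, and the default values are placeholders, since the actual values of the `common.DefaultLdap*` constants are not shown in this thread.

```go
package main

import "fmt"

// LdapResolverConfig is trimmed to three fields for this sketch.
type LdapResolverConfig struct {
	Host   string
	Port   int
	UseSSL bool
}

// defaultLdapResolverConfig returns a fresh value on every call, so callers
// cannot modify shared state. The values are placeholders, not the PR's defaults.
func defaultLdapResolverConfig() LdapResolverConfig {
	return LdapResolverConfig{Host: "localhost", Port: 389, UseSSL: false}
}

// LdapLookup owns a copy of its configuration.
type LdapLookup struct {
	config LdapResolverConfig
}

func NewLdapLookup(config LdapResolverConfig) *LdapLookup {
	return &LdapLookup{config: config}
}

// Address builds the URL to dial from the instance's own configuration.
func (l *LdapLookup) Address() string {
	scheme := "ldap"
	if l.config.UseSSL {
		scheme = "ldaps"
	}
	return fmt.Sprintf("%s://%s:%d", scheme, l.config.Host, l.config.Port)
}

func main() {
	plain := NewLdapLookup(defaultLdapResolverConfig())

	custom := defaultLdapResolverConfig()
	custom.Host = "ldap.example.com"
	custom.Port = 636
	custom.UseSSL = true
	secure := NewLdapLookup(custom)

	// The two instances do not share configuration.
	fmt.Println(plain.Address())
	fmt.Println(secure.Address())
}
```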

manirajv06 assigned mitdesai May 23, 2025

mitdesai force-pushed the YUNIKORN-656 branch 2 times, most recently from 68948ab to 9e4dd49 Compare May 23, 2025 21:07

manirajv06 requested changes May 31, 2025

pbacsko requested review from chia7712, wilfred-s, pbacsko and chenyulin0719 June 5, 2025 08:06

pbacsko requested changes Jun 5, 2025

mitdesai force-pushed the YUNIKORN-656 branch from 9e4dd49 to 1550ad5 Compare June 6, 2025 15:46

[YUNIKORN-656] Add LDAP resolver for group resolution

9bc614e

mitdesai force-pushed the YUNIKORN-656 branch from 1550ad5 to 9bc614e Compare June 6, 2025 15:50

pbacsko requested changes Jun 11, 2025

pbacsko reviewed Jun 12, 2025
