Skip to content

Add support for automatic client authentication via TLS certificate fields #1920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: unstable
Choose a base branch
from

Conversation

omanges
Copy link

@omanges omanges commented Apr 6, 2025

This PR implements support for automatic client authentication based on a configurable field in the client's TLS certificate.

🔒 Feature Overview:

  • Introduces a new configuration directive: tls-auth-clients-field.

    • Allows specifying a certificate field (e.g., CN, O) to map to a Valkey user.

    • If a matching user is found, the client is automatically authenticated as that user upon TLS handshake.

    • Falls back to the default user if no match is found.

⚙️ Implementation Details:

  • Added getCertFieldByName() utility to extract fields from peer certificates.

  • Added autoAuthenticateClientFromCert() to handle automatic login logic post-handshake.

  • Integrated automatic authentication into the TLSAccept function after handshake completion.

  • Updated test suite (tests/integration/tls.tcl) to validate the feature.

✅ Benefits:

  • Enables smoother integration with mTLS-based infrastructures.

  • Reduces the need for clients to manually send AUTH commands.

  • Enhances security by tightly coupling certificate identity with Valkey access control.

🔬 Example:

tls-auth-clients-field CN

This will authenticate the client as the user matching the certificate's Common Name.

#1866

@omanges omanges marked this pull request as draft April 6, 2025 13:02
…g based TLS certifiacte field

Signed-off-by: Omkar Mestry <[email protected]>
@omanges omanges marked this pull request as ready for review April 6, 2025 13:04
@zuiderkwast zuiderkwast self-requested a review April 7, 2025 19:10
Copy link

codecov bot commented Apr 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.05%. Comparing base (add716b) to head (097de91).
Report is 3 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1920      +/-   ##
============================================
+ Coverage     71.02%   71.05%   +0.02%     
============================================
  Files           123      123              
  Lines         65683    65683              
============================================
+ Hits          46653    46668      +15     
+ Misses        19030    19015      -15     
Files with missing lines Coverage Δ
src/config.c 78.58% <ø> (+0.19%) ⬆️
src/connection.h 87.77% <ø> (ø)
src/server.h 100.00% <ø> (ø)
src/tls.c 100.00% <ø> (ø)

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I didn't do a complete review. For example I didn't look at the tests at all yet.

I'm not sure if the config should allow any field. Maybe only CN makes sense ever and then the config should maybe not be selecting a field. But let's discuss it and think about what we can do.

I found the corresponding features in ETCD and Kubernetes. The both use only common name as username:

@@ -3361,6 +3361,7 @@ standardConfig static_configs[] = {
createBoolConfig("tls-prefer-server-ciphers", NULL, MODIFIABLE_CONFIG, server.tls_ctx_config.prefer_server_ciphers, 0, NULL, applyTlsCfg),
createBoolConfig("tls-session-caching", NULL, MODIFIABLE_CONFIG, server.tls_ctx_config.session_caching, 1, NULL, applyTlsCfg),
createStringConfig("tls-cert-file", NULL, VOLATILE_CONFIG | MODIFIABLE_CONFIG, EMPTY_STRING_IS_NULL, server.tls_ctx_config.cert_file, NULL, NULL, applyTlsCfg),
createStringConfig("tls-auth-clients-field", NULL, VOLATILE_CONFIG | MODIFIABLE_CONFIG, EMPTY_STRING_IS_NULL, server.tls_ctx_config.client_auth_field, NULL, NULL, applyTlsCfg),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config should probably be an enum config, so we can validate that only values that we can handle are allowed.

Comment on lines +686 to +690
if (!strcasecmp(field, "CN"))
nid = NID_commonName;
else if (!strcasecmp(field, "O"))
nid = NID_organizationName;
// Add more mappings here as needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using "O" makes no sense to me, just like "C" (Country) makes no sense. Fields like that are not unique. If we start supporting it, I suspect we might regret it in the future. @madolson you mentioned "O" in a comment. Was it a serious suggestion or just an example?

Out of the top-level fields, only CN (common name) makes sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone asked for O once inside AWS, but I don't know if they would have been okay with just CN as well. We can always just start with "CN" and see if anyone asks for more in the future? The enum just allows us to add more in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum just allows us to add more in the future.

Yes, I agree enums in general are good for extensibility, but I doubt what we want to add in the future are top-level fields. It's more likely that we want to add

  • subject alt name: DNS name
  • an option to fail the handshake if the client cert doesn't match an existing user

But if you were asked once to use O for username, then I'm OK with it. It's an actual ask from a user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed up with the folks, they did indeed ask for "O". They sign their certificates by team, and they want the auth to happen at the team level (imagine a bunch of apps with multiple services, they want all services to have the same permission). They said "CN" would be fine, they would just need to provision more users, but that would be "OK" for them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, I think we shouldn't name the config tls-auth-clients-field. I prefer we just do like ETCD and use a boolean config. They have a config called client-cert-auth (true or false) for exactly the same feature: https://etcd.io/docs/v3.4/op-guide/authentication/#using-tls-common-name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about boolean config tls-auth-clients-common-name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to focus on the fact we are authenticating the user of the client than on what field is being used. tls-auth-clients-user

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tls-auth-clients-user SGTM


X509 *cert = SSL_get_peer_certificate(conn->ssl);
if (!cert) {
serverLog(LL_WARNING, "TLS: No peer certificate found after handshake");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING is a serious log level. It means the server is in trouble and needs an administrator's help.

Missing cert is not an error. Client certificate can be configured to be optional. I think we shouldn't log anything here, not even a notice.


char field_value[256];
if (!getCertFieldByName(cert, field, field_value, sizeof(field_value))) {
serverLog(LL_WARNING, "TLS: Failed to extract field %s from certificate", field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The should probably just be a NOTICE.

Comment on lines +729 to +733
serverLog(LL_NOTICE, "TLS: ACL lookup user: %s", u ? u->name : "NOT FOUND");
if (u) {
c->user = u;
c->flag.authenticated = true;
serverLog(LL_NOTICE, "TLS: Auto-authenticated client as %s", u->name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging twice here is too verbose IMO. Let's log only the second one, when it has been found and authenticated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If many clients connect successfully, are we're flooding the logs too much if we log here?

Is it better to log only in the NOT FOUND case? It's the exception case that shouldn't happen often. @madolson WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also mark u->name as sensitive in other places, we should add it behind the hide user data flag. I would drop the log level to verbose if we want to keep the second one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. If we redact this log message, there isn't much to log at all. Shall we just skip logging here?

If we want it, it would be something like this:

        serverLog(LL_VERBOSE,
                  "TLS: Auto-authenticated client as %s",
                  server.hide_user_data_from_log ? "*redacted*" : u->name);

@@ -138,6 +138,7 @@ struct connection {
ConnectionCallbackFunc conn_handler;
ConnectionCallbackFunc write_handler;
ConnectionCallbackFunc read_handler;
sds cert_user;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be on the underlying tlsConnection, not the higher level abstraction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not sure this needs to be on the struct, looks like it could just be stack allocated inside the function itself.

static void TLSAccept(void *_conn) {
tls_connection *conn = (tls_connection *)_conn;
ERR_clear_error();
int ret = SSL_accept(conn->ssl);
if (ret > 0) {
conn->flags |= TLS_CONN_FLAG_ACCEPT_SUCCESS;
autoAuthenticateClientFromCert(conn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this here, it's executed from the IO threads and the ACL code is not thread safe, we probably need to grab the user and stuff it on the connection to evaluate later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants